Friday, October 31, 2008

Beyond Auditing

Sometimes capturing the fact that someone changed something and who that someone is, isn't enough. Sometimes, if the values in a row of a table were changed, you need to know what the original values were along with who changed them. Into this space we introduce SQL Server 2008 Change Data Capture (CDC).
This new feature of SQL Server records Data Modification Language operations (INSERT, UPDATE, DELETE) on user tables. The changes are exposed using table-valued functions. The CDC asynchronously reads changes in the source tables from the transaction log and inserts those changes into separate change tables. This feature is available in Enterprise, Developer, and Evaluation editions only.

CDC Diagram

Before we can use this feature we have to configure the database for it. A member of the sysadmin server role must enable CDC on the database by running the sys.sp_cdc_enable_db stored procedure. You can also enable CDC for individual tables only by running sys.sp_cdc_enable_db_change_data_capture.

--First, let's create a demonstration database with a table in it
USE [master];
GO
IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = N'DemoDB')
    CREATE DATABASE [MyDB];
GO
USE [MyDB];
GO
CREATE TABLE MyTable
(MyID INT PRIMARY KEY NOT NULL,
 MyItem VARCHAR(128),
 MyQty INT);
 GO
 
--Next, let's enable CDC on the database
EXEC sys.sp_cdc_enable_db;
GO
--Enable CDC for the Products tableEXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'MyTable',
@role_name     = NULL,
@capture_instance = N'MyTbl_1',
@captured_column_list = N'MyID, MyItem, MyQty',
@filegroup_name = N'PRIMARY',
@supports_net_changes = 1
GO


When you run the two system stored procedures you will see results like these;

Job 'cdc.MyDB_capture' started successfully.
Job 'cdc.MyDB_cleanup' started successfully.

Notice that two jobs are created to monitor/capture and to cleanup the data capture tables.
Now, we have our table, let's put some data into it and modify that data so we can see CDC in action.

--Now let's add some initial data INSERT INTO MyTable(MyID,MyItem,MyQty) Values(1,'Item 1',20),(2,'Item 2',10),(3,'Item 3',1),(4,'Item 4',3); GO--And change it

UPDATE MyTable SET MyQty+=10 WHERE MyID=1; GO Command(s) completed successfully.


Now, when we enabled CDC for the database and the table in it, SQL Server created a pair of Table Valued Functions for us. The functions are named as cdc.fn_get_all_changes_'capture_instance', where 'capture_instance' is the value we specified for @capture_instance in our sp call to enable cdc on the table. In our case they are cdc.fn_cdc_get_all_changes_MyTbl_1. There are also several built-in functions for CDC that make querying information about CDC tables and instances easier. (See BOL for details). We will use two of the built-in functions in our example below.
Here they are in action;
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SELECT @from_lsn = sys.fn_cdc_get_min_lsn(N'MyTbl_1');
SELECT @to_lsn = sys.fn_cdc_get_max_lsn();
SELECT * FROM cdc.fn_cdc_get_all_changes_MyTbl_1(@from_lsn,@to_lsn,N'all');
SELECT * FROM cdc.fn_cdc_get_net_changes_MyTbl_1(@from_lsn,@to_lsn,N'all');
GO

And the results from the above query;
 image
Notice the difference in the results from the two functions. The first function tracks history for every row in the table. It shows Item 1 as it was insert and again after it's update. The second function however, doesn't show the history, but just the net changes. Notice that the initial insert value for Item 1 is missing, only the current value is shown.

To read more on this topic check out the section in SQL Server 2008 Books Online. See all the available system functions associated with it. BOL topic is Change Data Capture.

Until next time, happy slinging.