Change Data Capture in SQL Server
Hello everyone and welcome to a new post on the best SQL Server blog. In today’s post we are going to talk about Change Data Capture, what it is, how it is configured, important details that we must take into account when using it, and useful queries on a day-to-day basis. So… let’s get started!
What is Change Data Capture?
Change Data Capture or CDC is a resource that allows recording the activity, within a database, when tables and rows are modified. CDC is available on Azure SQL Database, SQL Server, and Azure SQL Managed Instance.
How does it work?
Change Data Capture is located within the table in our database and allows us to keep traceability of the data from the moment we insert it, all the updates it undergoes and until it is finally deleted.
What is Change Data Capture useful for on a practical level?
The CDC serves us at the registration or historical level of the entire process that our data has gone through, from the time we entered it until it was deleted. We could say that it is the history of the data. I give you an example with a possible practical effect:
Let’s imagine that we are a textile company and we have saved the address of our partner. This partner, loyal to the company, has been associated for a long time and has always tried to keep his data updated. If we consult our database, we will see what the current address of the user is. Now we are going to check what happens if we analyze this same data through the CDC:
Insertion –> Address 1ª: Alicante
2 years later –> Update –> 2nd Address: Valencia
2 years later –> Update –> Address 3ª Madrid
15 years without further modifications
And now you will ask yourself, why am I interested in knowing where our partner lived, if I already know where he lives right now?
Let’s think big, as if it were an entrepreneur. We are going to process the same data but from all partners. What would be the result we would get?
In this case, we would see how the population associated with our company moves and how it is distributed over geography. This data can be useful when conducting market research to determine the best location for a new store. Knowing the distribution of our associates we will find the best location to obtain the desired level of sales.
How do you configure Change Data Capture in SQL Server?
The CDC is enabled both at the database level and at the table level. Therefore, the first step will be to enable it at the database level, and then at the table level. Let’s go with an example of enabling the CDC for a table for a database.
Use 'Database_name'
GO
EXEC sys.sp_cdc_enable_db
GO
- If the environment is Amazon RDS, the statement is as follows:
Use 'Database_name'
GO
EXEC msdb.dbo.rds_cdc_enable_db 'Database_name'
- We enable one by one the tables that we want to put in CDC.
exec sys.sp_cdc_enable_table
@source_schema = N'Schema_name',
@source_name =N'Table_name',
@role_name = NULL,
@supports_net_changes = 1
In the following link we can see all the parameters that we can adjust to our tables when configuring the CDC for them:
With this, we would already have our table configured and ready to save all the data.
Risks of using Change Data Capture in SQL Server
This technology requires having a great control of the log file. Why?
For this, we first have to understand how SQL Server works when executing transactions. In SQL Server, all transactions are first written to the log file, and then written to the data file. Once all the resources have finished with the transactions in the log file, they are released or marked to be able to release these transactions from the file.
In SQL Server, when someone connects to the CDC, it is in charge of retaining the records in the log file, waiting for the connected user to consult them, before releasing them or marking them to release. Let’s say you add a check to them, which you have to check before you can release them. This can cause these records to accumulate, so that the log file grows, and generate disk space problems if it is not properly sized, or we do not have it properly monitored.
Useful queries to work with the CDC.
1.To review the status of the Log of the databases with the CDC activated.
select name, log_reuse_wait_desc, is_cdc_enabled from sys.databases
2. On the databases that the CDC has active, we can consult which tables are included in the CDC.
select name,type,type_desc,is_tracked_by_cdc from sys.tables where is_tracked_by_cdc = 1
3.To review the CDC configuration status.
EXECUTE sys.sp_cdc_help_jobs
4. Change CDC settings.
EXECUTE sys.sp_cdc_change_job
@job_type = N'cleanup',
@retention = 30;
5. To review the operation of our CDC by database and the errors we can review the following system views.
select * from sys.dm_cdc_log_scan_sessions
- Here we will see all the sessions that have occurred, and in the first row the aggregate since the CDC was started.
select * from sys.dm_cdc_errors
- Disable CDC completely.
Use 'Database_name'
GO
EXEC sys.sp_cdc_disable_db
GO
- If the environment is Amazon RDS, the sentence is the following:
Use 'Database_name'
GO
EXEC msdb.dbo.rds_cdc_disable_db 'Database_name'
- Start and stop the jobs associated with the CDC.
exec sp_cdc_start_job
exec sp_cdc_stop_job
- Drop a CDC table in a database.
EXEC sys.sp_cdc_disable_table
@source_schema = N'Schema_name',
@source_name = N'Table_name',
@capture_instance = 'all';
Graphic illustration of the above example
First of all, we see how the client did not exist and that it is registered in Alicante:
After inserting the first record, within the internal table of the CDC this change has been implemented. Now we are going to try to update this value from Alicante to Valencia.
We see how the change has been registered, saving the previous state ($operation =3) and later. ($operation =4).
Now we will change the province again, as it says in the example, and then we will delete the record as an example, although this is not part of the previous example.
We see how it happened like the previous example, but with Madrid, and then we have deleted the record (_$operation =1).
External resources using Change Data Capture
AWSDMS. We see that Amazon Web Service (AWS) has a service called Database Migration Service DMS that relies on the CDC to be able to extract the data and migrate it.
https://aws.amazon.com/es/dms/
And here we finish this post. I hope it has helped you to improve the tasks in SQL Server.
Visit our latest posts:
Avoid problems in your SQL Server Migration with Data Migration Assistant (DMA)
Know the Waiting Stadistics or Waits for SQL Server
Learning to organize SQL Server Management Studio for different environments
If you need to improve the performance of your SQL Server database, contact us:
Data Engineer.