Date: Jan 3rd, 2024
Hi everyone, this post introduces the concepts and technical aspects of Change Data Capture (CDC).
CDC is a crucial concept in system design, especially when dealing with large-scale data systems. Let’s break the topic down into four parts.
“Disclaimer: The views and opinions expressed in this blog post are solely my own and do not reflect those of any entity with which I have been, am now, or will be affiliated. This content was written during a period in which the author was not affiliated with, and did not belong to, any organization that could influence their perspectives. As such, these are the author’s personal insights, shared without any external bias or influence.”
1. High-Level Concepts
Change Data Capture (CDC) is a design pattern used to identify and capture changes made to data in a database. The core idea is to efficiently track changes (such as insertions, updates, and deletions) in a data source, so that downstream systems can act on them in near real-time.
There are several methods to implement CDC:
- Database Triggers: Custom triggers in the database that capture changes and log them to a separate table.
- Log-Based CDC: Capturing changes from the database’s transaction log (like the redo log in Oracle, or the binlog in MySQL).
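- Polling-Based CDC: Regularly querying the database to check for new or changed data. Usually not recommended, because the repeated queries consume database resources even when nothing has changed, and a change is only noticed at the next poll.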
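As a quick preview of the trigger-based method, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `users` table, the `users_changes` log table, the trigger names, and the `read_changes` helper are all hypothetical names chosen for illustration; a production setup would differ by database engine and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Hypothetical change-log table that the triggers below write to.
CREATE TABLE users_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op        TEXT NOT NULL,     -- 'I', 'U' or 'D'
    user_id   INTEGER NOT NULL,
    name      TEXT               -- new value; NULL for deletes
);

CREATE TRIGGER users_ai AFTER INSERT ON users BEGIN
    INSERT INTO users_changes (op, user_id, name) VALUES ('I', NEW.id, NEW.name);
END;

CREATE TRIGGER users_au AFTER UPDATE ON users BEGIN
    INSERT INTO users_changes (op, user_id, name) VALUES ('U', NEW.id, NEW.name);
END;

CREATE TRIGGER users_ad AFTER DELETE ON users BEGIN
    INSERT INTO users_changes (op, user_id, name) VALUES ('D', OLD.id, NULL);
END;
""")

# Ordinary writes to the source table; the triggers log each one.
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.execute("UPDATE users SET name = 'alicia' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")
conn.commit()


def read_changes(conn, after_change_id=0):
    """Return logged changes with change_id greater than the given position."""
    return conn.execute(
        "SELECT change_id, op, user_id, name FROM users_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (after_change_id,),
    ).fetchall()


changes = read_changes(conn)
for row in changes:
    print(row)
```

A consumer would remember the last `change_id` it processed and pass it as `after_change_id` on the next read, so it only sees changes it has not handled yet. Note that the deleted row still appears in the log, which a plain query against `users` could not reveal.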
I will explain them in later sections.
2. When Do We Need CDC?
CDC is particularly useful in scenarios such as:
- Real-Time Analytics and Monitoring: When up-to-date data is required for immediate analysis.
- Replicating Data: To keep data in sync across multiple databases or data stores.
- Microservices Architectures: To keep data consistent across the multiple independent web services that run simultaneously.
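To make the replication use case concrete, here is a minimal sketch of a consumer that applies captured changes, in order, to a second copy of the data. The `ChangeEvent` type, its field names, and the `apply_change` function are assumptions made for illustration; the replica is just a Python dict standing in for a second data store.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical shape of one captured change."""
    op: str                       # "insert", "update" or "delete"
    key: int                      # primary key of the changed row
    value: Optional[str] = None   # new value; unused for deletes


def apply_change(replica: Dict[int, str], event: ChangeEvent) -> None:
    """Apply a single change event to the replica, in place."""
    if event.op in ("insert", "update"):
        replica[event.key] = event.value
    elif event.op == "delete":
        # pop with a default so replaying a delete twice is harmless
        replica.pop(event.key, None)
    else:
        raise ValueError(f"unknown operation: {event.op!r}")


# Changes as they might arrive from the source, in commit order.
events = [
    ChangeEvent("insert", 1, "alice"),
    ChangeEvent("insert", 2, "bob"),
    ChangeEvent("update", 1, "alicia"),
    ChangeEvent("delete", 2),
]

replica: Dict[int, str] = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

Applying events in the order they were committed is what keeps the replica in step with the source; in this sketch, replaying the same event a second time leaves the replica unchanged.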
In the next section, I will cover examples in detail.