## Frequently Asked Questions (FAQ)
### How does the connector handle deletes, and what are the performance implications?
This connector writes data to Iceberg tables using the V2 specification. To optimize write performance, delete events are recorded in delete files, avoiding costly data file rewrites. While this approach significantly improves write performance, it can impact read performance, especially in `upsert` mode. In `append` mode, this trade-off does not apply.

To keep reads fast, run periodic table maintenance jobs that compact data files and rewrite the delete files. This is especially critical in `upsert` mode.
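As a minimal sketch, assuming you run maintenance with Spark and the Iceberg runtime, Iceberg's built-in procedures can perform this compaction; `my_catalog` and `db.events` below are placeholder names:

```sql
-- Compact small data files into larger ones (placeholder catalog and table names).
CALL my_catalog.system.rewrite_data_files(table => 'db.events');

-- Rewrite position delete files so readers no longer merge them at query time.
CALL my_catalog.system.rewrite_position_delete_files(table => 'db.events');

-- Optionally expire old snapshots to reclaim storage.
CALL my_catalog.system.expire_snapshots(table => 'db.events');
```

Scheduling these procedures (for example, from a nightly batch job) keeps the number of delete files bounded, which is what drives read performance in `upsert` mode.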
### Does the connector support schema evolution?
Full schema evolution, such as converting between incompatible data types, is not currently supported. However, schema expansion, including adding new fields and promoting field data types, is supported. To enable this behavior, set the `debezium.sink.iceberg.allow-field-addition` configuration property to `true`.
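For example, in `application.properties`:

```properties
# Allow newly observed source fields to be added to the Iceberg table schema.
debezium.sink.iceberg.allow-field-addition=true
```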
For a more robust way to handle schema changes, you can configure the connector to store all nested data in a single `variant` field. This approach can seamlessly absorb many schema changes.
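As an illustrative sketch only, such a configuration might look like the following in `application.properties`; the property name here is hypothetical, so consult the connector's documentation for the exact key your version supports:

```properties
# Hypothetical property name for illustration -- check the connector docs for the real key.
# Stores nested event data in a single variant-typed column rather than expanded fields.
debezium.sink.iceberg.nested-as-variant=true
```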
### How can I replicate only specific tables from my source database?
By default, the Debezium connector replicates all tables in the database, which can result in unnecessary load. To avoid replicating tables you don't need, configure the `debezium.source.table.include.list` property to specify the exact tables to replicate. This streamlines your data pipeline and reduces overhead. For more details, refer to the [Debezium server source documentation](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-property-table-include-list).
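For example, to capture only two tables (the schema and table names below are placeholders):

```properties
# Comma-separated, fully-qualified table names to capture; everything else is skipped.
debezium.source.table.include.list=inventory.customers,inventory.orders
```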
### How do I configure AWS S3 credentials?
You can set up AWS credentials in one of the following ways:
- In `application.properties`: use the `debezium.sink.iceberg.fs.s3a.access.key` and `debezium.sink.iceberg.fs.s3a.secret.key` properties, as shown in the sketch after this list.
- As environment variables: set `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
- Using Hadoop's configuration: set the `HADOOP_HOME` environment variable and add the S3A configuration to `core-site.xml`.
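A minimal sketch of the first option; the values below are placeholders for your own credentials:

```properties
# Placeholder credentials -- substitute your own values, or prefer
# environment variables or an instance profile over keys in plain text.
debezium.sink.iceberg.fs.s3a.access.key=YOUR_ACCESS_KEY_ID
debezium.sink.iceberg.fs.s3a.secret.key=YOUR_SECRET_ACCESS_KEY
```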