bigquerystream Consumer

Streams Debezium events to BigQuery using the Storage Write API.

Configuration

| Config | Default | Description |
|---|---|---|
| `debezium.sink.bigquerystream.dataset` | | Destination BigQuery dataset name. |
| `debezium.sink.bigquerystream.location` | `US` | BigQuery table location. |
| `debezium.sink.bigquerystream.project` | | BigQuery project. |
| `debezium.sink.bigquerystream.ignore-unknown-fields` | `true` | If true, unknown JSON fields sent to BigQuery are ignored instead of raising an error. |
| `debezium.sink.bigquerystream.create-if-needed` | `true` | Creates the BigQuery table if it is not found. |
| `debezium.sink.bigquerystream.partition-field` | `__ts_ms` | Partition target tables by the `__ts_ms` field. |
| `debezium.sink.bigquerystream.clustering-field` | `__source_ts_ms` | Cluster target tables by primary key plus the `__source_ts_ms` field. |
| `debezium.sink.bigquerystream.partition-type` | `MONTH` | Partitioning type. |
| `debezium.sink.bigquerystream.allow-field-addition` | `false` | Allow field addition to target tables. |
| `debezium.sink.bigquerystream.credentials-file` | | GCP service account credentials file. |
| `debezium.sink.bigquerystream.bigquery-custom-host` | | Custom endpoint for the BigQuery API. Useful for testing against a local BigQuery emulator such as bq-emulator. |
| `debezium.sink.bigquerystream.bigquery-custom-grpc-host` | | Custom endpoint for the BigQuery gRPC API. Useful for testing against a local BigQuery emulator such as bq-emulator. |
| `debezium.sink.bigquerystream.bigquery-dev-emulator` | `false` | Whether Debezium should connect to a bq-emulator instance. |
| `debezium.sink.bigquerystream.upsert` | `false` | Run in upsert mode, overwriting updated rows. Uses the BigQuery CDC feature. |
| `debezium.sink.bigquerystream.upsert-keep-deletes` | `true` | In upsert mode, keep deleted rows in the BigQuery table. |
| `debezium.sink.bigquerystream.upsert-dedup-column` | `__source_ts_ns` | In upsert mode, the column used to deduplicate data. The row with the highest `__source_ts_ns` is kept. |
| `debezium.sink.bigquerystream.upsert-op-column` | `__op` | In upsert mode, the column used to deduplicate data when rows have the same `__source_ts_ns`. |
| `debezium.sink.bigquerystream.cast-deleted-field` | `false` | Cast the deleted field to boolean type (by default it is a string). |
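
As an illustration, a minimal append-mode configuration might look like the sketch below. The project, dataset, and credentials path are placeholder values, and the `debezium.sink.type` line is an assumption based on the consumer name, so check both against your own setup.

```properties
# Assumption: the sink is selected by its consumer name.
debezium.sink.type=bigquerystream

# Placeholder values; replace with your own project, dataset, and key file.
debezium.sink.bigquerystream.project=my-gcp-project
debezium.sink.bigquerystream.dataset=my_dataset
debezium.sink.bigquerystream.location=US
debezium.sink.bigquerystream.credentials-file=/path/to/service-account.json

# Defaults from the table above, shown explicitly for clarity.
debezium.sink.bigquerystream.create-if-needed=true
debezium.sink.bigquerystream.partition-field=__ts_ms
debezium.sink.bigquerystream.partition-type=MONTH
```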

Upsert

By default, the BigQuery streaming consumer runs in append mode (debezium.sink.bigquerystream.upsert=false). Upsert mode uses the source primary key to upsert into the target table (a delete followed by an insert). For tables without a primary key, the consumer falls back to append mode.
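
A sketch of switching the consumer to upsert mode, using only the options listed in the configuration table. Setting upsert-keep-deletes to false here is just an example choice, not a recommendation.

```properties
# Enable upsert mode (the default is append mode, upsert=false).
debezium.sink.bigquerystream.upsert=true

# Example choice: do not keep deleted rows in the target table (default is true).
debezium.sink.bigquerystream.upsert-keep-deletes=false

# Deduplication columns, shown with their default values.
debezium.sink.bigquerystream.upsert-dedup-column=__source_ts_ns
debezium.sink.bigquerystream.upsert-op-column=__op
```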

Upsert Mode Data Deduplication

In upsert mode, data is deduplicated based on the __source_ts_ns value and the event type __op. The deduplication column can be changed with debezium.sink.bigquerystream.upsert-dedup-column (default: __source_ts_ns). Currently only the Long field type is supported.

Operation type priorities are {"c":1, "r":2, "u":3, "d":4}. When two records with the same key and the same __source_ts_ns value are received, the record with the higher __op priority is kept and added to the destination table, and the duplicate record is dropped from the stream.
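
The rule above can be sketched in Python as follows. This is an illustration of the described behavior, not the consumer's actual implementation. Events are modeled as plain dictionaries, and the `id` key field and the `deduplicate` helper are hypothetical names introduced for the example.

```python
# Illustrative sketch only; the real consumer is not implemented this way.
# "id" is a hypothetical primary-key field used for the example.

OP_PRIORITY = {"c": 1, "r": 2, "u": 3, "d": 4}


def deduplicate(events, key_field="id",
                dedup_column="__source_ts_ns", op_column="__op"):
    """Keep one event per key.

    The event with the highest dedup_column value wins; when those values
    are equal, the event with the higher operation priority wins.
    """
    latest = {}
    for event in events:
        key = event[key_field]
        # Tuples compare element by element: timestamp first, then priority.
        rank = (event[dedup_column], OP_PRIORITY[event[op_column]])
        if key not in latest or rank > latest[key][0]:
            latest[key] = (rank, event)
    return [event for _, event in latest.values()]


batch = [
    {"id": 1, "__op": "c", "__source_ts_ns": 100, "name": "a"},
    {"id": 1, "__op": "u", "__source_ts_ns": 100, "name": "b"},
    {"id": 2, "__op": "u", "__source_ts_ns": 300, "name": "x"},
    {"id": 2, "__op": "d", "__source_ts_ns": 200, "name": "x"},
]
result = deduplicate(batch)
```

For key 1 both events share the same __source_ts_ns, so the update ("u", priority 3) is kept over the create ("c", priority 1). For key 2 the update is kept because its __source_ts_ns is higher, even though the delete has the higher operation priority.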