Google BigQuery


Server-side event streaming to Google BigQuery via the Storage Write API for low-latency analytics, machine learning workloads, and data warehousing. The @walkeros/server-destination-gcp package also ships destinationPubSub for publishing events to Pub/Sub topics; this page covers BigQuery only.

Where this fits

GCP BigQuery is a server destination in the walkerOS flow. It streams events to Google BigQuery for data warehousing, analytics dashboards, and machine learning workloads.

Installation


Add to your flow.json destinations:

CLI reference

Configuration

This destination uses the standard destination config wrapper (consent, data, env, id, ...). For the shared fields see destination configuration. Package-specific fields live under config.settings and are listed below.

Settings

Property	Type	Description
`client`	`any`	Google Cloud BigQuery client instance
`projectId*`	`string`	Google Cloud Project ID
`datasetId`	`string`	BigQuery dataset ID where events will be stored
`tableId`	`string`	BigQuery table ID for event storage
`location`	`string`	Geographic location for the BigQuery dataset
`bigquery`	`any`	Additional BigQuery client configuration options

* Required fields

Mapping

This package does not define custom rule-level settings. For the standard rule fields (consent, condition, data, batch, name, policy) see mapping.

Examples

Page view

A page view is appended as one row through the BigQuery Storage Write API JSONWriter. Nested objects/arrays in data, source, etc. are JSON-stringified by eventToRow.

Event

Out

Purchase

An order event is appended as a single row through JSONWriter.appendRows. The entire nested data object (including arrays like items) is JSON-stringified into the data column via eventToRow().

Event

Out
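The row conversion described in the two examples above can be illustrated with a minimal sketch. This is not the package's eventToRow implementation; the function name toRowSketch and the Row type are hypothetical, and the sketch only assumes that nested objects and arrays are JSON-stringified while scalar values pass through unchanged.

```typescript
// Hypothetical sketch, NOT the package's actual eventToRow implementation.
// Assumption: objects and arrays become JSON strings, scalars pass through.
type Row = Record<string, string | number | boolean | null>;

function toRowSketch(event: Record<string, unknown>): Row {
  const row: Row = {};
  for (const [key, value] of Object.entries(event)) {
    // Skip undefined fields so the column is simply absent from the row.
    if (value === undefined) continue;
    if (value !== null && typeof value === "object") {
      // Nested objects and arrays (for example data.items) are stringified.
      row[key] = JSON.stringify(value);
    } else {
      row[key] = value as string | number | boolean | null;
    }
  }
  return row;
}

const exampleRow = toRowSketch({
  name: "order complete",
  timing: 3,
  data: { total: 10, items: [{ id: "a" }] },
});
console.log(exampleRow.data);
```

In this sketch the whole data object, including the items array, ends up as one JSON string in a single column, which matches the behavior described for the purchase example.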

Prerequisites

Google Cloud account with billing enabled
gcloud CLI installed and authenticated (includes bq command)

Setup lifecycle

Provision the dataset and table once per environment with the CLI:

Output: a narrated "setup: ok destination.bigquery" line. Add --json to also emit a structured envelope reporting { datasetCreated, tableCreated } for jq piping. The command is idempotent and safe to re-run.

config.setup controls provisioning:

omitted or false: narrated skip, no provisioning. Operator runs setup explicitly to provision.
true: provision with the defaults below.
object matching the Setup interface: provision with the declared overrides.
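The three cases above can be sketched as a small resolver. This is an illustration only: SetupSketch is a hypothetical, reduced stand-in for the package's Setup interface (whose real field names may differ), resolveSetupSketch is not a package export, and the default values are the ones listed in the Defaults table of this section.

```typescript
// Hypothetical, reduced stand-in for the package's Setup interface.
interface SetupSketch {
  datasetId: string;
  tableId: string;
  location: string;
  storageBillingModel: "PHYSICAL" | "LOGICAL";
}

// Values taken from the Defaults table on this page.
const SETUP_DEFAULTS: SetupSketch = {
  datasetId: "walkerOS",
  tableId: "events",
  location: "EU",
  storageBillingModel: "PHYSICAL",
};

// Returns null when nothing should be provisioned.
function resolveSetupSketch(
  setup?: boolean | Partial<SetupSketch>,
): SetupSketch | null {
  if (setup === undefined || setup === false) return null; // skip
  if (setup === true) return { ...SETUP_DEFAULTS }; // defaults only
  return { ...SETUP_DEFAULTS, ...setup }; // defaults plus declared overrides
}

console.log(resolveSetupSketch({ location: "US" }));
```

Spreading the overrides over the defaults keeps any field that is not declared at its default value, which is one plausible reading of "provision with the declared overrides".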

See the Setup interface in the package for the full shape.

Defaults

Field	Value
`datasetId`	`walkerOS` (note capital O, S)
`tableId`	`events`
`location`	`EU`
`storageBillingModel`	`PHYSICAL` (cheaper for compressible JSON)
Partitioning	Day partitioning on `timestamp`
Clustering	`(name, entity, action)`

Cost optimization

Physical storage billing charges based on compressed size. Day partitioning and the (name, entity, action) clustering reduce scan costs for typical analytics queries. Always include a timestamp filter.

Drift handling

If the existing table's partitioning, clustering, or schema differs from the declared configuration, setup logs WARN setup.drift {...} and continues. There is no auto-mutation. Migrations are an operator decision.
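A minimal sketch of the comparison described above, assuming drift is checked on three aspects: the partitioning field, the clustering columns, and the column names. TableShapeSketch and detectDriftSketch are hypothetical names, not package exports, and the real check may compare more detail (for example column types and modes).

```typescript
// Hypothetical shape used for comparison; not the package's internal type.
interface TableShapeSketch {
  partitionField: string | null;
  clustering: string[];
  columns: string[];
}

// Order-sensitive comparison of two string lists.
function sameList(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

// Returns the aspects that differ; an empty array means no drift.
function detectDriftSketch(
  declared: TableShapeSketch,
  existing: TableShapeSketch,
): string[] {
  const drift: string[] = [];
  if (declared.partitionField !== existing.partitionField) drift.push("partitioning");
  if (!sameList(declared.clustering, existing.clustering)) drift.push("clustering");
  if (!sameList(declared.columns, existing.columns)) drift.push("schema");
  return drift;
}

const drift = detectDriftSketch(
  { partitionField: "timestamp", clustering: ["name", "entity", "action"], columns: ["name", "data"] },
  { partitionField: "timestamp", clustering: ["name"], columns: ["name", "data"] },
);
// Warn and continue: the sketch never mutates the existing table.
if (drift.length > 0) console.warn("drift detected:", drift);
```

The function only reports differences. Acting on them (recreating or migrating the table) stays with the operator, as stated above.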

GCP setup

Enable BigQuery API

Create service accounts

The provisioning step (setup) and the runtime push path need different permissions. We recommend separating them.

Operator (setup) service account, used by walkeros setup:

bigquery.datasets.create
bigquery.tables.create
bigquery.datasets.get (for drift detection)
bigquery.tables.get

Runtime service account, used by the running flow:

bigquery.tables.updateData (Storage Write API append)

Authentication

Service Account Key
Workload Identity

For environments where you need explicit credentials (Docker containers, external platforms):

Set the environment variable to use the key:

Caution: Keep key files secure. Never commit them to version control or include them in public Docker images.

Inline credentials

Instead of relying on GOOGLE_APPLICATION_CREDENTIALS, you can pass auth options inline via settings.bigquery (for example keyFilename or credentials). These apply to both the control plane (setup, metadata) and the data plane (Storage Write API ingestion). A pre-built settings.client authenticates the control plane only; supply settings.bigquery for the data plane to use non-ADC credentials.
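As a rough illustration of the inline option, the fragment below passes a key file path through settings.bigquery. The property names projectId, bigquery, keyFilename, and credentials come from this page; the SettingsSketch type, the project ID, and the file path are placeholders assumed for the example.

```typescript
// Hypothetical, reduced view of the destination settings.
interface SettingsSketch {
  projectId: string;
  datasetId?: string;
  tableId?: string;
  // Auth options forwarded to both the control plane and the data plane.
  bigquery?: {
    keyFilename?: string;
    credentials?: Record<string, unknown>;
  };
}

const inlineAuthSettings: SettingsSketch = {
  projectId: "my-project", // placeholder project ID
  bigquery: {
    keyFilename: "/secrets/bigquery-runtime.json", // placeholder path
  },
};

console.log(inlineAuthSettings.bigquery?.keyFilename);
```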

Environment variables

Variable	Description	Default
`GCP_PROJECT_ID`	Your GCP project ID	Required
`BQ_DATASET`	BigQuery dataset name	`walkerOS`
`BQ_TABLE`	BigQuery table name	`events`
`BQ_LOCATION`	BigQuery dataset location	`EU`
`GOOGLE_APPLICATION_CREDENTIALS`	Path to service account key	Required (unless using Workload Identity)
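The table above can be read as a mapping from environment variables to settings. The sketch below shows one way to wire that mapping yourself, applying the listed defaults. settingsFromEnvSketch is a hypothetical helper, not part of the package, and it takes the environment as a plain object so it does not depend on a particular runtime.

```typescript
// Hypothetical helper; the package may resolve these variables differently.
type EnvSketch = Record<string, string | undefined>;

interface BigQuerySettingsSketch {
  projectId: string;
  datasetId: string;
  tableId: string;
  location: string;
}

function settingsFromEnvSketch(env: EnvSketch): BigQuerySettingsSketch {
  const projectId = env.GCP_PROJECT_ID;
  // GCP_PROJECT_ID has no default, so fail early when it is missing.
  if (!projectId) throw new Error("GCP_PROJECT_ID is required");
  return {
    projectId,
    datasetId: env.BQ_DATASET ?? "walkerOS",
    tableId: env.BQ_TABLE ?? "events",
    location: env.BQ_LOCATION ?? "EU",
  };
}

console.log(settingsFromEnvSketch({ GCP_PROJECT_ID: "my-project" }));
```

In a Node.js process you would typically pass process.env as the argument. GOOGLE_APPLICATION_CREDENTIALS is not read here because Google's client libraries pick it up on their own through Application Default Credentials.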

Storage Write API (data plane)

The destination uses BigQuery's Storage Write API for data ingestion. This replaces the legacy tabledata.insertAll path.

Cost: $25/TB after the 2 TiB/month free tier (vs ~$50/TB for the legacy path). Most low-volume deployments fit entirely in the free tier.
Batching: pushBatch is implemented. Set the collector's batch: <ms> mapping setting to flush all events in a window as a single appendRows call.
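To make the cost figures above concrete, here is a back-of-the-envelope estimate. It is a sketch under stated assumptions: it uses the $25 rate and the 2 TiB free tier quoted above, treats TB and TiB as the same unit for simplicity, and ignores storage and query costs. The function name is hypothetical.

```typescript
// Rough monthly ingestion cost, using the figures quoted on this page.
// Assumption: TB and TiB are treated as the same unit for this estimate.
function estimateMonthlyIngestCostUsd(
  ingestedTib: number,
  freeTib = 2,
  usdPerTib = 25,
): number {
  // Only the volume above the free tier is billed.
  const billableTib = Math.max(0, ingestedTib - freeTib);
  return billableTib * usdPerTib;
}

console.log(estimateMonthlyIngestCostUsd(1.5)); // within the free tier
console.log(estimateMonthlyIngestCostUsd(6)); // 4 billable units at $25 each
```

Check current Google Cloud pricing before relying on these numbers, since rates and free-tier limits can change.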

EXPERIMENTAL SDK

The upstream @google-cloud/bigquery-storage package self-marks as EXPERIMENTAL (subject to change). Pinned at ^5.1.0.

Default table schema

The default 15-column schema follows the walkerOS Event v4 canonical order. Object fields use the native JSON BigQuery type. Only name is REQUIRED; all other columns are NULLABLE for resilience against partial events.

Column	Type	Mode
`name`	STRING	REQUIRED
`data`	JSON	NULLABLE
`context`	JSON	NULLABLE
`globals`	JSON	NULLABLE
`custom`	JSON	NULLABLE
`user`	JSON	NULLABLE
`nested`	JSON	NULLABLE
`consent`	JSON	NULLABLE
`id`	STRING	NULLABLE
`trigger`	STRING	NULLABLE
`entity`	STRING	NULLABLE
`action`	STRING	NULLABLE
`timestamp`	TIMESTAMP	NULLABLE
`timing`	INT64	NULLABLE
`source`	JSON	NULLABLE

There is no createdAt column. Use timestamp (event time) for partition filters.

Query optimization

Partitioning by day on timestamp and clustering on (name, entity, action) reduces scan costs for typical analytics queries. Always include a timestamp filter.
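As an illustration of the timestamp filter advice, the sketch below builds a query that counts events per name over the last N days. The helper name and the fully qualified table name used in the usage line are hypothetical; the name and timestamp columns come from the default schema, and the SQL uses standard BigQuery functions.

```typescript
// Hypothetical helper that builds a partition-friendly query string.
function dailyEventCountsQuerySketch(table: string, days: number): string {
  // Accept only dataset.table or project.dataset.table style identifiers.
  if (!/^[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+){1,2}$/.test(table)) {
    throw new Error(`Unexpected table name: ${table}`);
  }
  if (!Number.isInteger(days) || days <= 0) {
    throw new Error("days must be a positive integer");
  }
  return [
    "SELECT name, COUNT(*) AS events",
    `FROM \`${table}\``,
    // The timestamp filter lets BigQuery prune day partitions.
    `WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL ${days} DAY)`,
    "GROUP BY name",
    "ORDER BY events DESC",
  ].join("\n");
}

console.log(dailyEventCountsQuerySketch("my-project.walkerOS.events", 7));
```

Filtering or grouping on name also lines up with the first clustering column, so queries shaped like this one should benefit from both partition pruning and clustering.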

Custom schema mapping

You can send a custom schema by using the data configuration to map specific fields. This is useful when you only need a subset of the event data.

Example: simple schema

This example sends only name, id, data, and timestamp:


With the corresponding simpler table:

Cleanup

To remove BigQuery resources:

Delete the BigQuery dataset
Remove service account IAM bindings from the dataset
Delete the service account
Remove any downloaded key files

Pub/Sub

The same @walkeros/server-destination-gcp package also exports destinationPubSub for publishing events to a Pub/Sub topic. See the Pub/Sub destination page for full settings, mapping, ordering, attributes, setup, and authentication reference.

