Google BigQuery
Server-side event streaming to Google BigQuery via the Storage Write API for low-latency analytics, machine learning workloads, and data warehousing. The @walkeros/server-destination-gcp package also ships destinationPubSub for publishing events to Pub/Sub topics; this page covers BigQuery only.
GCP BigQuery is a server destination in the walkerOS flow. It streams events to Google BigQuery for data warehousing, analytics dashboards, and machine learning workloads.
Installation
- Integrated
- Bundled
Configuration
This destination uses the standard destination config wrapper (consent, data, env, id, ...). For the shared fields see destination configuration. Package-specific fields live under config.settings and are listed below.
Settings
| Property | Type | Description | More |
|---|---|---|---|
| client | any | Google Cloud BigQuery client instance | |
| projectId | string | Google Cloud Project ID | |
| datasetId | string | BigQuery dataset ID where events will be stored | |
| tableId | string | BigQuery table ID for event storage | |
| location | string | Geographic location for the BigQuery dataset | |
| bigquery | any | Additional BigQuery client configuration options | |
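As an illustration of how these fields fit together, here is a minimal sketch of a settings object nested under config.settings. The interface is a local stand-in written for this example (the package exports its own types, which may differ), the choice of which fields are optional is an assumption, and all values are hypothetical.

```typescript
// Local stand-in for the settings table above. The real type comes from
// @walkeros/server-destination-gcp and may differ in detail.
interface BigQuerySettingsSketch {
  client?: unknown; // pre-built BigQuery client instance
  projectId: string;
  datasetId?: string;
  tableId?: string;
  location?: string;
  bigquery?: Record<string, unknown>; // additional BigQuery client options
}

// Hypothetical example values.
const settings: BigQuerySettingsSketch = {
  projectId: "my-gcp-project",
  datasetId: "walkerOS",
  tableId: "events",
  location: "EU",
};

// Package-specific fields sit under config.settings in the destination wrapper.
const destinationConfig = { config: { settings } };
```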
Mapping
This package does not define custom rule-level settings. For the standard rule fields (consent, condition, data, batch, name, policy) see mapping.
Examples
Page view
A page view is appended as one row through the BigQuery Storage Write API JSONWriter. Nested objects/arrays in data, source, etc. are JSON-stringified by eventToRow.
Purchase
An order event is appended as a single row through JSONWriter.appendRows. The entire nested data object (including arrays like items) is JSON-stringified into the data column via eventToRow().
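To make the stringification step concrete, the following sketch mimics the behavior described above. It is a hypothetical re-implementation for illustration only, not the package's eventToRow, and it skips details such as timestamp conversion. The event name and payload are made up.

```typescript
type Row = Record<string, string | number | boolean | null>;

// Hypothetical stand-in for eventToRow: nested objects and arrays become JSON
// strings, primitives pass through, undefined fields are left out (NULL).
function eventToRowSketch(input: Record<string, unknown>): Row {
  const out: Row = {};
  for (const [key, value] of Object.entries(input)) {
    if (value === undefined) continue;
    if (value === null) {
      out[key] = null;
    } else if (typeof value === "object") {
      out[key] = JSON.stringify(value);
    } else if (
      typeof value === "string" ||
      typeof value === "number" ||
      typeof value === "boolean"
    ) {
      out[key] = value;
    }
  }
  return out;
}

const row = eventToRowSketch({
  name: "order complete",
  data: { id: "0rd3r1d", total: 555, items: [{ sku: "abc" }] },
  timing: 42,
});
// row.data is now a single JSON string that still contains the items array.
```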
Prerequisites
- Google Cloud account with billing enabled
- gcloud CLI installed and authenticated (includes the bq command)
Setup lifecycle
Provision the dataset and table once per environment with the CLI:
Output: a narrated setup: ok destination.bigquery line. Add --json to also
emit a structured envelope reporting { datasetCreated, tableCreated } for jq
piping. The command is idempotent and safe to re-run.
config.setup controls provisioning:
- omitted or false: narrated skip, no provisioning. The operator runs setup explicitly to provision.
- true: provision with the defaults below.
- object matching the Setup interface: provision with the declared overrides.
See the Setup interface in the package for the full shape.
Defaults
| Field | Value |
|---|---|
| datasetId | walkerOS (note capital O, S) |
| tableId | events |
| location | EU |
| storageBillingModel | PHYSICAL (cheaper for compressible JSON) |
| Partitioning | Day partitioning on timestamp |
| Clustering | (name, entity, action) |
Physical storage billing charges based on compressed size. Day partitioning and
the (name, entity, action) clustering reduce scan costs for typical analytics
queries. Always include a timestamp filter.
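The three forms of config.setup and the defaults above can be pictured with a small resolver. This is a sketch under assumptions: the SetupOptions shape covers only the four named defaults, resolveSetup is a hypothetical helper, and merging overrides on top of the defaults is inferred from the description rather than taken from the package source.

```typescript
// Only the four defaults from the table; the real Setup interface has more fields.
interface SetupOptions {
  datasetId: string;
  tableId: string;
  location: string;
  storageBillingModel: "PHYSICAL" | "LOGICAL";
}

const SETUP_DEFAULTS: SetupOptions = {
  datasetId: "walkerOS",
  tableId: "events",
  location: "EU",
  storageBillingModel: "PHYSICAL",
};

// Hypothetical helper: returns null when provisioning should be skipped.
function resolveSetup(
  setup?: boolean | Partial<SetupOptions>,
): SetupOptions | null {
  if (setup === undefined || setup === false) return null; // narrated skip
  if (setup === true) return { ...SETUP_DEFAULTS }; // defaults only
  return { ...SETUP_DEFAULTS, ...setup }; // declared overrides win
}

const skipped = resolveSetup();
const defaults = resolveSetup(true);
const overridden = resolveSetup({ location: "US" });
```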
Drift handling
If the existing table's partitioning, clustering, or schema differs from the
declared configuration, setup logs WARN setup.drift {...} and continues. There
is no auto-mutation. Migrations are an operator decision.
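A drift check of this kind boils down to comparing the declared table shape with what already exists and reporting the differences without changing anything. The sketch below is a simplified illustration: TableShape and detectDrift are hypothetical names, the schema comparison looks only at column names, and the warning text is not the package's actual log format.

```typescript
interface TableShape {
  partitionField: string;
  clustering: string[];
  columns: string[]; // column names only, a simplification
}

// Hypothetical helper: lists which aspects differ; it never mutates the table.
function detectDrift(declared: TableShape, existing: TableShape): string[] {
  const drift: string[] = [];
  if (declared.partitionField !== existing.partitionField) {
    drift.push("partitioning");
  }
  if (declared.clustering.join(",") !== existing.clustering.join(",")) {
    drift.push("clustering");
  }
  if (declared.columns.join(",") !== existing.columns.join(",")) {
    drift.push("schema");
  }
  return drift;
}

const declared: TableShape = {
  partitionField: "timestamp",
  clustering: ["name", "entity", "action"],
  columns: ["name", "data", "timestamp"],
};
const existing: TableShape = { ...declared, clustering: ["name"] };

const drift = detectDrift(declared, existing);
if (drift.length > 0) {
  // Warn and continue; any migration is left to the operator.
  console.warn(`drift detected in: ${drift.join(", ")}`);
}
```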
GCP setup
Enable BigQuery API
Create service accounts
The provisioning step (setup) and the runtime push path need different permissions. We recommend separating them.
Operator (setup) service account, used by walkeros setup:
- bigquery.datasets.create
- bigquery.tables.create
- bigquery.datasets.get (for drift detection)
- bigquery.tables.get
Runtime service account, used by the running flow:
- bigquery.tables.updateData (Storage Write API append)
Authentication
- Service Account Key
- Workload Identity
For environments where you need explicit credentials (Docker containers, external platforms):
Set the environment variable to use the key:
Keep key files secure. Never commit them to version control or include them in public Docker images.
For GCP-native platforms (Cloud Run, GKE, Compute Engine), attach the service account directly to your workload. No key file needed.
Cloud Run example:
See Workload Identity documentation for other platforms.
Instead of relying on GOOGLE_APPLICATION_CREDENTIALS, you can pass auth
options inline via settings.bigquery (for example keyFilename or
credentials). These apply to both the control plane (setup, metadata) and the
data plane (Storage Write API ingestion). A pre-built settings.client
authenticates the control plane only; supply settings.bigquery for the data
plane to use non-ADC credentials.
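For example, a settings object that passes a key file inline might look like the sketch below. The project ID and key path are hypothetical, and the surrounding fields are included only for context.

```typescript
// Inline auth via settings.bigquery instead of GOOGLE_APPLICATION_CREDENTIALS.
// Per the paragraph above, these options cover both setup and ingestion.
const inlineAuthSettings = {
  projectId: "my-gcp-project", // hypothetical
  datasetId: "walkerOS",
  tableId: "events",
  bigquery: {
    keyFilename: "/secrets/bigquery-runtime.json", // hypothetical path
  },
};
```

The same rule about key files applies here: keep the referenced file out of version control and public images.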
Environment variables
| Variable | Description | Default |
|---|---|---|
| GCP_PROJECT_ID | Your GCP project ID | Required |
| BQ_DATASET | BigQuery dataset name | walkerOS |
| BQ_TABLE | BigQuery table name | events |
| BQ_LOCATION | BigQuery dataset location | EU |
| GOOGLE_APPLICATION_CREDENTIALS | Path to service account key | Required (unless using Workload Identity) |
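If your flow configuration reads these variables itself, the mapping to settings could be sketched as follows. settingsFromEnv is a hypothetical helper, not part of the package. It applies the defaults from the table and treats an empty value as unset, which is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface EnvSettings {
  projectId: string;
  datasetId: string;
  tableId: string;
  location: string;
}

// Hypothetical helper: builds settings from environment variables, falling
// back to the documented defaults. Empty strings count as unset.
function settingsFromEnv(env: Env): EnvSettings {
  const projectId = env["GCP_PROJECT_ID"];
  if (!projectId) throw new Error("GCP_PROJECT_ID is required");
  return {
    projectId,
    datasetId: env["BQ_DATASET"] || "walkerOS",
    tableId: env["BQ_TABLE"] || "events",
    location: env["BQ_LOCATION"] || "EU",
  };
}

const fromEnv = settingsFromEnv({
  GCP_PROJECT_ID: "my-gcp-project", // hypothetical
  BQ_LOCATION: "US",
});
```

In a Node.js process you would typically pass process.env as the argument.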
Storage Write API (data plane)
The destination uses BigQuery's
Storage Write API for data
ingestion. This replaces the legacy tabledata.insertAll path.
- Cost: $25/TB after the 2 TiB/month free tier (vs ~$50/TB for the legacy path). Most low-volume deployments fit entirely in the free tier.
- Batching: pushBatch is implemented. Set the collector's batch: <ms> mapping setting to flush all events in a window as a single appendRows call.
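The batching idea, collecting events for a time window and then sending them in one append, can be sketched independently of the package. WindowBatcher below is a hypothetical illustration rather than the collector's implementation, and the append callback stands in for a single appendRows call.

```typescript
type BatchRow = Record<string, unknown>;

// Hypothetical time-window batcher: the first push starts a timer, and every
// row collected until the timer fires is handed over in one append call.
class WindowBatcher {
  private rows: BatchRow[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly windowMs: number;
  private readonly append: (rows: BatchRow[]) => void;

  constructor(windowMs: number, append: (rows: BatchRow[]) => void) {
    this.windowMs = windowMs;
    this.append = append;
  }

  push(row: BatchRow): void {
    this.rows.push(row);
    if (this.timer === undefined) {
      this.timer = setTimeout(() => this.flush(), this.windowMs);
    }
  }

  // Sends everything collected so far and resets the window.
  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.rows.length === 0) return;
    const batch = this.rows;
    this.rows = [];
    this.append(batch);
  }
}

const calls: BatchRow[][] = [];
const batcher = new WindowBatcher(1000, (rows) => {
  calls.push(rows);
});
batcher.push({ name: "page view" });
batcher.push({ name: "product view" });
batcher.push({ name: "order complete" });
batcher.flush(); // flushed manually here instead of waiting for the timer
```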
The upstream @google-cloud/bigquery-storage package marks itself as
EXPERIMENTAL (subject to change). The destination pins it at ^5.1.0.
Default table schema
The default 15-column schema follows the
walkerOS Event v4 canonical order. Object
fields use the native JSON BigQuery type. Only name is REQUIRED; all
other columns are NULLABLE for resilience against partial events.
| Column | Type | Mode |
|---|---|---|
| name | STRING | REQUIRED |
| data | JSON | NULLABLE |
| context | JSON | NULLABLE |
| globals | JSON | NULLABLE |
| custom | JSON | NULLABLE |
| user | JSON | NULLABLE |
| nested | JSON | NULLABLE |
| consent | JSON | NULLABLE |
| id | STRING | NULLABLE |
| trigger | STRING | NULLABLE |
| entity | STRING | NULLABLE |
| action | STRING | NULLABLE |
| timestamp | TIMESTAMP | NULLABLE |
| timing | INT64 | NULLABLE |
| source | JSON | NULLABLE |
There is no createdAt column. Use timestamp (event time) for partition
filters.
Partitioning by day on timestamp and clustering on (name, entity, action) reduce
scan costs for typical analytics queries. Always include a timestamp filter.
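As an illustration of a partition-friendly query, the sketch below builds a SQL string that counts events per name over a recent window. recentEventCountsQuery is a hypothetical helper, the default dataset and table names are assumed, and the project ID is interpolated without validation, so pass only trusted values.

```typescript
// Hypothetical helper: builds a query that filters on timestamp so BigQuery
// can prune day partitions instead of scanning the whole table.
function recentEventCountsQuery(projectId: string, days: number): string {
  if (!Number.isInteger(days) || days <= 0) {
    throw new Error("days must be a positive integer");
  }
  return [
    "SELECT name, COUNT(*) AS events",
    `FROM \`${projectId}.walkerOS.events\``,
    `WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL ${days} DAY)`,
    "GROUP BY name",
    "ORDER BY events DESC",
  ].join("\n");
}

const weeklyCountsSql = recentEventCountsQuery("my-gcp-project", 7);
```

Grouping by name also lines up with the first clustering column.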
Custom schema mapping
You can send a custom schema by using the data configuration to map specific
fields. This is useful when you only need a subset of the event data.
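Conceptually, such a mapping is a projection of the full event onto the columns you want to keep. The sketch below shows that projection as a plain function for the fields name, id, data, and timestamp. It does not use the walkerOS mapping syntax, and pickSimpleFields is a hypothetical helper with made-up sample values.

```typescript
const SIMPLE_FIELDS = ["name", "id", "data", "timestamp"] as const;
type SimpleField = (typeof SIMPLE_FIELDS)[number];

// Hypothetical helper: keeps only the selected fields of an event.
function pickSimpleFields(
  input: Record<string, unknown>,
): Partial<Record<SimpleField, unknown>> {
  const out: Partial<Record<SimpleField, unknown>> = {};
  for (const field of SIMPLE_FIELDS) {
    if (field in input) out[field] = input[field];
  }
  return out;
}

const simpleRow = pickSimpleFields({
  name: "page view",
  id: "1700000000000-abc-1",
  data: { title: "Home" },
  timestamp: 1700000000000,
  user: { session: "s3ss10n" }, // dropped by the projection
});
```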
Example: simple schema
This example sends only name, id, data, and timestamp:
- Integrated
- Bundled
With the corresponding simpler table:
Cleanup
To remove BigQuery resources:
- Delete the BigQuery dataset
- Remove service account IAM bindings from the dataset
- Delete the service account
- Remove any downloaded key files
Pub/Sub
The same @walkeros/server-destination-gcp package also exports destinationPubSub for publishing events to a Pub/Sub topic. See the Pub/Sub destination page for full settings, mapping, ordering, attributes, setup, and authentication reference.