With Amplitude's BigQuery integration, you can ingest BigQuery data directly into your Amplitude project.
Amplitude is working on a beta version of BigQuery Import specifically for GA4. To join this beta, contact Amplitude Support.
To get started with importing from BigQuery, you need to take care of a few prerequisites.
You need a table (or tables) in BigQuery. This is where you want to import data from.
Create a GCS bucket; Amplitude recommends dedicating one to this purpose. Because of BigQuery's limited export options, the ingestion process must offload data to a GCS bucket before ingesting it into Amplitude.
Create a Service Account with permissions granted for the bucket and tables you want to ingest, then get the service account key. The Service Account must have the following roles granted to it:
Depending on your company's network policy, you may need to add the following IP addresses to your allowlist to allow Amplitude's servers to access your BigQuery instance:
Amplitude's Data Warehouse Import sometimes processes events in parallel, so time-ordered syncing of user and group properties on events is not guaranteed in the same way as submitting events directly to the Identify and Group Identify APIs.
To add BigQuery as a data source in your Amplitude project, follow these steps.
If you have any issues or questions while following this flow, contact the Amplitude team.
For Amplitude's time-based import option, it's best practice to use a monotonically increasing timestamp value. This value should indicate when the record was loaded into the source table the SQL configuration queries from (often referred to as a "server upload time"). The warehouse import tool brings data into Amplitude by continually updating the maximum value of the column referenced in the Timestamp Column Name input within the Import Config UI with each subsequent import.
Upon first import, Amplitude imports all the data returned from the query configured in the Import Config and saves a reference to the maximum timestamp referenced in the Timestamp Column Name: `timestamp_1`. Upon subsequent imports, Amplitude imports all data from the previously saved timestamp (`timestamp_1`) to what's now the new maximum timestamp (`timestamp_2`). After that import, Amplitude saves `timestamp_2` as the new maximum timestamp.
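As an illustration, suppose the source table has a hypothetical `server_upload_time` column recording when each row was loaded. Amplitude performs this incremental windowing for you; the sketch below only shows the logic applied between two imports, with `@last_max_timestamp` and `@new_max_timestamp` standing in for the saved and new maximum values:

```sql
-- Illustration only: Amplitude applies this windowing automatically.
-- @last_max_timestamp is the maximum saved after the previous import;
-- @new_max_timestamp is the maximum found during the current import.
SELECT *
FROM your_dataset.your_table
WHERE server_upload_time > @last_max_timestamp
  AND server_upload_time <= @new_max_timestamp;
```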
Include the mandatory fields for the data type when you create the SQL query. These tables outline the mandatory and optional fields for each data type. Find a list of other supported fields for events in the HTTP V2 API documentation and for user properties in the Identify API documentation. Add any columns not in those lists to either `event_properties` or `user_properties`; otherwise Amplitude ignores them.
Column name (must be lowercase) | Mandatory | Column data type | Example |
---|---|---|---|
user_id | Yes, unless device_id is used | VARCHAR | datamonster@gmail.com |
device_id | Yes, unless user_id is used | VARCHAR | C8F9E604-F01A-4BD9 |
event_type | Yes | VARCHAR | watch_tutorial |
time | Yes | Milliseconds since epoch (Timestamp) | 1396381378123 |
event_properties | Yes | JSON | {"source":"notification", "server":"host-us"} |
user_properties | No | JSON | {"city":"chicago", "gender":"female"} |
update_time_column | No (Yes if using time-based import) | TIMESTAMP | 2013-04-05 01:02:03.000 |
Find other supported fields in the HTTP V2 API documentation.
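As a sketch, a query that maps hypothetical source columns (`customer_id`, `event_name`, `event_timestamp`, `server_upload_time`, and so on) onto these event fields might look like the following:

```sql
SELECT
  customer_id AS user_id,
  device_id AS device_id,
  event_name AS event_type,
  UNIX_MILLIS(event_timestamp) AS time,      -- milliseconds since epoch
  STRUCT(
    source AS source,
    server AS server
  ) AS event_properties,
  STRUCT(
    city AS city
  ) AS user_properties,
  server_upload_time AS update_time_column   -- used for time-based import
FROM your_dataset.events_raw;
```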
Column name (must be lowercase) | Mandatory | Column data type | Example |
---|---|---|---|
user_id | Yes | VARCHAR | datamonster@gmail.com |
user_properties | Yes | JSON | {"city":"chicago", "gender":"female"} |
update_time_column | No (Yes if using time-based import) | TIMESTAMP | 2013-04-05 01:02:03.000 |
Find other supported fields in the Identify API documentation.
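For example, a user-properties query built from hypothetical source columns might look like this:

```sql
SELECT
  customer_id AS user_id,
  STRUCT(
    city AS city,
    gender AS gender
  ) AS user_properties,
  server_upload_time AS update_time_column   -- used for time-based import
FROM your_dataset.users_raw;
```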
Column name (must be lowercase) | Mandatory | Column data type | Example |
---|---|---|---|
groups | Yes | JSON | {"company":"amplitude", "team":["marketing", "sales"]} |
group_properties | Yes | JSON | {"location":"seattle", "active":"true"} |
update_time_column | No (Yes if using time-based import) | TIMESTAMP | 2013-04-05 01:02:03.000 |
Each group property in `group_properties` is applied to every group in `groups`.
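For example, a group-properties query might compose both JSON fields as STRUCTs from hypothetical source columns:

```sql
SELECT
  STRUCT(
    company_name AS company,
    team_names AS team                         -- e.g. an ARRAY<STRING> column
  ) AS groups,
  STRUCT(
    office_location AS location,
    CAST(is_active AS STRING) AS active
  ) AS group_properties,
  server_upload_time AS update_time_column     -- used for time-based import
FROM your_dataset.groups_raw;
```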
To update the Service Account used for your BigQuery Source, select the existing BigQuery Source within the Sources section of Data and click the ⚙️ (gear) icon. Within the modal, upload the new Service Account Key you want Amplitude to use moving forward.
Before you update your Service Account Key, confirm that the new key has the proper data access so Amplitude can continue to import the relevant data successfully.
Many Amplitude features are powered by "properties" fields, which are composed of property keys and property values. The most common of these properties fields are `event_properties` and `user_properties`.
For these sets of keys and values to be ingested correctly into Amplitude, they must be exported from BigQuery as raw JSON, not as JSON strings. BigQuery's JSON support is limited, but the following sections describe how to make sure your data is exported from BigQuery and imported to Amplitude without errors.
The properties fields are sourced from columns with a STRUCT type. The struct type is the field type that represents a key-value structure and is exported from BigQuery in raw JSON format.
If your source table doesn't have the event or user properties organized in a STRUCT type column, you can create it in your SELECT SQL. For example, if your event properties are all flattened into their own columns, you can compose your `event_properties` into a struct like so:
```sql
SELECT STRUCT(
  event_property_column_1 AS event_property_name_1,
  event_property_column_2 AS event_property_name_2
) AS event_properties
FROM your_table;
```
You can't have spaces in struct field names, even if they are enclosed in backticks or single quotes.
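For instance (with hypothetical field names), alias the fields with underscores instead of spaces:

```sql
-- Valid: struct field names without spaces.
SELECT STRUCT(plan_tier AS plan_tier) AS event_properties FROM your_table;

-- Not supported: spaces in struct field names, even when quoted.
-- SELECT STRUCT(plan_tier AS `plan tier`) AS event_properties FROM your_table;
```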
If your event or user properties are stored as a JSON string field, you must still reconstruct the properties field in the SELECT SQL as a STRUCT. BigQuery exports string fields as strings even if their contents are JSON, and Amplitude's event validation rejects them.
You can, however, extract values from your JSON string field to use in your properties STRUCT. Use the JSON_EXTRACT_SCALAR function to access the values in the string. For example, if the EVENT_PROPERTIES column in your table contains a JSON string like `"{\"record count\":\"50\",\"region\":\"eu-central-1\"}"` (shown in the BigQuery UI as `{"record count":"50","region":"eu-central-1"}`), you can extract the values from the JSON string like this:
```sql
SELECT STRUCT(
  JSON_EXTRACT_SCALAR(EVENT_PROPERTIES, "$.record count") AS record_count,
  JSON_EXTRACT_SCALAR(EVENT_PROPERTIES, "$.region") AS region
) AS event_properties
FROM your_table;
```
Be aware that, unlike other data warehouse products, BigQuery treats "double-quoted strings" as string literals. This means you can't use them to quote identifiers like column or table names, or the SQL fails to execute in BigQuery.
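For example (with hypothetical table and column names), quote identifiers with backticks and reserve double quotes for string literals:

```sql
-- Backticks quote identifiers; double quotes always create string literals in BigQuery.
SELECT
  `order` AS order_id,          -- a column named after a reserved word needs backticks
  "manual-backfill" AS note     -- a string literal, not a column reference
FROM `my_project.my_dataset.orders_raw`;

-- This fails: a double-quoted table name is a string literal, not an identifier.
-- SELECT * FROM "my_project.my_dataset.orders_raw";
```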