Step-by-Step Guide to BigQuery Data Pipeline Automation

21 Feb 2025

This guide provides a detailed walkthrough of creating and automating a data pipeline with BigQuery. We’ll use a simplified example of loading website traffic data (simulated) from Cloud Storage, transforming it, and loading it into BigQuery.

1. Setting up your Google Cloud Project

  • If you don’t have one already, create a Google Cloud Project. This will be the container for your BigQuery datasets, Cloud Storage buckets, and other resources.
  • Enable the necessary APIs: BigQuery API, Cloud Storage API, and (if you plan to use Dataflow or Composer later) the Dataflow API and Composer API.
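
If you want a quick check from code that the project and APIs are usable, the sketch below simply creates a client for each service and lists what it can see. It assumes the google-cloud-bigquery and google-cloud-storage Python libraries are installed and that you have authenticated (for example with gcloud auth application-default login); your-project-id is a placeholder.

Python

from google.cloud import bigquery, storage

project_id = "your-project-id"  # Replace with your project ID

# Creating a client and making a simple list call fails with a clear error if the API is not enabled
bq_client = bigquery.Client(project=project_id)
gcs_client = storage.Client(project=project_id)

print("Datasets:", [d.dataset_id for d in bq_client.list_datasets()])
print("Buckets:", [b.name for b in gcs_client.list_buckets()])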

2. Creating a Cloud Storage Bucket

  • Navigate to Cloud Storage in the Google Cloud Console.
  • Create a new bucket. Choose a name (e.g., website-traffic-data) and a region (choose a region close to your BigQuery dataset for optimal performance). Consider storage class based on how frequently you’ll access the data (e.g., Standard, Nearline, Coldline, Archive).
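
You can also create the bucket from code. Here is a minimal sketch using the google-cloud-storage client library; the bucket name and storage class are the example values from above, and the location and project ID are placeholders you should replace.

Python

from google.cloud import storage

client = storage.Client(project="your-project-id")  # Replace with your project ID

bucket = client.bucket("website-traffic-data")
bucket.storage_class = "STANDARD"  # Or NEARLINE / COLDLINE / ARCHIVE for less frequent access
new_bucket = client.create_bucket(bucket, location="US")  # Pick the same region as your BigQuery dataset

print(f"Created bucket {new_bucket.name} in {new_bucket.location}")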

3. Preparing your Data (Simulated Website Traffic Data)

  • For this example, we’ll simulate website traffic data. Create a CSV file (e.g., traffic.csv) with the following structure (you can use a text editor or spreadsheet software):

Code snippet

timestamp,page_url,user_id,country
2023-10-26 10:00:00,/,user123,US
2023-10-26 10:05:00,/products,user456,CA
2023-10-26 10:10:00,/,user123,US
2023-10-26 10:15:00,/about,user789,UK
2023-10-26 10:20:00,/products,user456,CA

Upload the traffic.csv file to your Cloud Storage bucket (website-traffic-data).
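
The upload can be scripted as well. A short sketch with the google-cloud-storage client, assuming traffic.csv is in your current working directory and your-project-id is a placeholder:

Python

from google.cloud import storage

client = storage.Client(project="your-project-id")  # Replace with your project ID
bucket = client.bucket("website-traffic-data")

# Upload the local CSV file into the bucket
blob = bucket.blob("traffic.csv")
blob.upload_from_filename("traffic.csv")

print(f"Uploaded gs://{bucket.name}/{blob.name}")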

4. Creating a BigQuery Dataset

  • Navigate to BigQuery in the Google Cloud Console.
  • Create a new dataset. Give it a name (e.g., website_analytics) and choose a location (it’s best practice to choose the same region as your Cloud Storage bucket).
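
The same step in code, as a sketch with the google-cloud-bigquery client library (your-project-id is a placeholder, and the location should match your bucket):

Python

from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # Replace with your project ID

dataset = bigquery.Dataset("your-project-id.website_analytics")
dataset.location = "US"  # Use the same region as your Cloud Storage bucket

client.create_dataset(dataset, exists_ok=True)  # exists_ok avoids an error if it already exists
print(f"Created dataset {dataset.dataset_id}")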

5. Creating a BigQuery Table and Loading Data (using the UI)

  • In your BigQuery dataset (website_analytics), create a new table.
  • For “Source,” select “Cloud Storage.”
  • Enter the URI of your CSV file in Cloud Storage: gs://website-traffic-data/traffic.csv
  • For “File format,” select “CSV.”
  • In the “Schema” section, define the schema of your table manually or let BigQuery auto-detect it. It should look something like this:

timestamp:TIMESTAMP
page_url:STRING
user_id:STRING
country:STRING

  • Click “Create table.” BigQuery will load the data from your CSV file into the new table.
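
The equivalent load can be run from code, which becomes useful once you start automating. A sketch with the google-cloud-bigquery client; your-project-id and your_table_name are placeholders:

Python

from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # Replace with your project ID

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # Skip the header row
    schema=[
        bigquery.SchemaField("timestamp", "TIMESTAMP"),
        bigquery.SchemaField("page_url", "STRING"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("country", "STRING"),
    ],
)

load_job = client.load_table_from_uri(
    "gs://website-traffic-data/traffic.csv",
    "website_analytics.your_table_name",  # Replace with your table name
    job_config=job_config,
)
load_job.result()  # Wait for the load job to finish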

6. Querying the Data

  • After the data is loaded, you can query it using BigQuery’s SQL editor. For example:

SQL

SELECT COUNT(*) FROM `website_analytics.your_table_name`;  -- Replace with your table name

SELECT country, COUNT(*) AS visit_count FROM `website_analytics.your_table_name` GROUP BY country;
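
The same queries can be run from code. A sketch with the BigQuery client; rows come back as objects whose fields match your column names:

Python

from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # Replace with your project ID

sql = """
    SELECT country, COUNT(*) AS visit_count
    FROM `website_analytics.your_table_name`  -- Replace with your table name
    GROUP BY country
"""

for row in client.query(sql).result():
    print(row.country, row.visit_count)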

7. Automation with Scheduled Queries (Basic Automation)

  • If you need to perform transformations or aggregations regularly, you can use scheduled queries.
  • Write a SQL query that performs the transformations you need (e.g., aggregating visits by page).
  • In the BigQuery UI, schedule this query to run daily, hourly, or at other intervals. The results of the query can be written to a new table.
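
The schedule itself is configured in the BigQuery UI (or through the BigQuery Data Transfer Service), but the query behind it is ordinary SQL written to a destination table. As a sketch of the kind of transformation you might schedule, here it is run once through the Python client; the destination table daily_page_visits is a hypothetical example.

Python

from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # Replace with your project ID

job_config = bigquery.QueryJobConfig(
    destination="your-project-id.website_analytics.daily_page_visits",  # Hypothetical target table
    write_disposition="WRITE_TRUNCATE",  # Overwrite the table on each run
)

sql = """
    SELECT page_url, COUNT(*) AS visit_count
    FROM `website_analytics.your_table_name`  -- Replace with your table name
    GROUP BY page_url
"""

client.query(sql, job_config=job_config).result()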

8. Automating with Cloud Composer (Advanced Automation)

  • For more complex pipelines, use Cloud Composer. This involves:
    • Creating a Cloud Composer environment.
    • Writing a Directed Acyclic Graph (DAG) using Python and the Airflow library. The DAG will define the steps of your pipeline (e.g., extracting data, transforming it with Dataflow or SQL, loading it into BigQuery).
    • Deploying the DAG to your Cloud Composer environment. Composer will then schedule and execute your pipeline according to the defined schedule.
  • Example DAG (Conceptual):

Python

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from datetime import datetime

with DAG(
    dag_id='website_traffic_pipeline',
    start_date=datetime(2023, 10, 27),
    schedule_interval='@daily',  # Run daily
    catchup=False
) as dag:

    # Example: Transforming data using a Bash script (replace with Dataflow if needed)
    transform_data = BashOperator(
        task_id='transform_data',
        bash_command="bq query --use_legacy_sql=false 'SELECT * FROM `website_analytics.your_table_name`'"  # Replace with your transformation logic
    )

    # GCSToBigQueryOperator loads files from Cloud Storage into a BigQuery table
    load_to_bigquery = GCSToBigQueryOperator(
        task_id='load_to_bigquery',
        bucket='website-traffic-data',
        source_objects=['transformed_data.csv'],  # Replace with your transformed data location
        destination_project_dataset_table='website_analytics.transformed_traffic_data',  # Replace with your target table
        schema_fields=[
            {'name': 'timestamp', 'type': 'TIMESTAMP'},
            {'name': 'page_url', 'type': 'STRING'},
            {'name': 'user_id', 'type': 'STRING'},
            {'name': 'country', 'type': 'STRING'},
        ],
        write_disposition='WRITE_TRUNCATE'  # Or 'WRITE_APPEND'
    )

    transform_data >> load_to_bigquery  # Define task dependencies

9. Monitoring and Logging

  • Use Cloud Logging and Cloud Monitoring to monitor the performance of your pipeline and troubleshoot any issues.
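
Beyond the Cloud Console dashboards, a quick way to spot failed loads or queries is to list recent BigQuery jobs from the client library and check their error results. A minimal sketch:

Python

from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # Replace with your project ID

# Print the most recent jobs along with any error information
for job in client.list_jobs(max_results=10):
    print(job.job_id, job.state, job.error_result)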

This step-by-step guide provides a foundation for building and automating data pipelines with BigQuery. Remember to adapt the example to your specific data sources, transformations, and analysis requirements. For more complex scenarios, Dataflow and Cloud Composer are essential tools for building robust and scalable pipelines.