GCP BigQuery

Arun Battepati
2 min readMay 4, 2023

--

BigQuery’s serverless architecture decouples storage and compute and allows them to scale independently on demand. This structure offers both immense flexibility and cost controls for customers because they don’t need to keep their expensive compute resources up and running all the time. This is very different from traditional node-based cloud data warehouse solutions or on-premise massively parallel processing (MPP) systems. This approach also allows customers of any size to bring their data into the data warehouse and start analyzing their data using Standard SQL without worrying about database operations and system engineering

What is Big Query:

Big Query is Google Cloud’s fully managed and serverless database

Scalability: BigQuery is a highly scalable data warehouse that can handle petabyte-scale data. It is designed to accommodate growing data volumes without the need for manual intervention or performance tuning.

Speed: BigQuery offers lightning-fast query performance, as it’s a distributed architecture and columnar storage format.

Cost-effective: BigQuery is a fully managed, serverless platform that eliminates the need for infrastructure management. It allows businesses to pay only for what they use, which makes it an affordable option for organizations of all sizes.

Integration: BigQuery integrates seamlessly with other Google Cloud Platform (GCP) services, such as Google Data Studio and Google Cloud Storage, making it easy to build end-to-end analytics pipelines.

Big Query Features:

Serverless and easy to setup, No infra

Realtime data Analytics — Live stream from different sources like SaaS, IoT, Data flow into BigQuery

High Availability — 99.99% commit by google

Standard SQL

Flexible pricing models

Public datasets

Easy operations and maintenance

Big Query Pricing

The cost of storing 1TB of data in Cloud Storage is $0.026 per gigabyte per month, which means that 1TB of data storage will cost $26 per month

Demo Queries:

============================

SELECT

state,

SUM(confirmed_cases) AS total_cases,

SUM(deaths) AS total_deaths

FROM

`bigquery-public-data.covid19_usafacts.summary`

GROUP BY

state

ORDER BY

total_cases DESC

LIMIT

10

============================

SELECT

tag_name,

SUM(view_count) AS total_views

FROM

`bigquery-public-data.stackoverflow.posts_questions`

JOIN UNNEST(SPLIT(tags, ‘|’)) AS tag_name

WHERE

tag_name IN (‘java’, ‘python’, ‘javascript’, ‘c#’, ‘php’, ‘html’, ‘css’, ‘mysql’, ‘android’, ‘jquery’)

GROUP BY

tag_name

ORDER BY

total_views DESC

LIMIT

10

--

--