We’ve started using Google BigQuery extensively at Cruise as a data warehouse. The syntax for querying arrays in BigQuery isn’t obvious and this post will explain how it works.
Consider a table that stores a company and its executives with the following schema:
companies
- name
- founding_year
- executives
* name
* title
with some sample data
WITH companies AS (
SELECT
"Apple" as name, 1976 as founding_year,
[
(SELECT AS STRUCT 'Tim Cook' name, 'CEO' title),
(SELECT AS STRUCT 'Jony Ive' name, 'Chief Design Officer' title),
(SELECT AS STRUCT 'Jeff Williams' name, 'COO' title)
] as executives UNION ALL SELECT
"Amazon" as name, 1994 as founding_year,
[
(SELECT AS STRUCT 'Jeff Bezos' name, 'CEO' title),
(SELECT AS STRUCT 'Brian T. Olsavsky' name, 'CFO' title)
] as executives UNION ALL SELECT
"Twitter" as name, 2006 as founding_year,
[
(SELECT AS STRUCT 'Jack Dorsey' name, 'CEO' title),
(SELECT AS STRUCT 'Ned Segal' name, 'CFO' title)
] as executives UNION ALL SELECT
"AirBNB" as name, 2008 as founding_year,
[
(SELECT AS STRUCT 'Brian Chesky' name, 'CEO' title),
(SELECT AS STRUCT 'Joe Gebbia' name, 'CPO' title)
] as executives UNION ALL SELECT
"Square" as name, 2009 as founding_year,
[
(SELECT AS STRUCT 'Jack Dorsey' name, 'CEO' title)
] as executives
) SELECT * FROM companies;
Row | name | founding_year | executives.name | executives.title |
---|---|---|---|---|
1 | Apple | 1976 | Tim Cook | CEO |
Jony Ive | Chief Design Officer | |||
Jeff Williams | COO | |||
2 | Amazon | 1994 | Jeff Bezos | CEO |
Brian T. Olsavsky | CFO | |||
3 | 2006 | Jack Dorsey | CEO | |
Ned Segal | CFO | |||
4 | AirBNB | 2008 | Brian Chesky | CEO |
Joe Gebbia | CPO | |||
5 | Square | 2009 | Jack Dorsey | CEO |
We can use familiar SQL queries to query top level fields like name
or founding_year
SELECT * FROM companies WHERE founding_year > 2000;
Row | name | founding_year | executives.name | executives.title |
---|---|---|---|---|
1 | 2006 | Jack Dorsey | CEO | |
Ned Segal | CFO | |||
2 | AirBNB | 2008 | Brian Chesky | CEO |
Joe Gebbia | CPO | |||
3 | Square | 2009 | Jack Dorsey | CEO |
To query nested fields like executives.name
or executives.title
, we use a combination of CROSS JOIN
and UNNEST
.
For example, this query returns all companies with Jack Dorsey as an executive:
SELECT * FROM companies CROSS JOIN UNNEST(executives) as executive WHERE executive.name = 'Jack Dorsey';
Row | name | founding_year | executives.name | executives.title |
---|---|---|---|---|
1 | 2006 | Jack Dorsey | CEO | |
Ned Segal | CFO | |||
2 | Square | 2009 | Jack Dorsey | CEO |
To understand how this query works, notice that CROSS JOIN
and UNNEST
return the row with the flat value of each item in the array.
For example, consider this query
SELECT c.*, executive FROM companies c CROSS JOIN UNNEST(c.executives) as executive
This returns a row for each combination of company and executive.
Row | name | founding_year | executives.name | executives.title | executive.name | executive.title |
---|---|---|---|---|---|---|
1 | Apple | 1976 | Tim Cook | CEO | Tim Cook | CEO |
Jony Ive | Chief Design Officer | |||||
Jeff Williams | COO | |||||
2 | Apple | 1976 | Tim Cook | CEO | Jony Ive | Chief Design Officer |
Jony Ive | Chief Design Officer | |||||
Jeff Williams | COO | |||||
3 | Apple | 1976 | Tim Cook | CEO | Jeff Williams | COO |
Jony Ive | Chief Design Officer | |||||
Jeff Williams | COO | |||||
4 | Amazon | 1994 | Jeff Bezos | CEO | Jeff Bezos | CEO |
Brian T. Olsavsky | CFO | |||||
5 | Amazon | 1994 | Jeff Bezos | CEO | Brian T. Olsavsky | CFO |
Brian T. Olsavsky | CFO | |||||
6 | 2006 | Jack Dorsey | CEO | Jack Dorsey | CEO | |
Ned Segal | CFO | |||||
7 | 2006 | Jack Dorsey | CEO | Ned Segal | SFO | |
Ned Segal | CFO | |||||
8 | AirBNB | 2008 | Brian Chesky | CEO | Brian Chesky | CEO |
Joe Gebbia | CPO | |||||
9 | AirBNB | 2008 | Brian Chesky | CEO | Joe Gebbia | CPO |
Joe Gebbia | CPO | |||||
10 | Square | 2009 | Jack Dorsey | CEO | Jack Dorsey | CEO |
Once, we have this table, it’s easy to see how WHERE
clauses work like we expect on the executive.name
and executive.title
fields.
Given the results above - what would this query return?
SELECT c.name, executive.name as ceo FROM companies c CROSS JOIN UNNEST(c.executives) as executive WHERE executive.title='CEO'
If you guessed right, it returns the name and CEO of each company in our table.
Row | name | ceo |
---|---|---|
1 | Apple | Tim Cook |
2 | Amazon | Jeff Bezos |
3 | Jack Dorsey | |
4 | AirBNB | Brian Chesky |
5 | Square | Jack Dorsey |
Tip: You can use a comma in place of CROSS JOIN
in your queries for brevity.
SELECT c.* FROM companies c CROSS JOIN UNNEST(executives) as executive WHERE executive.name = 'Jack Dorsey';
is equivalent to
SELECT c.* FROM companies c, UNNEST(executives) as executive WHERE executive.name = 'Jack Dorsey';
I hope this helps you explore datasets in BigQuery more easily.
BigQuery has worked well for us as a data warehouse. From engineers to data analysts and managers, I’ve seen users of all technical abilities have use BigQuery at Cruise in their day-to-day work to build a self-driving robotaxi.