BigQuery: unnest and pivot a column
Question
I have a very big table in BigQuery formatted as follows:
| row | column.key | column.value.string_value | column.value.int_value | column.value.float_value | column.value.boolean_value | id |
|---|---|---|---|---|---|---|
| 1 | key1 | aa | null | null | null | id1 |
| | key2 | null | null | null | false | |
| | key3 | fa | null | null | null | |
| | key4 | null | null | null | true | |
| 2 | key1 | ab | null | null | null | id1 |
| | key2 | null | null | null | false | |
| | key3 | gf | null | null | null | |
| | key4 | null | null | null | false | |
| 3 | key1 | af | null | null | null | id2 |
| | key2 | null | null | null | true | |
| | key3 | fa | null | null | null | |
| | key4 | null | null | null | false | |
I need to re-format it as follows:
row | key1 | key2 | key3 | key4 | id |
---|---|---|---|---|---|
1 | aa | false | fa | true | id1 |
2 | ab | false | gf | false | id1 |
3 | af | true | fa | false | id2 |
I tried the approach from this question, but couldn't make it work on my table:
https://stackoverflow.com/questions/63989161/how-to-unnest-and-pivot-two-columns-in-bigquery
One important detail of my table is that, as in the table above, the ids repeat across rows but have different values for the same keys. Where the ids repeat, I still want to keep every occurrence of the id, as in the formatted example.
Also, I show only 4 keys here but actually have 50, so the less manual work, the better. If the only way is to write out each column by hand, I'll still do it.
Does anyone know how to do this? I basically need to turn every key into a new column holding its corresponding value, alongside the id, but I don't know how.
Thank you very much!
Answer 1
Score: 1
To reformat it, you first need to merge the column.value struct into a single value column. This is easily done with the COALESCE function, since in Google Analytics data usually only one field of the column.value struct is non-null.
    COALESCE(
      value.string_value,
      '' || value.int_value,  -- to convert INT64 to STRING
      '' || value.float_value,
      '' || value.boolean_value
    ) value,
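To see the merge step on its own, here is a minimal sketch that runs against inline data. The `params` CTE and its rows are made up for illustration, and the sketch uses explicit `CAST(... AS STRING)` in place of the `'' ||` shorthand, which should produce the same strings.

```sql
-- Hypothetical inline data: each row carries one struct with a single non-null field.
WITH params AS (
  SELECT 'key1' AS key,
         STRUCT('aa' AS string_value,
                CAST(NULL AS INT64) AS int_value,
                CAST(NULL AS FLOAT64) AS float_value,
                CAST(NULL AS BOOL) AS boolean_value) AS value
  UNION ALL
  SELECT 'key2' AS key,
         STRUCT(CAST(NULL AS STRING) AS string_value,
                7 AS int_value,
                CAST(NULL AS FLOAT64) AS float_value,
                CAST(NULL AS BOOL) AS boolean_value) AS value
  UNION ALL
  SELECT 'key3' AS key,
         STRUCT(CAST(NULL AS STRING) AS string_value,
                CAST(NULL AS INT64) AS int_value,
                CAST(NULL AS FLOAT64) AS float_value,
                FALSE AS boolean_value) AS value
)
SELECT
  key,
  -- COALESCE returns the first non-null argument, so every branch must be a STRING.
  COALESCE(
    value.string_value,
    CAST(value.int_value AS STRING),
    CAST(value.float_value AS STRING),
    CAST(value.boolean_value AS STRING)
  ) AS merged_value
FROM params
ORDER BY key;
```

Every merged value comes back as a STRING, so booleans and numbers lose their native type in the pivoted result.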
After that, you can reshape it with a PIVOT query.
    SELECT * FROM (
      SELECT t.* EXCEPT(column),
             key,
             COALESCE(
               value.string_value,
               '' || value.int_value,
               '' || value.float_value,
               '' || value.boolean_value
             ) value,
      FROM sample_table t, UNNEST(column)
    ) PIVOT (ANY_VALUE(value) FOR key IN ('key1', 'key2', 'key3', 'key4'));
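PIVOT groups by every column that is not pivoted or aggregated (here `row` and `id`), which is why repeated ids stay on separate rows. If you are writing the columns by hand anyway, one alternative is a scalar subquery per key, which keeps each value in its native type. The sketch below is not part of the original answer; it assumes each key appears at most once per array (a scalar subquery fails if it returns more than one row) and uses a trimmed two-row, two-key `sample_table` for illustration.

```sql
-- Trimmed, hypothetical sample data with the same shape as the question's table.
WITH sample_table AS (
  SELECT 1 AS row, 'id1' AS id,
    [
      STRUCT('key1' AS key,
             STRUCT('aa' AS string_value, CAST(NULL AS INT64) AS int_value,
                    CAST(NULL AS FLOAT64) AS float_value, CAST(NULL AS BOOL) AS boolean_value) AS value),
      STRUCT('key2' AS key,
             STRUCT(CAST(NULL AS STRING) AS string_value, CAST(NULL AS INT64) AS int_value,
                    CAST(NULL AS FLOAT64) AS float_value, FALSE AS boolean_value) AS value)
    ] AS column
  UNION ALL
  SELECT 2 AS row, 'id1' AS id,
    [
      STRUCT('key1' AS key,
             STRUCT('ab' AS string_value, CAST(NULL AS INT64) AS int_value,
                    CAST(NULL AS FLOAT64) AS float_value, CAST(NULL AS BOOL) AS boolean_value) AS value),
      STRUCT('key2' AS key,
             STRUCT(CAST(NULL AS STRING) AS string_value, CAST(NULL AS INT64) AS int_value,
                    CAST(NULL AS FLOAT64) AS float_value, TRUE AS boolean_value) AS value)
    ] AS column
)
SELECT
  t.row,
  t.id,
  -- One correlated subquery per key; pick the typed field that key actually uses.
  (SELECT value.string_value  FROM UNNEST(t.column) WHERE key = 'key1') AS key1,
  (SELECT value.boolean_value FROM UNNEST(t.column) WHERE key = 'key2') AS key2
FROM sample_table AS t
ORDER BY t.row;
```

With 50 keys this is more typing than the PIVOT version, but `key2` stays a BOOL instead of becoming the string `'false'`.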
sample_table for the PIVOT query, defined as a WITH clause placed in front of its SELECT:
    WITH sample_table AS (
      SELECT 1 row, 'id1' id,
        [STRUCT('key1' AS key, STRUCT('aa' AS string_value, INT64(null) AS int_value, FLOAT64(null) AS float_value, BOOL(null) AS boolean_value) AS value),
         STRUCT('key2' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value),
         STRUCT('key3' AS key, STRUCT('fa' AS string_value, null AS int_value, null AS float_value, null AS boolean_value) AS value),
         STRUCT('key4' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, true AS boolean_value) AS value)] column
      UNION ALL
      SELECT 2 row, 'id1' id,
        [STRUCT('key1' AS key, STRUCT('ab' AS string_value, INT64(null) AS int_value, FLOAT64(null) AS float_value, BOOL(null) AS boolean_value) AS value),
         STRUCT('key2' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value),
         STRUCT('key3' AS key, STRUCT('gf' AS string_value, null AS int_value, null AS float_value, null AS boolean_value) AS value),
         STRUCT('key4' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value)] column
      UNION ALL
      SELECT 3 row, 'id2' id,
        [STRUCT('key1' AS key, STRUCT('af' AS string_value, INT64(null) AS int_value, FLOAT64(null) AS float_value, BOOL(null) AS boolean_value) AS value),
         STRUCT('key2' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, true AS boolean_value) AS value),
         STRUCT('key3' AS key, STRUCT('fa' AS string_value, null AS int_value, null AS float_value, null AS boolean_value) AS value),
         STRUCT('key4' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value)] column
    )
If your table has many keys, you can generalize this further with a dynamic query, as described in the link in your question.
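A dynamic version could look roughly like the sketch below, which builds the key list at run time and passes it to PIVOT through `EXECUTE IMMEDIATE`. It is an illustration, not the original answer's code. The table name `my_dataset.sample_table` is a placeholder for your real table, because a CTE cannot be shared between statements of a script. The sketch also assumes the keys contain no single quotes and are usable as column names.

```sql
-- Assumption: `my_dataset.sample_table` is a hypothetical name; replace it with your table.
DECLARE key_list STRING;

-- Build a quoted, comma-separated list such as 'key1', 'key2', ... from the data itself.
SET key_list = (
  SELECT STRING_AGG(FORMAT("'%s'", key), ', ' ORDER BY key)
  FROM (
    SELECT DISTINCT key
    FROM `my_dataset.sample_table`, UNNEST(column)
  )
);

-- Splice the list into the PIVOT clause and run the generated statement.
EXECUTE IMMEDIATE FORMAT("""
  SELECT * FROM (
    SELECT t.* EXCEPT(column),
           key,
           COALESCE(
             value.string_value,
             CAST(value.int_value AS STRING),
             CAST(value.float_value AS STRING),
             CAST(value.boolean_value AS STRING)
           ) AS value
    FROM `my_dataset.sample_table` AS t, UNNEST(column)
  ) PIVOT (ANY_VALUE(value) FOR key IN (%s))
""", key_list);
```

The first statement scans the table once to collect the distinct keys, so on a very large table it may be worth restricting that scan (for example to a recent partition) if the key set is stable.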