Why does GROUP BY give me too high a row count in BigQuery
# Question
I believe I am missing something (probably quite simple) in the use of GROUP BY in BigQuery, and I am hoping someone can set me straight.
Comparing these two queries, I get different numbers of users:

```sql
SELECT SUM(users) FROM (
  SELECT
    DATE,
    COUNT(DISTINCT user_id) AS users
  FROM
    `mytable`
  WHERE
    DATE BETWEEN ('2022-05-01') AND ('2022-05-31')
  GROUP BY
    DATE
)
```
Value for `users`: approx. 140,000
```sql
SELECT
  COUNT(DISTINCT user_id) AS users
FROM
  `mytable`
WHERE
  DATE BETWEEN ('2022-05-01') AND ('2022-05-31')
```
Value for `users`: approx. 120,000
# Answer 1
**Score**: 2
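In the second query, you're counting the distinct `user_id` values across the entire date range. In the first query, you're counting the distinct `user_id` values *for each day* in the range and then summing those daily counts. A user who is active on several days is therefore counted once for each of those days in the first query, which is why its total is higher.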
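To make the double counting concrete, here is a minimal sketch that runs both queries from the question against a tiny table. It is an illustration under stated assumptions: Python's built-in `sqlite3` module stands in for BigQuery so the example is self-contained (BigQuery backticks are dropped), and the sample rows are invented, not taken from the question.

```python
import sqlite3

# Hypothetical sample data: user "a" is active on three days,
# users "b" and "c" on one day each.
rows = [
    ("2022-05-01", "a"),
    ("2022-05-01", "b"),
    ("2022-05-02", "a"),
    ("2022-05-02", "c"),
    ("2022-05-03", "a"),
]

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (DATE TEXT, user_id TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

# First query from the question: distinct users per day, then summed.
daily_sum = conn.execute("""
    SELECT SUM(users) FROM (
        SELECT DATE, COUNT(DISTINCT user_id) AS users
        FROM mytable
        WHERE DATE BETWEEN '2022-05-01' AND '2022-05-31'
        GROUP BY DATE
    )
""").fetchone()[0]

# Second query from the question: distinct users over the whole range.
overall = conn.execute("""
    SELECT COUNT(DISTINCT user_id) AS users
    FROM mytable
    WHERE DATE BETWEEN '2022-05-01' AND '2022-05-31'
""").fetchone()[0]

# Same total as the first query, regrouped per user: each user
# contributes one count for every distinct day they were active.
active_days = conn.execute("""
    SELECT SUM(days) FROM (
        SELECT user_id, COUNT(DISTINCT DATE) AS days
        FROM mytable
        WHERE DATE BETWEEN '2022-05-01' AND '2022-05-31'
        GROUP BY user_id
    )
""").fetchone()[0]

print(daily_sum, overall, active_days)  # 5 3 5
```

The third query shows where the extra rows come from: summing per-day distinct counts equals summing, over users, the number of distinct days each user was active. Here user "a" contributes 3 instead of 1, so the first query reports 5 while the second reports 3. The per-day sum can only equal the overall distinct count when no user appears on more than one day.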