
Why does one need to use the GROUP BY function when using COUNT?

Question

I am new to SQL. I have the following query, attached below, which works.

I tried it multiple times without the GROUP BY function and it didn't work. I was curious as to what this function does as I can't understand any descriptions of it online, specifically this article: https://www.w3schools.com/sql/sql_groupby.asp. I know it causes the rows of the items table to be collected into groups, but why do I need to do that for this query to work?

SELECT 
    usertype,
    CONCAT(start_station_name," to ", end_station_name) AS route,
    COUNT(*) as num_trips,
    ROUND(AVG(cast(tripduration as int64)/60), 2) AS duration
FROM 
    `bigquery-public-data.new_york_citibike.citibike_trips` 
GROUP BY
    start_station_name, end_station_name, usertype
LIMIT 10
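To see what the GROUP BY in this query does without access to BigQuery, here is a rough sketch using Python's built-in sqlite3 module. The `trips` table and its four rows are invented for illustration, and SQLite stands in for BigQuery, so the query uses the standard `||` operator instead of CONCAT and divides by 60.0 to avoid SQLite's integer division.

```python
import sqlite3

# Hypothetical miniature of citibike_trips: only the columns the query uses,
# with a few invented rows (tripduration is in seconds).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (usertype TEXT, start_station_name TEXT,"
    " end_station_name TEXT, tripduration INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        ("Subscriber", "A", "B", 600),
        ("Subscriber", "A", "B", 900),
        ("Customer", "A", "B", 1200),
        ("Subscriber", "B", "C", 300),
    ],
)

# Each distinct (start, end, usertype) combination becomes one output row;
# COUNT and AVG are computed only over the rows inside that group.
rows = conn.execute(
    """
    SELECT usertype,
           start_station_name || ' to ' || end_station_name AS route,
           COUNT(*) AS num_trips,
           ROUND(AVG(tripduration / 60.0), 2) AS duration
    FROM trips
    GROUP BY start_station_name, end_station_name, usertype
    ORDER BY route, usertype
    """
).fetchall()

for row in rows:
    print(row)
```

The two Subscriber trips from A to B collapse into a single row with num_trips = 2 and their average duration, which is exactly the collapsing that COUNT and AVG need a grouping for.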

Answer 1

Score: 1

I'm not sure I can explain why you need GROUP BY by repeating the definition, since it sounds like you've already read the definitions on those sites. Instead, let me give you an example:

Let's say you have a store. People visit your store and buy the different items you sell from day to day, and your store keeps a ledger of every transaction.

Now imagine you are preparing a report for this month, this week, or the current day. The report should show total sales, total sales by, say, category, or total sales for each product (whatever was sold during the period).

Let's assume your store's sales table looks like this:

Sn | Date  | Product | Qty | Rate | TotalAmount | Customer
---|-------|---------|-----|------|-------------|---------
1  | 10-12 | Sugar   | 10  | 20   | 200         | A
2  | 10-12 | Rice    | 5   | 50   | 250         | B
3  | 10-12 | Rice    | 2   | 50   | 100         | C
4  | 10-12 | Potato  | 15  | 3    | 45          | C
5  | 10-12 | Cola    | 2   | 15   | 30          | Cash
6  | 10-13 | Rice    | 1   | 50   | 50          | Cash

This is where GROUP BY, together with aggregate functions, comes in. In your report you want to see:

1. Total sales amount or transaction count
2. Total sales per day, or transaction count per day
3. Total cash sales vs. customer sales
4. Total sales for each product

All of these questions require running some aggregation on our data. COUNT, SUM, AVG, etc. are all aggregations, and depending on the aggregation we "may" need to group our records. For example:

select count(*) from ledger   

This gives me the total count of sales in the ledger. I don't need a GROUP BY here because my select list contains no other fields that the aggregation should be split by. Another example:

select count(*), Product from ledger 

Without a GROUP BY, most databases (BigQuery included) will throw an error here, because the select list now contains a column, Product, that changes what the aggregate means: we are now asking for the total sales count by product. The requirement says "by product", which means grouping the sales data by product and printing the sales count for each group:

select count(*), Product from Ledger 
group by Product
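Both queries can be tried against the example ledger with a small sketch in Python's sqlite3 module, used here only as a convenient stand-in database. One caveat: SQLite, unlike BigQuery, does not reject `select count(*), Product from ledger`; it collapses everything into one row and takes Product from an arbitrary row, which is just as meaningless.

```python
import sqlite3

# The example ledger from the answer, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (Sn INTEGER, Date TEXT, Product TEXT, Qty INTEGER,"
    " Rate INTEGER, TotalAmount INTEGER, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "10-12", "Sugar", 10, 20, 200, "A"),
        (2, "10-12", "Rice", 5, 50, 250, "B"),
        (3, "10-12", "Rice", 2, 50, 100, "C"),
        (4, "10-12", "Potato", 15, 3, 45, "C"),
        (5, "10-12", "Cola", 2, 15, 30, "Cash"),
        (6, "10-13", "Rice", 1, 50, 50, "Cash"),
    ],
)

# Only an aggregate in the select list: the whole table is one implicit group.
total = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()
print(total)  # (6,)

# Grouped by Product: one output row per distinct product
# (ORDER BY only makes the output order deterministic).
by_product = conn.execute(
    "SELECT COUNT(*), Product FROM ledger GROUP BY Product ORDER BY Product"
).fetchall()
print(by_product)  # [(1, 'Cola'), (1, 'Potato'), (3, 'Rice'), (1, 'Sugar')]

# SQLite accepts the ungrouped query, but returns a single row whose
# Product value comes from an arbitrary row of the table.
ungrouped = conn.execute("SELECT COUNT(*), Product FROM ledger").fetchall()
print(len(ungrouped))  # 1
```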

When you use an aggregate function together with other fields in your select list, every non-aggregated field must appear in your GROUP BY.
If you select nothing but aggregates, GROUP BY is "optional". Hope this helps a bit in understanding why we need GROUP BY.
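The four reports listed earlier can be sketched the same way, again with Python's sqlite3 as a stand-in database. In each query, every non-aggregated expression in the select list also appears in the GROUP BY. Treating every Customer value other than 'Cash' as a customer sale is an assumption about how the ledger is meant to be read.

```python
import sqlite3

# The example ledger from the answer, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (Sn INTEGER, Date TEXT, Product TEXT, Qty INTEGER,"
    " Rate INTEGER, TotalAmount INTEGER, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "10-12", "Sugar", 10, 20, 200, "A"),
        (2, "10-12", "Rice", 5, 50, 250, "B"),
        (3, "10-12", "Rice", 2, 50, 100, "C"),
        (4, "10-12", "Potato", 15, 3, 45, "C"),
        (5, "10-12", "Cola", 2, 15, 30, "Cash"),
        (6, "10-13", "Rice", 1, 50, 50, "Cash"),
    ],
)

# 1. Totals over the whole table: aggregates only, so no GROUP BY.
overall = conn.execute("SELECT COUNT(*), SUM(TotalAmount) FROM ledger").fetchone()

# 2. Per day: Date is selected, so Date must be in the GROUP BY.
per_day = conn.execute(
    "SELECT Date, COUNT(*), SUM(TotalAmount) FROM ledger"
    " GROUP BY Date ORDER BY Date"
).fetchall()

# 3. Cash vs. customer: group by a derived expression (assumed split).
cash_vs_customer = conn.execute(
    """
    SELECT CASE WHEN Customer = 'Cash' THEN 'Cash' ELSE 'Customer' END AS sale_type,
           SUM(TotalAmount)
    FROM ledger
    GROUP BY CASE WHEN Customer = 'Cash' THEN 'Cash' ELSE 'Customer' END
    ORDER BY sale_type
    """
).fetchall()

# 4. Per product.
per_product = conn.execute(
    "SELECT Product, SUM(TotalAmount) FROM ledger"
    " GROUP BY Product ORDER BY Product"
).fetchall()

print(overall)           # (6, 675)
print(per_day)           # [('10-12', 5, 625), ('10-13', 1, 50)]
print(cash_vs_customer)  # [('Cash', 80), ('Customer', 595)]
print(per_product)       # [('Cola', 30), ('Potato', 45), ('Rice', 400), ('Sugar', 200)]
```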

Suggestion:
I'd suggest creating a simple table like this one and asking yourself what different kinds of reports you'd want from it. You'll learn a lot that way.

Answer 2

Score: 0

select col1, col2, col3, concat(col1, col2)
from table_
where col1 = 'some_value'

A query like this works on individual rows: each output row corresponds to one input row, and concat(col1, col2) is likewise evaluated separately for each row.

COUNT(*) is an aggregate function, which means it operates on groups of rows rather than on individual rows; that's why you need a GROUP BY clause when you combine it with non-aggregated columns. GROUP BY also lets you use HAVING: what WHERE is for individual rows, HAVING is for groups of rows.
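The WHERE/HAVING distinction can be sketched on the ledger table from Answer 1, using Python's sqlite3 as a stand-in database. The particular filters (non-cash sales, products sold more than once) are illustrative choices, not part of the original answer.

```python
import sqlite3

# The ledger from Answer 1, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (Sn INTEGER, Date TEXT, Product TEXT, Qty INTEGER,"
    " Rate INTEGER, TotalAmount INTEGER, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "10-12", "Sugar", 10, 20, 200, "A"),
        (2, "10-12", "Rice", 5, 50, 250, "B"),
        (3, "10-12", "Rice", 2, 50, 100, "C"),
        (4, "10-12", "Potato", 15, 3, 45, "C"),
        (5, "10-12", "Cola", 2, 15, 30, "Cash"),
        (6, "10-13", "Rice", 1, 50, 50, "Cash"),
    ],
)

rows = conn.execute(
    """
    SELECT Product, COUNT(*) AS num_sales
    FROM ledger
    WHERE Customer <> 'Cash'   -- WHERE filters individual rows before grouping
    GROUP BY Product
    HAVING COUNT(*) > 1        -- HAVING filters the groups after aggregation
    """
).fetchall()
print(rows)  # [('Rice', 2)]
```

WHERE drops the two cash rows first; of the remaining groups (Sugar 1, Rice 2, Potato 1), only Rice survives the HAVING condition.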

Try creating a table with three columns of any type, like the one above, and populate it with a few rows so that col1 has a value in every row, col2 has values in only 2 rows, and col3 in only 1. Then count each column and see what happens.
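A minimal sketch of that experiment, assuming "has just 2 values" means the other rows hold NULL in that column, and using Python's sqlite3 with four invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_ (col1 TEXT, col2 TEXT, col3 TEXT)")
# Invented rows: col1 always filled, col2 filled twice, col3 filled once.
conn.executemany(
    "INSERT INTO table_ VALUES (?, ?, ?)",
    [
        ("a", "x", "p"),
        ("b", "y", None),
        ("c", None, None),
        ("d", None, None),
    ],
)

# COUNT(*) counts rows; COUNT(column) counts only non-NULL values.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(col1), COUNT(col2), COUNT(col3) FROM table_"
).fetchone()
print(counts)  # (4, 4, 2, 1)
```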

huangapple
  • Posted on August 5, 2023, 01:19:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76838009.html