June 29, 2023 11:07:30

Using groupby on a DataFrame in Pandas

Question

import pandas as pd
df = pd.read_csv(r"C:\Users\Ouis AL-Hetar\Documents\TestEmployeeTable1.csv")
sal= df.groupby("Department").sum("Salary").reset_index()
sal.columns=["Dapartment","Sum_of_salary"]
print(sal)

When I tried to run this code, it raised an error:
[image]
[image]

I printed head() to check whether there were any mistakes in the column names:
[image]
but I did not notice any.

I hope someone who knows what the problem is can help me.
Sorry for my poor English.

Answer 1

Score: 1

By default, pd.read_csv expects a comma as the separator. Your file appears to use semicolons instead, so pass sep=";" to pd.read_csv to read it correctly:

#                                        HERE --v
df = pd.read_csv("TestEmployeeTable1.csv", sep=";")
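
To see why the separator matters, here is a minimal sketch that uses a small in-memory table in place of the real file. The sample rows and the Name column are made up for illustration; only the Department and Salary column names come from the question.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data standing in for TestEmployeeTable1.csv
raw = "Name;Department;Salary\nAnn;IT;100\nBob;IT;200\nCid;HR;150\n"

# With the default separator (","), every line is read as one single column
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.columns.tolist())   # ['Name;Department;Salary']

# With sep=";", the three columns are split as intended
right = pd.read_csv(io.StringIO(raw), sep=";")
print(right.columns.tolist())   # ['Name', 'Department', 'Salary']
```

In the first frame there is no column called Department, so a call such as groupby("Department") on it fails with a KeyError.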

However, you also need to modify the rest of your code so that the Salary column is selected before it is summed:

sal = df.groupby("Department", as_index=False)["Salary"].sum()
sal.columns = ["Department", "Sum_of_salary"]
# OR
sal = (df.groupby("Department", as_index=False)
         .agg(Sum_of_salary=("Salary", "sum")))
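
Both variants can be checked on a small in-memory table. The sample data below is hypothetical and only meant to show that the two forms produce the same two-column result.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data standing in for the real CSV file
raw = "Name;Department;Salary\nAnn;IT;100\nBob;IT;200\nCid;HR;150\n"
df = pd.read_csv(io.StringIO(raw), sep=";")

# Variant 1: select the Salary column, sum it per group, then rename the columns
sal1 = df.groupby("Department", as_index=False)["Salary"].sum()
sal1.columns = ["Department", "Sum_of_salary"]

# Variant 2: named aggregation sets the output column name directly
sal2 = (df.groupby("Department", as_index=False)
          .agg(Sum_of_salary=("Salary", "sum")))

print(sal1)
print(sal2)
```

The named-aggregation form avoids the separate renaming step, which is handy when several aggregated columns are needed at once.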

Answer 2

Score: 0

The pandas.DataFrame.groupby() method is a little different from most DataFrame methods: it does not return a DataFrame or Series directly. It splits the DataFrame into groups, but only in an abstract sense. Nothing is actually computed until a function is called on the GroupBy object.
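
A short sketch of this behaviour, using a made-up three-row table (the data is an assumption for illustration only):

```python
import pandas as pd

# Hypothetical data; only the column names come from the question
df = pd.DataFrame({
    "Department": ["IT", "IT", "HR"],
    "Salary": [100, 200, 150],
})

# groupby() alone returns a GroupBy object, not a DataFrame or Series
grouped = df.groupby("Department")
print(type(grouped).__name__)   # DataFrameGroupBy

# The per-group result is only produced once a function is called on it
totals = grouped["Salary"].sum()
print(type(totals).__name__)    # Series
```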

Also remember that groupby follows the split-apply-combine pattern: split the DataFrame, apply a function, and combine the results.
A groupby call returns a GroupBy object, so rather than calling head() on the DataFrame directly, I think you should use:
DataFrameGroupBy.head([n]): returns the first n rows of each group.
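
The three steps and the per-group head() can be sketched as follows. The table is hypothetical, and the Name column is an assumption added only to make the rows distinguishable.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
    "Department": ["IT", "IT", "HR", "IT", "HR"],
    "Salary": [100, 200, 150, 300, 250],
})
grouped = df.groupby("Department")

# Split: iterating over the GroupBy object yields one sub-frame per group
for name, group in grouped:
    print(name, len(group))

# Apply and combine: sum Salary within each group, collected into one Series
totals = grouped["Salary"].sum()
print(totals)

# head(1) keeps the first row of each group, with the original row labels
first_rows = grouped.head(1)
print(first_rows)
```

Note that head() on a GroupBy object returns rows of the original frame rather than an aggregated table, so it is useful for inspecting groups, not for computing sums.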
