June 13, 2023 15:41:19

pandas columns division returns multiple columns

Question

I am trying to simply divide two columns element-wise, but for some reason this returns two columns instead of the single column I expect.

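I think it has something to do with the fact that I need to create the DataFrame iteratively, so I opted to append rows one at a time. Here's some test code: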

import pandas as pd
df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])
# Input data: one list per column
data = {
    'dataset': ['177.png', '276.png', '208.png', '282.png'],
    'partition': ['green', 'green', 'green', 'green'],
    'zeros': [1896715, 1914720, 1913894, 1910815],
    'ones': [23285, 5280, 6106, 9185],
    'total': [1920000, 1920000, 1920000, 1920000]
}
for i in range(len(data['ones'])):
    row = []
    for k in data.keys():
        row.append(data[k][i])
    # Note: DataFrame.append was removed in pandas 2.0, so this line only runs on pandas < 2.0
    df = df.append(pd.Series(row, index=df.columns), ignore_index=True)
df_check = pd.DataFrame(data)
df_check["result"] = df_check["zeros"] / df_check["total"]
df["result"] = df["zeros"] / df["total"]
df

If you run this, you'll see that everything works as expected with df_check, but the code fails when it reaches df["result"] = df["zeros"] / df["total"]:

ValueError: Cannot set a DataFrame with multiple columns to the single column result

In fact, if I inspect the result of the division, I notice that it has two columns in which every value is missing:

>>> df["zeros"] / df["total"]
	total	zeros
0	NaN	NaN
1	NaN	NaN
2	NaN	NaN
3	NaN	NaN

Any suggestions as to why this happens and how to fix it?

Answer 1

Score: 2

Your logic for setting up the DataFrame is incorrect. Don't use a loop; go directly for the DataFrame constructor, optionally with an extra step to rename the columns:

df = pd.DataFrame(data).rename(columns={'dataset': 'image_name'})
df["result"] = df["zeros"] / df["total"]

Output:

  image_name partition    zeros   ones    total    result
0    177.png     green  1896715  23285  1920000  0.987872
1    276.png     green  1914720   5280  1920000  0.997250
2    208.png     green  1913894   6106  1920000  0.996820
3    282.png     green  1910815   9185  1920000  0.995216

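With your current approach you end up with a MultiIndex that has a single level, which causes the follow-on issue: selecting df['zeros'] and df['total'] gives you two DataFrames rather than Series, and because their column labels ('zeros',) and ('total',) differ, the division aligns on columns and returns the union of both labels filled with NaN.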

print(df.columns)
MultiIndex([('image_name',),
            ( 'partition',),
            (     'zeros',),
            (      'ones',),
            (     'total',)],
           )
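The following is a minimal sketch, assuming a reasonably recent pandas, that reproduces the behaviour described above without the append loop. The two-row `rows` list is just a cut-down copy of the question's data. The final `get_level_values(0)` step is an alternative fix, not something proposed in this answer: it flattens the single-level MultiIndex back into an ordinary Index, so that selecting a column returns a Series again.

```python
import pandas as pd

# Cut-down copy of the question's data (first two rows only).
rows = [
    ["177.png", "green", 1896715, 23285, 1920000],
    ["276.png", "green", 1914720, 5280, 1920000],
]

# Same column spec as the question: a list wrapped in another list,
# which pandas turns into a MultiIndex with a single level.
df = pd.DataFrame(rows, columns=["image_name partition zeros ones total".split()])
print(df.columns.nlevels, type(df.columns).__name__)  # 1 MultiIndex

# Selecting a label from that MultiIndex returns a one-column DataFrame,
# still labelled ('zeros',) / ('total',), instead of a Series.
zeros = df["zeros"]
total = df["total"]
print(type(zeros).__name__)  # DataFrame

# DataFrame / DataFrame aligns on column labels. ('zeros',) and ('total',)
# never match, so the result has both labels and every value is NaN.
bad = zeros / total
print(bad.shape)  # (2, 2)

# Alternative fix (not from the answer): keep only the level's values so the
# columns become a flat Index and df["zeros"] is a Series again.
df.columns = df.columns.get_level_values(0)
df["result"] = df["zeros"] / df["total"]
print(df[["image_name", "result"]])
```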

In any case, DataFrame.append is deprecated (it was removed entirely in pandas 2.0).
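Since the question needs to build the data iteratively, here is a sketch of one common replacement for the removed `append`: collect plain rows in a Python list inside the loop and call the DataFrame constructor once afterwards. The `rows` and `columns` names are illustrative, and the loop merely stands in for whatever code produces one row per image.

```python
import pandas as pd

# Same input as in the question: one list per column.
data = {
    "dataset": ["177.png", "276.png", "208.png", "282.png"],
    "partition": ["green", "green", "green", "green"],
    "zeros": [1896715, 1914720, 1913894, 1910815],
    "ones": [23285, 5280, 6106, 9185],
    "total": [1920000, 1920000, 1920000, 1920000],
}
# A flat list of names (no extra brackets), in the same order as `data`.
columns = ["image_name", "partition", "zeros", "ones", "total"]

# Accumulate plain Python rows inside the loop instead of growing a DataFrame.
rows = []
for i in range(len(data["ones"])):
    rows.append([data[k][i] for k in data])

# Build the DataFrame once, after the loop; integer columns keep an int dtype.
df = pd.DataFrame(rows, columns=columns)
df["result"] = df["zeros"] / df["total"]
print(df)
```

Building the frame once also avoids copying the whole DataFrame on every appended row, which is what repeated append calls did.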

Answer 2

Score: 1

The problem is the following line:

df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])

The split() method already returns a list, so don't wrap it in another list; use the following instead:

df = pd.DataFrame(columns='image_name partition zeros ones total'.split())
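A small sketch of the point made in this answer, using only the column string from the question: `str.split()` already returns a list, and the extra brackets change how pandas interprets the `columns` argument (a list of lists is read as one array per MultiIndex level). The variable names are illustrative.

```python
import pandas as pd

names = "image_name partition zeros ones total"

# str.split() already returns a list of column names...
flat_cols = names.split()
print(type(flat_cols).__name__)  # list

# ...so passing it directly gives an ordinary, flat column Index,
flat = pd.DataFrame(columns=flat_cols)

# while wrapping it in another list is read as "one array per level",
# producing a MultiIndex with a single level whose labels are 1-tuples.
nested = pd.DataFrame(columns=[flat_cols])

print(isinstance(flat.columns, pd.MultiIndex))    # False
print(isinstance(nested.columns, pd.MultiIndex))  # True
print(list(nested.columns)[:2])                   # [('image_name',), ('partition',)]
```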

Answer 3

Score: 0

I actually solved the issue thanks to the suggestion in @mozway's answer.

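Indeed, the problem is that the buggy version has a MultiIndex. However, this is caused by how I specified the list of columns, not by the append method per se. It was solved by changing: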

df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])

to:

df = pd.DataFrame(columns=["image_name", "partition", "zeros", "ones", "total"])

or even just columns='image_name partition zeros ones total'.split().
