pandas columns division returns multiple columns

Question

I am trying to divide two columns element-wise, but for some reason this returns two columns instead of the single column I would expect.

I think it has something to do with the fact that I need to create the dataframe iteratively, so I opted to append rows one at a time. Here's some test code:

import pandas as pd


df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])

# Create a DataFrame
data = {
    'dataset': ['177.png', '276.png', '208.png', '282.png'],
    'partition': ['green', 'green', 'green', 'green'],
    'zeros': [1896715, 1914720, 1913894, 1910815],
    'ones': [23285, 5280, 6106, 9185],
    'total': [1920000, 1920000, 1920000, 1920000]
}

for i in range(len(data['ones'])):
    row = []
    for k in data.keys():
        row.append(data[k][i])
    df = df.append(pd.Series(row, index=df.columns), ignore_index=True)

df_check = pd.DataFrame(data)
df_check["result"] = df_check["zeros"] / df_check["total"]

df["result"] = df["zeros"] / df["total"]
df

If you try to run this, you'll see that everything works as expected with df_check, and the code fails when it gets to df["result"] = df["zeros"] / df["total"]:

ValueError: Cannot set a DataFrame with multiple columns to the single column result

In fact, if I inspect the result of the division, I notice there are two columns with all values missing:

>>> df["zeros"] / df["total"]

	total	zeros
0	NaN	NaN
1	NaN	NaN
2	NaN	NaN
3	NaN	NaN

Any suggestions as to why this happens and how to fix it?

Answer 1

Score: 2

Your logic for setting up the dataframe is incorrect. Don't use a loop; go directly for the DataFrame constructor, optionally with an extra step to rename the columns:

df = pd.DataFrame(data).rename(columns={'dataset': 'image_name'})
df["result"] = df["zeros"] / df["total"]

Output:

  image_name partition    zeros   ones    total    result
0    177.png     green  1896715  23285  1920000  0.987872
1    276.png     green  1914720   5280  1920000  0.997250
2    208.png     green  1913894   6106  1920000  0.996820
3    282.png     green  1910815   9185  1920000  0.995216

With your current approach you end up with a MultiIndex with a single level, which causes the further issue: selecting df['zeros'] and df["total"] gives you two DataFrames, not Series, and their column labels do not align in the division, hence the all-NaN result.

print(df.columns)

MultiIndex([('image_name',),
            ( 'partition',),
            (     'zeros',),
            (      'ones',),
            (     'total',)],
           )
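The difference between the two column specifications can be reproduced in isolation. The following is a minimal sketch, assuming only the column names from the question; the variable names (names, rows, nested, flat, bad, ratio) are made up for illustration.

```python
import pandas as pd

names = "image_name partition zeros ones total".split()
rows = [
    ["177.png", "green", 1896715, 23285, 1920000],
    ["276.png", "green", 1914720, 5280, 1920000],
]

# A list wrapped in another list is read as "one array per level",
# so the columns become a MultiIndex with a single level.
nested = pd.DataFrame(rows, columns=[names])

# A plain list of strings gives an ordinary Index.
flat = pd.DataFrame(rows, columns=names)

print(type(nested.columns).__name__)
print(type(flat.columns).__name__)

# Selecting one label from the MultiIndex frame returns a DataFrame,
# while the same selection on the flat frame returns a Series.
print(type(nested["zeros"]).__name__)
print(type(flat["zeros"]).__name__)

# DataFrame / DataFrame aligns on column labels; 'zeros' and 'total'
# do not match each other, so both columns are kept.
bad = nested["zeros"] / nested["total"]
print(bad)

# Series / Series aligns on the row index only and divides element-wise.
ratio = flat["zeros"] / flat["total"]
print(ratio)
```

Checking df.columns (or df.columns.nlevels) right after construction is a quick way to catch this kind of mistake before any arithmetic is attempted.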

In any case, DataFrame.append is deprecated (since pandas 1.4) and was removed in pandas 2.0.
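If the rows really do have to be produced one at a time, a common alternative to append is to collect them in a plain Python list and build the DataFrame once at the end. This is only a sketch under that assumption; the records list is a hypothetical helper, and the data dict is copied from the question.

```python
import pandas as pd

data = {
    "dataset": ["177.png", "276.png", "208.png", "282.png"],
    "partition": ["green", "green", "green", "green"],
    "zeros": [1896715, 1914720, 1913894, 1910815],
    "ones": [23285, 5280, 6106, 9185],
    "total": [1920000, 1920000, 1920000, 1920000],
}

records = []  # hypothetical accumulator, one dict per row
for i in range(len(data["ones"])):
    # Build each row as a dict keyed by column name.
    records.append({key: values[i] for key, values in data.items()})

# Construct the frame once, after the loop, then rename the first column.
df = pd.DataFrame(records).rename(columns={"dataset": "image_name"})
df["result"] = df["zeros"] / df["total"]
print(df)
```

Building the frame once avoids copying the whole DataFrame on every iteration, which is what a per-row append or concat would do.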

Answer 2

Score: 1

The problem is the following line:

df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])

The split() method already returns a list, so drop the surrounding brackets and use the following:

df = pd.DataFrame(columns='image_name partition zeros ones total'.split())

Answer 3

Score: 0

I actually solved the issue thanks to the suggestion in @mozway's answer.

Indeed, the problem is that the buggy version has a MultiIndex. However, this is due to how I specified the list of columns, not to the append method per se. It was solved by changing

df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])

to

df = pd.DataFrame(columns=["image_name", "partition", "zeros", "ones", "total"])

or even just columns='image_name partition zeros ones total'.split().
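For a DataFrame that has already been built with the single-level MultiIndex, the columns can also be flattened after the fact instead of rebuilding the frame. This is a minimal sketch of that idea, not something proposed in the answers above; it uses get_level_values to pull the only level out as an ordinary Index, and the sample rows are taken from the question.

```python
import pandas as pd

names = "image_name partition zeros ones total".split()
rows = [
    ["177.png", "green", 1896715, 23285, 1920000],
    ["276.png", "green", 1914720, 5280, 1920000],
]

# Reproduce the problematic frame: nested list -> single-level MultiIndex.
df = pd.DataFrame(rows, columns=[names])

# Replace the MultiIndex with its only level as a flat Index.
df.columns = df.columns.get_level_values(0)

# Column selection now returns Series, so the division is element-wise.
df["result"] = df["zeros"] / df["total"]
print(df)
```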

huangapple
  • Published on June 13, 2023, 15:41:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76462677.html