pandas columns division returns multiple columns
Question
I am trying to simply divide two columns element-wise, but for some reason this returns two columns instead of one as I would expect.
I think it has something to do with the fact that I need to create the DataFrame iteratively, so I opted to append rows one at a time. Here's some test code:
import pandas as pd
df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])
# Create a DataFrame
data = {
    'dataset': ['177.png', '276.png', '208.png', '282.png'],
    'partition': ['green', 'green', 'green', 'green'],
    'zeros': [1896715, 1914720, 1913894, 1910815],
    'ones': [23285, 5280, 6106, 9185],
    'total': [1920000, 1920000, 1920000, 1920000]
}
for i in range(len(data['ones'])):
    row = []
    for k in data.keys():
        row.append(data[k][i])
    df = df.append(pd.Series(row, index=df.columns), ignore_index=True)
df_check = pd.DataFrame(data)
df_check["result"] = df_check["zeros"] / df_check["total"]
df["result"] = df["zeros"] / df["total"]
df
If you run this, everything works as expected with df_check, but the code fails when it gets to df["result"] = df["zeros"] / df["total"]:
ValueError: Cannot set a DataFrame with multiple columns to the single column result
In fact, if I inspect the result of the division, I see two columns in which every value is missing:
>>> df["zeros"] / df["total"]
   total  zeros
0    NaN    NaN
1    NaN    NaN
2    NaN    NaN
3    NaN    NaN
Any suggestion why this happens and how to fix it?
Answer 1
Score: 2
Your logic for setting up the DataFrame is incorrect. Don't use a loop; go directly for the DataFrame constructor, optionally with an extra step to rename the columns:
df = pd.DataFrame(data).rename(columns={'dataset': 'image_name'})
df["result"] = df["zeros"] / df["total"]
Output:
  image_name partition    zeros   ones    total    result
0    177.png     green  1896715  23285  1920000  0.987872
1    276.png     green  1914720   5280  1920000  0.997250
2    208.png     green  1913894   6106  1920000  0.996820
3    282.png     green  1910815   9185  1920000  0.995216
With your current approach you end up with a MultiIndex with a single level, which causes the further issue: slicing df['zeros'] and df["total"] gives you two DataFrames, not Series, and because their column labels differ, the division cannot align them and yields only NaN.
print(df.columns)
MultiIndex([('image_name',),
            ( 'partition',),
            (     'zeros',),
            (      'ones',),
            (     'total',)],
           )
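A minimal sketch of the effect described above, assuming a small two-row frame built from the zeros and total values in the question (the names selected and result are illustrative only). It also shows one possible way to repair an existing frame by flattening the single-level MultiIndex with get_level_values(0). This is an alternative to rebuilding the frame, not something this answer prescribes.

```python
import pandas as pd

# A nested list of column names produces single-level MultiIndex columns,
# which is the situation in the question.
df = pd.DataFrame(
    [[1896715, 1920000], [1914720, 1920000]],
    columns=[["zeros", "total"]],
)
is_multi = isinstance(df.columns, pd.MultiIndex)

# Selecting one label from MultiIndex columns yields a DataFrame here,
# which is why the later division does not behave like Series / Series.
selected = df["zeros"]

# Possible repair (an assumption, not from the original post):
# replace the MultiIndex with a plain Index of its only level.
df.columns = df.columns.get_level_values(0)
df["result"] = df["zeros"] / df["total"]
print(df)
```

After flattening, df["zeros"] and df["total"] are ordinary Series, so the division is element-wise and assigns to a single column.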
In any case, append is now deprecated (since pandas 1.4) and was removed in pandas 2.0.
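If the rows really do have to be produced one at a time, a common substitute for append is to collect them in a plain Python list and build the DataFrame once at the end. The sketch below assumes the records arrive as tuples. The names records, columns, and rows are hypothetical, and the values are the first two rows from the question.

```python
import pandas as pd

columns = ["image_name", "partition", "zeros", "ones", "total"]

# Hypothetical stream of per-image records, produced one at a time.
records = [
    ("177.png", "green", 1896715, 23285, 1920000),
    ("276.png", "green", 1914720, 5280, 1920000),
]

rows = []
for record in records:
    # Accumulate plain dicts; no DataFrame is touched inside the loop.
    rows.append(dict(zip(columns, record)))

# Build the frame once, with a flat list of column names.
df = pd.DataFrame(rows, columns=columns)
df["result"] = df["zeros"] / df["total"]
print(df)
```

Building the frame once also avoids copying the whole DataFrame on every iteration, which is what repeated appending does.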
Answer 2
Score: 1
The problem is the following line:
df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])
The split() method already returns a list, so drop the surrounding brackets and use the following:
df = pd.DataFrame(columns='image_name partition zeros ones total'.split())
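A short sketch of the difference between the two spellings, using the column string from the question. The variable names nested, flat, df_nested, and df_flat are illustrative only.

```python
import pandas as pd

names = "image_name partition zeros ones total"

nested = [names.split()]  # a list that contains one list of five strings
flat = names.split()      # a flat list of five strings

df_nested = pd.DataFrame(columns=nested)
df_flat = pd.DataFrame(columns=flat)

print(len(nested), len(flat))                         # 1 5
print(isinstance(df_nested.columns, pd.MultiIndex))   # True
print(isinstance(df_flat.columns, pd.MultiIndex))     # False
```

pandas reads a list of lists as one array of labels per column level, so the nested form produces MultiIndex columns, while the flat form yields an ordinary Index.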
Answer 3
Score: 0
I actually solved the issue thanks to the suggestion in @mozway's answer.
Indeed, the problem is that the buggy version has a MultiIndex. However, this is due to how I specified the list of columns, not to the append method per se. It was solved by changing
df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])
to
df = pd.DataFrame(columns=["image_name", "partition", "zeros", "ones", "total"])
or even just columns='image_name partition zeros ones total'.split().