Having trouble using loc with multiple indexes in pandas
Question
I appreciate your help. I have these two DataFrames. Notice that df2 has a two-level index (a MultiIndex):
import pandas as pd

data1 = {
    "col 1": ['a', 'b', 'c'],
    "col 2": ['x', 'y', 'z'],
}
index1 = ['a', 'b', 'c']
index2 = ['x', 'y', 'z']
data2 = {
    "col 1": [420, 380, 390],
    "col 2": [50, 40, 45],
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2, index=[index1, index2])
df1:
  col 1 col 2
0     a     x
1     b     y
2     c     z

df2:
     col 1  col 2
a x    420     50
b y    380     40
c z    390     45
Now I want df1 to get a third column that holds the first (left-most) index level of df2, without turning any of df2's index levels into a column. From what I've read online, the closest thing I could find was df.index(), with the argument selecting which index level to use. It's not working for me. Here's my call:
df2['col 3'] = df1.loc[df1['col 1'] == df2.index(0)]
There's an error involved, which has left me stumped.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-25-2f901269f295> in <module>
----> 1 df2['col 3'] = df1.loc[df1['col 1'] == df2.index(0)]
TypeError: 'MultiIndex' object is not callable
What can I do to fix this? Thanks!
Answer 1
Score: 1
The error message says that df2.index is an attribute (here a MultiIndex object), not a method, so it cannot be called with parentheses like a function. To read one level of the multi-index, call a method on that object instead.
Note that df2.index.levels[0] returns only the unique labels of the first level, not one value per row. To get the first-level value for each row in df2, use the get_level_values method: df2.index.get_level_values(0). You can then assign the result to a new column of df1. Because the result is an Index rather than a Series, the values are assigned by position, so df1 and df2 must have the same number of rows.
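To make the difference between the two accessors mentioned above concrete, here is a small sketch. The data is made up for illustration (the first-level label 'a' is deliberately repeated), and the variable names per_row and unique_labels are not from the original question.

```python
import pandas as pd

# Hypothetical frame whose first index level repeats the label 'a'
demo = pd.DataFrame(
    {"col 1": [420, 380, 390]},
    index=[["a", "a", "b"], ["x", "y", "z"]],
)

# One label per row, in row order
per_row = demo.index.get_level_values(0)

# Only the distinct labels stored for that level
unique_labels = demo.index.levels[0]

print(list(per_row))
print(list(unique_labels))
```

Only get_level_values(0) has the same length as the frame, which is why it is the one that can be assigned as a column.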
Here's an updated version of your code that should do what you're looking for:
import pandas as pd

data1 = {
    "col 1": ['a', 'b', 'c'],
    "col 2": ['x', 'y', 'z'],
}
index1 = ['a', 'b', 'c']
index2 = ['x', 'y', 'z']
data2 = {
    "col 1": [420, 380, 390],
    "col 2": [50, 40, 45],
}
df1 = pd.DataFrame(data1)
# Passing a list of two lists as the index builds a two-level MultiIndex
df2 = pd.DataFrame(data2, index=[index1, index2])

# First-level label of every row in df2, assigned to df1 by position
df1['col 3'] = df2.index.get_level_values(0)
print(df1)
Output:
  col 1 col 2 col 3
0     a     x     a
1     b     y     b
2     c     z     c
As you can see, a new column named col 3 has been added to df1, with the values from the first level of the multi-index from df2.
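The assignment above pairs rows purely by position. If the goal behind the original df1.loc[...] attempt was instead to keep the rows of df1 whose 'col 1' value appears in df2's first index level, one possible sketch is below. This is an assumption about the intent, and the sample data (including the unmatched 'd' row) is made up for illustration.

```python
import pandas as pd

# Hypothetical df1 in a different order, with one label ('d') absent from df2
df1 = pd.DataFrame({"col 1": ["c", "a", "d"], "col 2": ["z", "x", "w"]})
df2 = pd.DataFrame(
    {"col 1": [420, 380, 390], "col 2": [50, 40, 45]},
    index=[["a", "b", "c"], ["x", "y", "z"]],
)

# First-level labels of df2, one per row
first_level = df2.index.get_level_values(0)

# Boolean mask: True where df1's 'col 1' value occurs in that level
mask = df1["col 1"].isin(first_level)

# Rows of df1 that have a match in df2's first index level
matched = df1.loc[mask]
print(matched)
```

Using isin compares by value rather than by row position, so it does not require the two frames to have the same length or ordering.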