Having trouble using loc with multiple indexes in pandas

Question

I appreciate your help. I have these two DataFrames. Note that df2 has two index levels:

import pandas as pd


data1 = {
    "col 1": ['a', 'b', 'c'],
    "col 2": ['x', 'y', 'z']
}

index1 = ['a', 'b', 'c']
index2 = ['x', 'y', 'z']

data2 = {
  "col 1": [420, 380, 390],
  "col 2": [50, 40, 45]
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2, index=[index1, index2])

df1:

  col 1 col 2
0     a     x
1     b     y
2     c     z

df2:

     col 1  col 2
a x    420     50
b y    380     40
c z    390     45

Now I'm trying to give df1 a third column that takes the first (left-most) index level of df2, without turning any of df2's index levels into a column. From what I've read online, the closest thing I could find was df.index(), with the argument selecting which index level to use. It's not working for me. Here's my call:

df2['col 3'] = df1.loc[df1['col 1'] == df2.index(0)]

It raises an error that has left me stumped.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-2f901269f295> in <module>
----> 1 df2['col 3'] = df1.loc[df1['col 1'] == df2.index(0)]

TypeError: 'MultiIndex' object is not callable

What can I do to fix this? Thanks!

Answer 1

Score: 1

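The error message tells you that df2.index is an attribute holding a MultiIndex object, not a method, so df2.index(0) tries to call that object like a function and fails. Square brackets on the index (for example df2.index[0]) select a row label, which for a multi-index is the whole tuple, so they do not pick out a single level either.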
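To make the failure concrete, here is a small sketch that rebuilds df2 from the question, reproduces the TypeError, and shows what indexing the MultiIndex by position returns. The variable names call_error and first_row_label are only illustrative.

```python
import pandas as pd

# Same data as df2 in the question: two lists passed as index give a MultiIndex.
df2 = pd.DataFrame(
    {"col 1": [420, 380, 390], "col 2": [50, 40, 45]},
    index=[["a", "b", "c"], ["x", "y", "z"]],
)

# df2.index is an object stored on the DataFrame, not a method,
# so putting parentheses after it raises TypeError.
try:
    df2.index(0)
    call_error = None
except TypeError as exc:
    call_error = str(exc)

print(call_error)

# Indexing the MultiIndex by position returns the full label tuple of that row,
# not the values of one level.
first_row_label = df2.index[0]
print(first_row_label)  # ('a', 'x')
```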

In your case, use the get_level_values method: df2.index.get_level_values(0) returns the first-level label of every row in df2. (df2.index.levels[0] is different: it holds only the distinct labels of that level, one each.) Finally, you can assign the resulting values to a new column of df1.
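The difference between levels and get_level_values is easier to see when a label repeats. The sketch below uses a made-up frame called demo (not part of the question) whose first level contains "b" twice.

```python
import pandas as pd

# Hypothetical data: the first index level repeats the label "b".
idx = pd.MultiIndex.from_arrays([["b", "a", "b"], ["x", "y", "z"]])
demo = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# levels[0] stores each distinct first-level label once.
unique_labels = list(demo.index.levels[0])

# get_level_values(0) returns one label per row, in row order.
per_row_labels = list(demo.index.get_level_values(0))

print(unique_labels)
print(per_row_labels)  # ['b', 'a', 'b']
```

Because only get_level_values has one entry per row, it is the one whose length matches the frame and can be assigned as a column.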

Here's an updated version of your code that should do what you're looking for:

import pandas as pd

data1 = {
    "col 1": ['a', 'b', 'c'],
    "col 2": ['x', 'y', 'z']
}

index1 = ['a', 'b', 'c']
index2 = ['x', 'y', 'z']

data2 = {
  "col 1": [420, 380, 390],
  "col 2": [50, 40, 45]
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2, index=[index1, index2])

df1['col 3'] = df2.index.get_level_values(0)

Output:

  col 1 col 2 col 3
0     a     x     a
1     b     y     b
2     c     z     c

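As you can see, a new column named col 3 has been added to df1, with the values from the first level of the multi-index from df2. Note that this assignment is positional: it relies on df1 and df2 having the same number of rows in the same order.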
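If the rows of the two frames do not line up, a value-based match may be closer to what the original loc attempt was after. The following is only a sketch under that assumption: df1 here is a reordered, hypothetical variant of the question's data with one value ("q") that does not occur in df2's index, and isin is used in place of the == comparison.

```python
import pandas as pd

# Hypothetical df1: different row order, plus "q", which is absent from df2's index.
df1 = pd.DataFrame({"col 1": ["c", "a", "q"], "col 2": ["z", "x", "w"]})
df2 = pd.DataFrame(
    {"col 1": [420, 380, 390], "col 2": [50, 40, 45]},
    index=[["a", "b", "c"], ["x", "y", "z"]],
)

# First-level labels of df2, one per row.
level0 = df2.index.get_level_values(0)

# Boolean mask: True where df1's "col 1" value appears in that level.
mask = df1["col 1"].isin(level0)

# Rows of df1 whose "col 1" matches some first-level label of df2.
matched = df1.loc[mask]

# Keep the label where it matched; leave a missing value elsewhere.
df1["col 3"] = df1["col 1"].where(mask)

print(matched)
print(df1)
```

Neither index level of df2 is turned into a column here; the level values are only read.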

huangapple
  • Posted on August 9, 2023, 03:49:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76862803.html