pandas的`merge`在多级索引连接时会丢弃级别

huangapple go评论65阅读模式
英文:

pandas' `merge` drops levels on multiindex joins

问题

I don't understand why merge call below dropped the zero level l0 while join didn't? I don't see that behavior described in the docs. Any explanation?

import string
import pandas as pd
alph = string.ascii_lowercase

n=5
inds = pd.MultiIndex.from_tuples([(i,j) for i in alph[:n] for j in range(1,n)])
t = pd.DataFrame(data=np.random.randint(0,10, len(inds)), index=inds).sort_index()
t.index.names=['l0', 'l1']

t2 = pd.DataFrame(data = [222,333], index=[2,3])
t2.index.names = ['l1']
display(t, t2)
display(t.merge(t2,how='left', on='l1'))
display(t.join(t2,how='left', on='l1', lsuffix='_x', rsuffix='_y'))
英文:

I don't understand why merge call below dropped the zero level l0 while join didn't? I don't see that behavior described in the docs. Any explanation?

import string
import pandas as pd
alph = string.ascii_lowercase

n=5
inds = pd.MultiIndex.from_tuples([(i,j) for i in alph[:n] for j in range(1,n)])
t = pd.DataFrame(data=np.random.randint(0,10, len(inds)), index=inds).sort_index()
t.index.names=['l0', 'l1']

t2 = pd.DataFrame(data = [222,333], index=[2,3])
t2.index.names = ['l1']
display(t, t2)
display(t.merge(t2,how='left', on='l1'))
display(t.join(t2,how='left', on='l1', lsuffix='_x', rsuffix='_y'))

答案1

得分: 1

你可以在 merge 调用中使用 set_index() 并给它传递 tindex 来保留你的 l0 索引:

t.merge(t2, how='left', on='l1').set_index(t.index)

我运行代码时打印了上面的内容,得到了以下结果:

       0_x    0_y
l0 l1            
a  1     5    NaN
   2     7  222.0
   3     3  333.0
   4     5    NaN
b  1     4    NaN
   2     2  222.0
   3     0  333.0
   4     8    NaN
c  1     8    NaN
   2     2  222.0
   3     9  333.0
   4     9    NaN
d  1     3    NaN
   2     4  222.0
   3     2  333.0
   4     1    NaN
e  1     7    NaN
   2     1  222.0
   3     2  333.0
   4     0    NaN
英文:

You can use set_index() in your merge call and give it the index of t to preserve your l0 index:

t.merge(t2,how='left', on='l1').set_index(t.index)

Printing the above when I ran the code gave me the following:

       0_x    0_y
l0 l1            
a  1     5    NaN
   2     7  222.0
   3     3  333.0
   4     5    NaN
b  1     4    NaN
   2     2  222.0
   3     0  333.0
   4     8    NaN
c  1     8    NaN
   2     2  222.0
   3     9  333.0
   4     9    NaN
d  1     3    NaN
   2     4  222.0
   3     2  333.0
   4     1    NaN
e  1     7    NaN
   2     1  222.0
   3     2  333.0
   4     0    NaN

huangapple
  • 本文由 发表于 2023年4月17日 09:07:37
  • 转载请务必保留本文链接:https://go.coder-hub.com/76031100.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定