How to melt a dataframe so repeated items become the values that correspond to the index

Question

I have this dataframe:

    import pandas as pd

    df = pd.DataFrame({'Status': ['CO', 'AD', 'AD', 'AD', 'OT', 'CO', 'OT', 'AD'],
                       'Mutation': ['H157Y', 'R47H', 'R47H', 'R67H', 'R62H', 'D87N', 'D39E', 'D39E']})
    print(df)

Output:

      Status Mutation
    0     CO    H157Y
    1     AD     R47H
    2     AD     R47H
    3     AD     R67H
    4     OT     R62H
    5     CO     D87N
    6     OT     D39E
    7     AD     D39E

I want the dataframe to look like this:

    df2 = pd.DataFrame({'Status': ['CO', 'AD', 'OT'], 'H157Y': [1, 0, 0], 'R47H': [0, 2, 0], 'R67H': [0, 1, 0],
                        'R62H': [0, 0, 1], 'D87N': [1, 0, 0], 'D39E': [0, 1, 1]})
    print(df2)

Output:

      Status  H157Y  R47H  R67H  R62H  D87N  D39E
    0     CO      1     0     0     0     1     0
    1     AD      0     2     1     0     0     1
    2     OT      0     0     0     1     0     1

The mutations become the column names, and each value is the number of hits, i.e. how many times that mutation occurs for the given status.

Answer 1

Score: 3

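This should do the trick: group by both columns, count the rows in each group with size(), and move the Mutation level into the columns with unstack(), filling missing combinations with 0:

    df.groupby(['Status', 'Mutation']).size().unstack(fill_value=0)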
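
The one-liner above returns Status as the index and sorts both axes alphabetically, whereas the question shows Status as an ordinary column with mutations in order of first appearance. A minimal sketch of one way to get that layout is below; the names `counts` and `result` are only illustrative, and the reordering step is an assumption about what the asker wants rather than part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Status': ['CO', 'AD', 'AD', 'AD', 'OT', 'CO', 'OT', 'AD'],
    'Mutation': ['H157Y', 'R47H', 'R47H', 'R67H', 'R62H', 'D87N', 'D39E', 'D39E'],
})

# Count rows per (Status, Mutation) pair, then pivot Mutation into columns.
counts = df.groupby(['Status', 'Mutation']).size().unstack(fill_value=0)

# Optional: restore first-appearance order on both axes, drop the
# "Mutation" label from the column axis, and turn Status back into a column.
result = (
    counts
    .reindex(index=df['Status'].unique(), columns=df['Mutation'].unique())
    .rename_axis(index='Status', columns=None)
    .reset_index()
)
print(result)
```

If alphabetical order is acceptable, the `reindex` step can be dropped and only `reset_index()` is needed.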

Answer 2

Score: 2

We can use pd.crosstab, which counts how often each Status/Mutation pair occurs:

    >>> pd.crosstab(df["Status"], df["Mutation"])
    Mutation  D39E  D87N  H157Y  R47H  R62H  R67H
    Status
    AD           1     0      0     2     0     1
    CO           0     1      1     0     0     0
    OT           1     0      0     0     1     0
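
As a quick sanity check on the counts, `pd.crosstab` also accepts `margins=True`, which appends an "All" row and column of totals. The sketch below is an optional extra rather than part of the original answer; the variable name `table` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Status': ['CO', 'AD', 'AD', 'AD', 'OT', 'CO', 'OT', 'AD'],
    'Mutation': ['H157Y', 'R47H', 'R47H', 'R67H', 'R62H', 'D87N', 'D39E', 'D39E'],
})

# margins=True adds an "All" row and column holding the totals.
table = pd.crosstab(df['Status'], df['Mutation'], margins=True)
print(table)

# The grand total should equal the number of rows in the input.
print(table.loc['All', 'All'] == len(df))
```

The totals are only a check; drop `margins=True` to get the plain count table.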

Or we can one-hot encode the Mutation column with pd.get_dummies, sum the indicator columns per Status with pandas.DataFrame.groupby, and then strip the "Mutation_" prefix from the column names with pandas.DataFrame.rename:

    (pd.get_dummies(df, columns=['Mutation'])
       .groupby(['Status'])
       .sum()
       # split only on the first underscore to drop the "Mutation_" prefix
       .rename(columns=lambda x: x.split('_', 1)[1]))

Output:

            D39E  D87N  H157Y  R47H  R62H  R67H
    Status
    AD         1     0      0     2     0     1
    CO         0     1      1     0     0     0
    OT         1     0      0     0     1     0
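
The rename step can be avoided by asking `pd.get_dummies` not to add a prefix in the first place. The following is a sketch of that variant, not the answerer's code: it assumes empty `prefix` and `prefix_sep` arguments are acceptable here and passes `dtype=int` so the indicator columns are integers regardless of pandas version. The name `out` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Status': ['CO', 'AD', 'AD', 'AD', 'OT', 'CO', 'OT', 'AD'],
    'Mutation': ['H157Y', 'R47H', 'R47H', 'R67H', 'R62H', 'D87N', 'D39E', 'D39E'],
})

# Empty prefix and separator keep the raw mutation names as column labels,
# so no string splitting is needed afterwards.
dummies = pd.get_dummies(df, columns=['Mutation'], prefix='', prefix_sep='', dtype=int)

# Summing the 0/1 indicators per Status gives the hit counts.
out = dummies.groupby('Status').sum()
print(out)
```

This also sidesteps any ambiguity if a mutation name itself happened to contain an underscore.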

huangapple
  • Posted on February 9, 2023, 03:23:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75390783.html