How to melt a dataframe so repeated items become the values that correspond to the index
Question
I have this dataframe:
import pandas as pd

df = pd.DataFrame({'Status':['CO','AD','AD','AD','OT','CO','OT','AD'],
                   'Mutation':['H157Y','R47H','R47H','R67H','R62H','D87N','D39E','D39E']})
print(df)
  Status Mutation
0     CO    H157Y
1     AD     R47H
2     AD     R47H
3     AD     R67H
4     OT     R62H
5     CO     D87N
6     OT     D39E
7     AD     D39E
I want the dataframe to look like this:
df2 = pd.DataFrame({'Status':['CO','AD','OT'],'H157Y':[1,0,0],'R47H':[0,2,0],'R67H':[0,1,0],
                    'R62H':[0,0,1],'D87N':[1,0,0],'D39E':[0,1,1]})
print(df2)
  Status  H157Y  R47H  R67H  R62H  D87N  D39E
0     CO      1     0     0     0     1     0
1     AD      0     2     1     0     0     1
2     OT      0     0     0     1     0     1
Here the mutations become the column names, and each value is the number of times that mutation occurs for the given status.
Answer 1
Score: 3
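This should do the trick: group by both columns, count the rows in each group with size(), then unstack the Mutation level into columns, filling combinations that never occur with 0: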
df.groupby(['Status', 'Mutation']).size().unstack(fill_value=0)
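The one-liner above returns Status as the index and sorts both axes alphabetically. If the exact layout from the question is wanted (Status as a regular column, rows and columns in order of first appearance), a minimal sketch could look like the following. The variable names counts and result are only illustrative, and the reindex step assumes that first-appearance order is what is wanted.

```python
import pandas as pd

df = pd.DataFrame({
    'Status': ['CO', 'AD', 'AD', 'AD', 'OT', 'CO', 'OT', 'AD'],
    'Mutation': ['H157Y', 'R47H', 'R47H', 'R67H', 'R62H', 'D87N', 'D39E', 'D39E'],
})

# Count rows per (Status, Mutation) pair, then pivot Mutation into columns.
counts = df.groupby(['Status', 'Mutation']).size().unstack(fill_value=0)

# groupby sorts its keys by default, so restore first-appearance order.
counts = counts.reindex(index=df['Status'].unique(),
                        columns=df['Mutation'].unique())

# Drop the "Mutation" label on the column axis and make Status a column again.
result = counts.rename_axis(index='Status', columns=None).reset_index()
print(result)
```

Skipping the reindex step keeps the alphabetical order, which is often fine when the table is only used for lookups.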
Answer 2
Score: 2
We can use pd.crosstab, which counts how often each Status/Mutation pair occurs:
>>> pd.crosstab(df["Status"], df["Mutation"])
Mutation  D39E  D87N  H157Y  R47H  R62H  R67H
Status
AD           1     0      0     2     0     1
CO           0     1      1     0     0     0
OT           1     0      0     0     1     0
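As a quick sanity check, the crosstab table can be compared with the groupby/size/unstack approach from the other answer. The sketch below rebuilds the sample frame so it runs on its own; the names by_crosstab and by_groupby are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Status': ['CO', 'AD', 'AD', 'AD', 'OT', 'CO', 'OT', 'AD'],
    'Mutation': ['H157Y', 'R47H', 'R47H', 'R67H', 'R62H', 'D87N', 'D39E', 'D39E'],
})

by_crosstab = pd.crosstab(df['Status'], df['Mutation'])
by_groupby = df.groupby(['Status', 'Mutation']).size().unstack(fill_value=0)

# Compare labels and counts; both tables are sorted alphabetically on each axis.
same_labels = (by_crosstab.index.equals(by_groupby.index)
               and by_crosstab.columns.equals(by_groupby.columns))
same_counts = bool((by_crosstab.to_numpy() == by_groupby.to_numpy()).all())
print(same_labels and same_counts)
```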
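Alternatively, one-hot encode Mutation with pd.get_dummies, sum the indicator columns per Status with pandas.DataFrame.groupby, and strip the Mutation_ prefix from the column names with pandas.DataFrame.rename: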
(pd.get_dummies(df, columns=['Mutation'])
   .groupby(['Status'])
   .sum()
   .rename(columns=lambda x: x.split('_')[1]))
Output:
        D39E  D87N  H157Y  R47H  R62H  R67H
Status
AD         1     0      0     2     0     1
CO         0     1      1     0     0     0
OT         1     0      0     0     1     0
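One caveat with the rename step: x.split('_')[1] would truncate a mutation name that itself contains an underscore. A possible variation, sketched here with the same sample data, passes an empty prefix and prefix_sep to get_dummies so that no prefix is added in the first place. The dtype=int argument is added so the indicator columns are integers regardless of the pandas version's default.

```python
import pandas as pd

df = pd.DataFrame({
    'Status': ['CO', 'AD', 'AD', 'AD', 'OT', 'CO', 'OT', 'AD'],
    'Mutation': ['H157Y', 'R47H', 'R47H', 'R67H', 'R62H', 'D87N', 'D39E', 'D39E'],
})

# Empty prefix and separator: indicator columns are named after the mutation only.
dummies = pd.get_dummies(df, columns=['Mutation'],
                         prefix='', prefix_sep='', dtype=int)

# Summing the 0/1 indicators per Status gives the hit counts.
counts = dummies.groupby('Status').sum()
print(counts)
```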