How to concat one column's values into another column in pandas?
Question
I have a dataframe with two columns, one called 'PTM_loc' and the other called 'PTM_types':
data['PTM_loc']
0 24
1 24;30
2 22
3 19;20;66
4 16;30;50
data['PTM_types']
0 S
1 S
2 T
3 S;T
4 S;Y;T
I would like to concat (or whatever you call this action) these columns together, and get a new column like this:
data['PTMs']
0 S24
1 S24;S30
2 T22
3 S19;T20;T66
4 S16;Y30;T50
Does anyone have any ideas?
Answer 1
Score: 2
You can use a list comprehension with a double zip/zip_longest:
from itertools import zip_longest

def combine(a, b):
    # a holds the types ('S;T'), b holds the locations ('19;20;66')
    a = a.split(';')
    b = b.split(';')
    # pad the shorter list with the last type so it is reused
    return ';'.join(map(''.join, zip_longest(a, b, fillvalue=a[-1])))

data['PTMs'] = [combine(a, b) for a, b in zip(data['PTM_types'], data['PTM_loc'])]
# or
# from itertools import starmap
# data['PTMs'] = list(starmap(combine, zip(data['PTM_types'], data['PTM_loc'])))
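To see what the zip_longest padding does without needing a DataFrame, here is a minimal sketch that runs the same idea on plain lists. The list names ptm_types and ptm_loc are stand-ins I introduced for the two columns, and the values are the sample rows from the question.

```python
from itertools import zip_longest

def combine(types, locs):
    types = types.split(';')
    locs = locs.split(';')
    # zip_longest pads the shorter list; padding with the last type means
    # a single 'S' is reused for every remaining location.
    return ';'.join(map(''.join, zip_longest(types, locs, fillvalue=types[-1])))

# Sample rows from the question, as plain lists instead of DataFrame columns
ptm_types = ['S', 'S', 'T', 'S;T', 'S;Y;T']
ptm_loc = ['24', '24;30', '22', '19;20;66', '16;30;50']

ptms = [combine(t, l) for t, l in zip(ptm_types, ptm_loc)]
print(ptms)
# ['S24', 'S24;S30', 'T22', 'S19;T20;T66', 'S16;Y30;T50']
```

One thing to keep in mind: the fill value is used for whichever list is shorter, so a row with more types than locations (for example combine('S;T', '19')) would give 'S19;TT'. That case does not occur in the sample data.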
Alternatively, for a pure pandas variant (just for fun, it's likely less efficient), use ffill to fill the missing combinations:
# either in rectangular form
data['PTMs'] = (data['PTM_types']
                .str.split(';', expand=True).ffill(axis=1)
                + data['PTM_loc'].str.split(';', expand=True)
               ).stack().groupby(level=0).agg(';'.join)
# or in long form
l = data['PTM_loc'].str.extractall('([^;]+)')
t = data['PTM_types'].str.extractall('([^;]+)')
data['PTMs'] = (t.reindex_like(l).groupby(level=0).ffill().add(l)
                .groupby(level=0).agg(';'.join)
               )
Output:
    PTM_loc PTM_types         PTMs
0        24         S          S24
1     24;30         S      S24;S30
2        22         S          S22
3  19;20;66       S;T  S19;T20;T66
4  16;30;50     S;Y;T  S16;Y30;T50
Answer 2
Score: 1
Something like this, using str.split:
s1 = data['PTM_loc'].str.split(';', expand=True)
s2 = data['PTM_types'].str.split(';', expand=True)
# forward-fill the types across each row, then prepend them to the locations
data['new'] = s1.radd(s2.ffill(axis=1)).stack().groupby(level=0).agg(';'.join)
Out[20]:
0 S24
1 S24;S30
2 T22
3 S19;T20;T66
4 S16;Y30;T50
dtype: object
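The split-based variants forward-fill the type columns before adding them to the location columns. As far as I can tell, that only fills far enough when some row has at least as many types as the widest location row, which happens to be true for the sample data. Below is a minimal sketch of one way around that, assuming the question's data: it reindexes the type columns to match the location columns before the ffill, and joins each row with apply instead of stack. The function name combine_split is hypothetical, not something from either answer.

```python
import pandas as pd

def combine_split(types: pd.Series, loc: pd.Series) -> pd.Series:
    """Prefix each location with its type, reusing the last type when types run out."""
    loc_wide = loc.str.split(';', expand=True)
    types_wide = types.str.split(';', expand=True)
    # Align the type columns to the location columns *before* forward-filling,
    # so rows are filled even when no row has as many types as locations.
    types_wide = types_wide.reindex(columns=loc_wide.columns).ffill(axis=1)
    combined = types_wide + loc_wide  # cells with no location stay missing
    # Drop the missing cells in each row and join the rest with ';'
    return combined.apply(lambda row: ';'.join(row.dropna()), axis=1)

data = pd.DataFrame({
    'PTM_loc': ['24', '24;30', '22', '19;20;66', '16;30;50'],
    'PTM_types': ['S', 'S', 'T', 'S;T', 'S;Y;T'],
})
data['PTMs'] = combine_split(data['PTM_types'], data['PTM_loc'])
print(data)
```

The row-wise apply is probably slower than stack plus groupby on large frames, so treat this as an illustration of the alignment issue rather than a tuned solution.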