How to concat one column's values into another column in pandas?

Question


I have a DataFrame with two columns, one called 'PTM_loc' and the other called 'PTM_types':

    data['PTM_loc']
    0          24
    1       24;30
    2          22
    3    19;20;66
    4    16;30;50

    data['PTM_types']
    0        S
    1        S
    2        T
    3      S;T
    4    S;Y;T

I would like to concat (or whatever this operation is called) these columns together and get a new column like this:

    data['PTMs']
    0            S24
    1        S24;S30
    2            T22
    3    S19;T20;T66
    4    S16;Y30;T50

Does anyone have any ideas?

Answer 1

Score: 2


You can use a list comprehension with zip and zip_longest:

    from itertools import zip_longest

    def combine(a, b):
        # a: ';'-separated types, b: ';'-separated locations
        a = a.split(';')
        b = b.split(';')
        # zip_longest pads the shorter list with the last type,
        # so any extra locations reuse that type
        return ';'.join(map(''.join, zip_longest(a, b, fillvalue=a[-1])))

    data['PTMs'] = [combine(a, b) for a, b in zip(data['PTM_types'], data['PTM_loc'])]

    # or
    # from itertools import starmap
    # data['PTMs'] = list(starmap(combine, zip(data['PTM_types'], data['PTM_loc'])))
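To try the list-comprehension approach end to end, here is a minimal sketch that rebuilds the sample data from the question and applies `combine`. The DataFrame construction and the parameter names `types` and `locs` are my own choices for illustration, and the sketch assumes a row never has more types than locations.

```python
from itertools import zip_longest

import pandas as pd

# Sample data copied from the question (constructed here for illustration).
data = pd.DataFrame({
    'PTM_loc': ['24', '24;30', '22', '19;20;66', '16;30;50'],
    'PTM_types': ['S', 'S', 'T', 'S;T', 'S;Y;T'],
})


def combine(types, locs):
    """Prefix each location with its type, reusing the last type if types run out."""
    types = types.split(';')
    locs = locs.split(';')
    # Assumption: len(types) <= len(locs), so only the types list gets padded.
    pairs = zip_longest(types, locs, fillvalue=types[-1])
    return ';'.join(t + l for t, l in pairs)


data['PTMs'] = [combine(t, l) for t, l in zip(data['PTM_types'], data['PTM_loc'])]
print(data['PTMs'].tolist())
# ['S24', 'S24;S30', 'T22', 'S19;T20;T66', 'S16;Y30;T50']
```

Row 3 shows the padding at work: the two types `S;T` are stretched over three locations, so the last type `T` is used twice.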

Alternatively, for a pure pandas variant (just for fun, it's likely less efficient), use ffill to fill the missing combinations:

    # either in rectangular form
    data['PTMs'] = (data['PTM_types']
                    .str.split(';', expand=True).ffill(axis=1)
                    + data['PTM_loc'].str.split(';', expand=True)
                   ).stack().groupby(level=0).agg(';'.join)

    # or in long form
    l = data['PTM_loc'].str.extractall('([^;]+)')
    t = data['PTM_types'].str.extractall('([^;]+)')
    data['PTMs'] = (t.reindex_like(l).groupby(level=0).ffill().add(l)
                    .groupby(level=0).agg(';'.join)
                   )

Output:

       PTM_loc PTM_types         PTMs
    0       24         S          S24
    1    24;30         S      S24;S30
    2       22         S          S22
    3 19;20;66       S;T  S19;T20;T66
    4 16;30;50     S;Y;T  S16;Y30;T50
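The rectangular pandas variant is easier to follow when split into named steps. The sketch below is my own breakdown, not the answerer's code: it uses the sample data from the question, the intermediate names are hypothetical, and it joins each row with `dropna()` instead of `stack()`, assuming that is an acceptable substitute for dropping the empty cells.

```python
import pandas as pd

# Sample data copied from the question (constructed here for illustration).
data = pd.DataFrame({
    'PTM_loc': ['24', '24;30', '22', '19;20;66', '16;30;50'],
    'PTM_types': ['S', 'S', 'T', 'S;T', 'S;Y;T'],
})

# Step 1: one column per ';'-separated item; short rows are padded with missing values.
locs = data['PTM_loc'].str.split(';', expand=True)
types = data['PTM_types'].str.split(';', expand=True)

# Step 2: give types the same columns as locs, then forward-fill along each row
# so that every location slot has a type (the last known one is reused).
types_filled = types.reindex(columns=locs.columns).ffill(axis=1)

# Step 3: element-wise string concatenation; cells with no location stay missing.
combined = types_filled + locs

# Step 4: join the non-missing cells of each row back into one string.
data['PTMs'] = combined.apply(lambda row: ';'.join(row.dropna()), axis=1)
print(data['PTMs'].tolist())
```

The forward fill in step 2 is what lets a row such as `S` with locations `24;30` produce `S24;S30`: without it, the second location would have no type to pair with and would be dropped.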

Answer 2

Score: 1


Something like this, using split:

    s1 = data['PTM_loc'].str.split(';', expand=True)
    s2 = data['PTM_types'].str.split(';', expand=True)
    # radd puts the forward-filled type in front of each location
    data['new'] = s1.radd(s2.ffill(axis=1)).stack().groupby(level=0).agg(';'.join)

    Out[20]:
    0            S24
    1        S24;S30
    2            T22
    3    S19;T20;T66
    4    S16;Y30;T50
    dtype: object
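The answer calls `radd` rather than `add` because string concatenation is order-sensitive: `x.radd(y)` computes `y + x`, which places the type before the location. A small sketch with two toy Series (values chosen here purely for illustration) shows the difference:

```python
import pandas as pd

locs = pd.Series(['24', '30'])
types = pd.Series(['S', 'T'])

# add: locs + types, so the location comes first.
left = locs.add(types)
# radd: types + locs, so the type comes first, as the question asks.
right = locs.radd(types)

print(left.tolist())   # ['24S', '30T']
print(right.tolist())  # ['S24', 'T30']
```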

huangapple
  • Posted on June 26, 2023 at 20:45:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76556821.html