June 26, 2023 20:45:23


How to concatenate one column's values into another column in pandas?

Question


I have a DataFrame with two columns, one called 'PTM_loc' and the other called 'PTM_types':

data['PTM_loc']
0        24
1        24;30
2        22
3        19;20;66
4        16;30;50
data['PTM_types']
0        S
1        S
2        T
3        S;T
4        S;Y;T

I would like to concatenate (or whatever this operation is called) these columns, pairing each type with its location, to get a new column like this:

data['PTMs']
0        S24
1        S24;S30
2        T22
3        S19;T20;T66
4        S16;Y30;T50

Does anyone have any ideas?
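To reproduce this, the sample frame can be rebuilt from the values above. This is a minimal sketch that assumes both columns hold plain ';'-separated strings. It also counts the pieces per row, which makes visible the pattern the expected output seems to follow: when a row has fewer types than locations, the last type covers the remaining locations.

```python
import pandas as pd

# Sample data from the question: locations and types are ';'-separated strings
data = pd.DataFrame({
    "PTM_loc": ["24", "24;30", "22", "19;20;66", "16;30;50"],
    "PTM_types": ["S", "S", "T", "S;T", "S;Y;T"],
})

# Number of pieces per row in each column (separators + 1)
n_loc = data["PTM_loc"].str.count(";") + 1
n_types = data["PTM_types"].str.count(";") + 1
print(pd.DataFrame({"n_loc": n_loc, "n_types": n_types}))
```

Rows 1 and 3 have fewer types than locations, which is where the "reuse the last type" rule matters.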

Answer 1

Score: 2


You can use a list comprehension with a double zip/zip_longest:

from itertools import zip_longest

def combine(a, b):
    # a: types such as 'S;T', b: locations such as '19;20;66'
    a = a.split(';')
    b = b.split(';')
    # if there are fewer types than locations, zip_longest pads the types
    # with the last one, so it is reused for the extra locations
    return ';'.join(map(''.join, zip_longest(a, b, fillvalue=a[-1])))

data['PTMs'] = [combine(a, b) for a, b in zip(data['PTM_types'], data['PTM_loc'])]
# or
# from itertools import starmap
# data['PTMs'] = list(starmap(combine, zip(data['PTM_types'], data['PTM_loc'])))
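To see what the `fillvalue=a[-1]` trick does, here is the pairing step for a single row in isolation. It is a small standard-library sketch; the second part shows a caveat: the same fill value would also pad the locations if a row ever had more types than locations, which the question's data does not appear to have.

```python
from itertools import zip_longest

# One row from the question: two types, three locations
types = "S;T".split(";")
locs = "19;20;66".split(";")

# zip_longest pads the shorter input with fillvalue; padding with the last
# type means each extra location reuses it
pairs = list(zip_longest(types, locs, fillvalue=types[-1]))
joined = ";".join(map("".join, pairs))
print(pairs)   # [('S', '19'), ('T', '20'), ('T', '66')]
print(joined)  # S19;T20;T66

# Caveat: with more types than locations the padding lands on the location
# side instead, giving an entry made only of type letters
odd = ";".join(map("".join, zip_longest(["S", "Y"], ["16"], fillvalue="Y")))
print(odd)     # S16;YY
```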

Alternatively, for a pure pandas variant (just for fun; it is likely less efficient), use ffill to fill in the missing types:

# either in rectangular form: one column per ';'-separated piece,
# with missing types forward-filled along each row
data['PTMs'] = (data['PTM_types']
 .str.split(';', expand=True).ffill(axis=1)
 + data['PTM_loc'].str.split(';', expand=True)
).stack().dropna().groupby(level=0).agg(';'.join)  # dropna() discards padding cells
# or in long form: one row per piece, indexed by (original row, match number)
l = data['PTM_loc'].str.extractall('([^;]+)')
t = data['PTM_types'].str.extractall('([^;]+)')
data['PTMs'] = (t.reindex_like(l).groupby(level=0).ffill().add(l)
                 .groupby(level=0).agg(';'.join)[0]  # [0]: the captured-group column
                )

Output:

    PTM_loc PTM_types         PTMs
0        24         S          S24
1     24;30         S      S24;S30
2        22         S          S22
3  19;20;66       S;T  S19;T20;T66
4  16;30;50     S;Y;T  S16;Y30;T50
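The rectangular form above (like the split-based answer below) assumes that the widest row of `PTM_types` has as many pieces as the widest row of `PTM_loc`. That holds for the question's data (both have three), but otherwise the forward fill cannot reach the extra location columns. A hedged sketch of one way around this, on a small made-up two-row example: reindex the types frame to the location columns before filling.

```python
import pandas as pd

# Made-up example where the types are never as wide as the locations
types = pd.Series(["S", "S;T"])
locs = pd.Series(["24;30", "19;20;66"])

wide_types = types.str.split(";", expand=True)  # 2 columns
wide_locs = locs.str.split(";", expand=True)    # 3 columns

# Give the types frame the same columns as the locations frame *before*
# forward-filling, so the last type can reach every location column
filled = wide_types.reindex(columns=wide_locs.columns).ffill(axis=1)

# Element-wise string concatenation; cells without a location become missing
combined = filled + wide_locs

# Drop the missing cells and join each row back into one string
result = combined.stack().dropna().groupby(level=0).agg(";".join)
print(result.tolist())  # ['S24;S30', 'S19;T20;T66']
```

Without the `reindex`, the second row would come out as `S19;T20`, because `filled` would have no third column to add to `66`.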

Answer 2

Score: 1


Something like this, using split:

s1 = data['PTM_loc'].str.split(';', expand=True)
s2 = data['PTM_types'].str.split(';', expand=True)
# radd computes s2.ffill(axis=1) + s1, so each type comes before its location
data['new'] = s1.radd(s2.ffill(axis=1)).stack().dropna().groupby(level=0).agg(';'.join)
Out[20]: 
0            S24
1        S24;S30
2            T22
3    S19;T20;T66
4    S16;Y30;T50
dtype: object
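The `radd` call is what puts the type letter first. `radd` is the "reverse add": `s1.radd(x)` evaluates `x + s1`, and for strings `+` is concatenation, so the order matters. A tiny sketch on one already-split row (the values are taken from the question; the frame names are illustrative):

```python
import pandas as pd

# One row of already-split values, as produced by str.split(';', expand=True)
locs = pd.DataFrame([["24", "30"]])
types = pd.DataFrame([["S", "S"]])

# radd evaluates other + self, i.e. types + locs here,
# which puts the type letter in front of the location
front = locs.radd(types).iloc[0].tolist()
# plain add evaluates self + other, i.e. locs + types
back = locs.add(types).iloc[0].tolist()
print(front)  # ['S24', 'S30']
print(back)   # ['24S', '30S']
```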
