March 9, 2023, 22:35:10

Python Pandas Merging two columns but drop duplicated value

Question

I have run into a problem with the following example DataFrame in Python pandas.

Column A	Column B
a	b
c	c
d	e
g	g

I would like to add a Column C that concatenates the two columns but keeps the value only once when both columns are equal, like this:

Column A	Column B	Column C
a	b	ab
c	c	c
d	e	de
g	g	g

Could someone please help? Much appreciated.

Answer 1

Score: 2

Use a custom row-wise aggregation with agg (axis=1): dict.fromkeys removes the duplicates while keeping order, and str.join concatenates what remains:

df['Column C'] = df.agg(lambda r: ''.join(dict.fromkeys(r)), axis=1)
# or limiting to specific columns:
cols = ['Column A', 'Column B']
df['Column C'] = df[cols].agg(lambda r: ''.join(dict.fromkeys(r)), axis=1)
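
As a brief aside on why dict.fromkeys suits this: dictionary keys are unique and, since Python 3.7, keep insertion order, so building a dict from a row's values drops repeats without reordering them. A minimal pure-Python sketch (the helper name join_unique is made up for illustration and is not part of pandas):

```python
def join_unique(values):
    """Concatenate values, keeping only the first occurrence of each."""
    # dict.fromkeys builds a dict whose keys are the values in first-seen order
    return ''.join(dict.fromkeys(values))


# The rows of the example frame, written as plain tuples
rows = [('a', 'b'), ('c', 'c'), ('d', 'e'), ('g', 'g')]
merged = [join_unique(row) for row in rows]
print(merged)  # ['ab', 'c', 'de', 'g']
```

A plain set would also remove the duplicates, but it does not guarantee the original order, so 'ab' could come out as 'ba'.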

Or, if there are only two columns:

df['Column C'] = (df['Column A'].add(df['Column B'])
                  .where(df['Column A'].ne(df['Column B']), df['Column A'])
                  )

Output:

  Column A Column B Column C
0        a        b       ab
1        c        c        c
2        d        e       de
3        g        g        g
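
For reference, here is a self-contained sketch that rebuilds the example frame from the question and checks that the two approaches agree. It assumes pandas is installed and that both columns hold strings with no missing values; the variable names row_wise and two_col are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({'Column A': ['a', 'c', 'd', 'g'],
                   'Column B': ['b', 'c', 'e', 'g']})
cols = ['Column A', 'Column B']

# Row-wise: drop repeated values within each row, then join what is left
row_wise = df[cols].agg(lambda r: ''.join(dict.fromkeys(r)), axis=1)

# Two-column shortcut: concatenate, but fall back to Column A where A == B
two_col = (df['Column A'].add(df['Column B'])
           .where(df['Column A'].ne(df['Column B']), df['Column A']))

df['Column C'] = row_wise
print(df)
```

The row-wise version generalizes to any number of columns, while the two-column version stays vectorized and avoids calling a Python function per row.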

Answer 2

Score: 1

You can use apply with axis=1 to process each row of the DataFrame, then use pandas.Series.unique and ''.join to build the result.

df['Column C'] = df[['Column A', 'Column B']].apply(lambda x: ''.join(x.unique()), axis=1)

  Column A Column B Column C
0        a        b       ab
1        c        c        c
2        d        e       de
3        g        g        g
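
If a row-wise apply becomes slow on a large frame, one alternative (not taken from the answers here, just a common pattern) is to iterate over the column values with zip in a list comprehension, which avoids building a Series for every row. A sketch under the same assumptions as the question's data (string values, no NaN):

```python
import pandas as pd

df = pd.DataFrame({'Column A': ['a', 'c', 'd', 'g'],
                   'Column B': ['b', 'c', 'e', 'g']})
cols = ['Column A', 'Column B']

# zip over the selected columns yields one tuple of plain values per row;
# dict.fromkeys drops repeats within that tuple while keeping their order
df['Column C'] = [''.join(dict.fromkeys(row))
                  for row in zip(*(df[c] for c in cols))]
print(df['Column C'].tolist())  # ['ab', 'c', 'de', 'g']
```

Whether this is actually faster depends on the data, so it is worth timing on your own frame before switching.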
