Python Pandas: Merging two columns but dropping duplicated values

Question

I have run into a problem with the following example DataFrame in pandas.

Column A  Column B
a         b
c         c
d         e
g         g

I would like to get something like this, where Column C joins the two values but keeps only one copy when they are equal:

Column A  Column B  Column C
a         b         ab
c         c         c
d         e         de
g         g         g

Could someone please help? Much appreciated.

Answer 1

Score: 2

Use a custom row-wise aggregation with agg, using dict.fromkeys to remove the duplicates while keeping order, and str.join to concatenate:

df['Column C'] = df.agg(lambda r: ''.join(dict.fromkeys(r)), axis=1)

# or limiting to specific columns:
cols = ['Column A', 'Column B']
df['Column C'] = df[cols].agg(lambda r: ''.join(dict.fromkeys(r)), axis=1)

Or, if there are only two columns, concatenate them and fall back to Column A wherever the two values are equal:

df['Column C'] = (df['Column A'].add(df['Column B'])
                  .where(df['Column A'].ne(df['Column B']), df['Column A'])
                  )

Output:

  Column A Column B Column C
0        a        b       ab
1        c        c        c
2        d        e       de
3        g        g        g
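
For reference, here is a self-contained sketch that rebuilds the example frame from the question and runs both approaches above so they can be compared. The helper name join_unique is made up for illustration and is not part of pandas. The sketch assumes the columns hold plain strings with no missing values.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "Column A": ["a", "c", "d", "g"],
    "Column B": ["b", "c", "e", "g"],
})

cols = ["Column A", "Column B"]


def join_unique(row):
    """Concatenate a row's values, keeping only the first occurrence of each.

    Hypothetical helper; relies on dict.fromkeys preserving insertion order.
    """
    return "".join(dict.fromkeys(row))


# Row-wise approach: works for any number of string columns
row_wise = df[cols].agg(join_unique, axis=1)

# Two-column approach: concatenate, then fall back to A where A equals B
a, b = df["Column A"], df["Column B"]
two_col = a.add(b).where(a.ne(b), a)

df["Column C"] = row_wise
print(df)
```

The row-wise version calls a Python function once per row, so on large frames the two-column version is likely to be faster, though it only applies when exactly two columns are involved.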

Answer 2

Score: 1

You can use apply with axis=1 to process each row of the DataFrame, then use pandas.Series.unique and ''.join to build the result.

df['Column C'] = df[['Column A', 'Column B']].apply(lambda x: ''.join(x.unique()), axis=1)

  Column A Column B Column C
0        a        b       ab
1        c        c        c
2        d        e       de
3        g        g        g
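
If the same operation is needed in several places, the apply call can be wrapped in a small function. This is only a sketch: add_merged_column is a hypothetical name, not a pandas API, and it assumes the selected columns contain strings without missing values.

```python
import pandas as pd


def add_merged_column(df, cols, new_col):
    """Return a copy of df with new_col holding each row's unique values joined.

    Hypothetical helper for illustration; assumes the selected columns hold
    strings without missing values.
    """
    out = df.copy()
    # Series.unique returns values in order of first appearance
    out[new_col] = out[cols].apply(lambda row: "".join(row.unique()), axis=1)
    return out


# Example frame from the question
df = pd.DataFrame({
    "Column A": ["a", "c", "d", "g"],
    "Column B": ["b", "c", "e", "g"],
})

result = add_merged_column(df, ["Column A", "Column B"], "Column C")
print(result)
```

Working on a copy leaves the original frame untouched, which makes the helper easier to reuse; drop the copy if in-place modification is preferred.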

huangapple
  • Posted on March 9, 2023, 22:35:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75686012.html