Common words in two different pandas DataFrames and columns

Question

A

x disc
a 'tall', 'short', 'medium'
b 'small', 'long', 'short'

B

y
'tall', 'short'
'short', 'long'
'small', 'tall'

Desired output:

x  disc                       tall short  short long
a  'tall', 'short', 'medium'  1           0
b  'small', 'long', 'short'   0           1

Answer 1

Score: 1

Convert the values to sets, then add one column to A per entry of B['y'], set to 1 where the row's words include every word of that entry:

for x in B['y']:
    # Words of the current B entry
    s = set(x.split(', '))
    # 1 if the row's words are a superset of s, else 0
    A[x] = [int(set(y.split(', ')) >= s) for y in A['disc']]

If necessary, drop the columns that contain only zeros:

out = A.loc[:, A.ne(0).any()]
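
As a rough end-to-end sketch of the loop approach above, the snippet below builds sample frames modeled on the question and runs both steps. It assumes disc and y hold plain comma-separated strings without the surrounding quotes; the names wanted, words and row are illustrative only.

```python
import pandas as pd

# Sample data modeled on the question (assumed: plain comma-separated strings).
A = pd.DataFrame({
    "x": ["a", "b"],
    "disc": ["tall, short, medium", "small, long, short"],
})
B = pd.DataFrame({"y": ["tall, short", "short, long", "small, tall"]})

for words in B["y"]:
    wanted = set(words.split(", "))
    # set >= set is a superset test: True when the row contains every wanted word.
    A[words] = [int(set(row.split(", ")) >= wanted) for row in A["disc"]]

# Keep only the columns that have at least one non-zero value.
out = A.loc[:, A.ne(0).any()]
print(out)
```

Note that the string columns x and disc survive the filter because their values never equal 0, so only the all-zero indicator columns are dropped.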

Answer 2

Score: 1

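You can use set comparison with NumPy broadcasting. This assumes A['disc'] and B['y'] hold lists of words, as the output below shows, rather than comma-separated strings: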

out = A.join(pd.DataFrame((A['disc'].apply(set).to_numpy()[:,None]
                           >= B['y'].apply(set).to_numpy()).astype(int),
                          columns=B['y'].apply(' '.join), index=A.index)
             )

Output:

   x                   disc  tall short  short long  small tall
0  a  [tall, short, medium]           1           0           0
1  b   [small, long, short]           0           1           0

If you want only the matches:

tmp = pd.DataFrame((A['disc'].apply(set).to_numpy()[:,None]
                     >= B['y'].apply(set).to_numpy()),
                    columns=B['y'].apply(' '.join), index=A.index)
                   
out = A.join(tmp.loc[:, tmp.any()].astype(int))

Output:

   x                   disc  tall short  short long
0  a  [tall, short, medium]           1           0
1  b   [small, long, short]           0           1
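
If the columns hold comma-separated strings instead of lists, the same broadcasting idea might be adapted as in the sketch below. The sample data and the names a_sets, b_sets and mask are assumptions made for illustration; the strings are split into word sets before the comparison.

```python
import pandas as pd

# Sample data modeled on the question (assumed: plain comma-separated strings).
A = pd.DataFrame({
    "x": ["a", "b"],
    "disc": ["tall, short, medium", "small, long, short"],
})
B = pd.DataFrame({"y": ["tall, short", "short, long", "small, tall"]})

# Turn each comma-separated string into a set of words (object arrays of sets).
a_sets = A["disc"].str.split(", ").apply(set).to_numpy()
b_sets = B["y"].str.split(", ").apply(set).to_numpy()

# Shapes (2, 1) and (3,) broadcast to (2, 3):
# one superset test per (row of A, entry of B) pair.
mask = (a_sets[:, None] >= b_sets).astype(bool)

tmp = pd.DataFrame(mask, columns=B["y"].to_numpy(), index=A.index)

# Keep only the indicator columns with at least one match, then join as 0/1.
out = A.join(tmp.loc[:, tmp.any()].astype(int))
print(out)
```

Filtering on the boolean frame before joining avoids having to special-case the non-indicator columns x and disc.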

huangapple
  • Posted on May 17, 2023 at 15:44:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76269656.html