Common words in two different pandas DataFrames and columns
Question
DataFrame A:
x | disc |
---|---|
a | 'tall', 'short', 'medium' |
b | 'small', 'long', 'short' |
DataFrame B:
y |
---|
'tall', 'short' |
'short', 'long' |
'small', 'tall' |
Desired output:
x | disc | tall short | short long |
---|---|---|---|
a | 'tall', 'short', 'medium' | 1 | 0 |
b | 'small', 'long', 'short' | 0 | 1 |
Answer 1
Score: 1
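Convert the values to sets, then add one new column per row of B['y'] that flags whether all of its words occur in A['disc']:
for x in B['y']:
    # words listed in the current row of B
    s = set(x.split(', '))
    # 1 if the words in A['disc'] are a superset of s, otherwise 0
    A[x] = [int(set(y.split(', ')) >= s) for y in A['disc']]
If necessary, remove the columns that contain only 0:
out = A.loc[:, A.ne(0).any()]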
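A self-contained sketch of the loop above may help when trying it out. It assumes the cells are plain comma-separated strings without embedded quote characters, and the helper name `add_subset_flags` is hypothetical. Instead of calling `ne(0)` on the whole frame, it filters only the flag columns, which sidesteps comparing string columns with a number.

```python
import pandas as pd

# Sample data modelled on the question; cells are assumed to be plain
# comma-separated strings (an assumption, the original cells may differ).
A = pd.DataFrame({
    "x": ["a", "b"],
    "disc": ["tall, short, medium", "small, long, short"],
})
B = pd.DataFrame({"y": ["tall, short", "short, long", "small, tall"]})


def add_subset_flags(A, B):
    """Hypothetical helper: one 0/1 column per row of B['y']."""
    out = A.copy()
    # Split each 'disc' cell once instead of once per phrase.
    disc_sets = [set(d.split(", ")) for d in out["disc"]]
    for phrase in B["y"]:
        words = set(phrase.split(", "))
        # 1 when every word of the phrase occurs in the row's 'disc' set.
        out[phrase] = [int(words <= d) for d in disc_sets]
    return out


flagged = add_subset_flags(A, B)

# Keep only the flag columns that matched at least one row.
keep = [c for c in B["y"] if flagged[c].any()]
matched = flagged[["x", "disc"] + keep]
print(matched)
```

Working on a copy leaves `A` untouched, so the sketch can be rerun without accumulating columns.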
Answer 2
Score: 1
You can use set comparison with NumPy broadcasting (the output shows lists, so disc and y are assumed to hold lists of words):
out = A.join(pd.DataFrame((A['disc'].apply(set).to_numpy()[:, None]
                           >= B['y'].apply(set).to_numpy()).astype(int),
                          columns=B['y'].apply(' '.join), index=A.index)
             )
Output:
   x                   disc  tall short  short long  small tall
0  a  [tall, short, medium]           1           0           0
1  b   [small, long, short]           0           1           0
If you want only the matches:
tmp = pd.DataFrame((A['disc'].apply(set).to_numpy()[:, None]
                    >= B['y'].apply(set).to_numpy()),
                   columns=B['y'].apply(' '.join), index=A.index)
out = A.join(tmp.loc[:, tmp.any()].astype(int))
Output:
   x                   disc  tall short  short long
0  a  [tall, short, medium]           1           0
1  b   [small, long, short]           0           1
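To make the broadcasting step easier to follow, here is a minimal runnable sketch that splits it into named intermediate values. The sample frames are an assumption (each cell holds a list of words, as the printed output suggests), and the variable names `a_sets`, `b_sets`, `is_superset` and `flags` are illustrative only.

```python
import pandas as pd

# Assumed sample data: each cell holds a list of words.
A = pd.DataFrame({
    "x": ["a", "b"],
    "disc": [["tall", "short", "medium"], ["small", "long", "short"]],
})
B = pd.DataFrame({"y": [["tall", "short"], ["short", "long"], ["small", "tall"]]})

# Object arrays of Python sets: shape (2, 1) against shape (3,)
# broadcasts to a (2, 3) grid of pairwise comparisons.
a_sets = A["disc"].apply(set).to_numpy()[:, None]
b_sets = B["y"].apply(set).to_numpy()

# For sets, ">=" is the superset test; cast to bool in case the
# comparison yields an object-dtype array.
is_superset = (a_sets >= b_sets).astype(bool)

# One column per row of B, labelled with its words joined by a space.
flags = pd.DataFrame(is_superset, columns=B["y"].apply(" ".join), index=A.index)

out_all = A.join(flags.astype(int))
# Keep only the columns that matched at least one row of A.
out_matched = A.join(flags.loc[:, flags.any()].astype(int))
print(out_matched)
```

Compared with the explicit loop, this builds every comparison in one expression, at the cost of materialising a grid with len(A) * len(B) entries.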