May 17, 2023 15:44:52

Common words in two different pandas DataFrames and columns

Question


x	disc
a	'tall', 'short', 'medium'
b	'small', 'long', 'short'

y
'tall', 'short'
'short', 'long'
'small', 'tall'

Desired output (one 0/1 column for each row of y whose words all appear in disc):

x	disc	tall short	short long
a	'tall', 'short', 'medium'	1	0
b	'small', 'long', 'short'	0	1

Answer 1

Score: 1


Convert the values to sets and add one new column per row of B['y'], set to 1 where every word of that row appears in disc:

for x in B['y']:
    s = set(x.split(', '))
    A[x] = [int(set(y.split(', ')) >= s) for y in A['disc']]
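The comparison `set(...) >= s` in the loop is Python's superset test: it is true only when every element of `s` is also in the left-hand set. A small illustration with the question's words (plain Python, no pandas needed; the variable names are only for this sketch):

```python
disc = set("tall, short, medium".split(", "))

both_present = disc >= {"tall", "short"}    # every word is in disc
one_missing = disc >= {"small", "tall"}     # "small" is not in disc

print(both_present, one_missing)
print(disc.issuperset({"tall", "short"}))   # method spelling of the same test
```

Wrapping the result in `int(...)` is what turns these booleans into the 1/0 flags of the new columns.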

If necessary, drop the columns that contain only 0:

out = A.loc[:, A.ne(0).any()]
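For anyone who wants to run this end to end, here is a self-contained sketch. It is a variation on the loop rather than a copy of it: the sample frames are reconstructed from the question and assume every cell is one comma-separated string (the question does not say how the data is stored), the helper `to_set` is hypothetical, the flags are collected in a separate frame so the all-zero filter only touches flag columns, and the new column names use a space instead of a comma to match the desired output.

```python
import pandas as pd

# Assumed storage: every cell is one comma-separated string of words.
A = pd.DataFrame({"x": ["a", "b"],
                  "disc": ["tall, short, medium", "small, long, short"]})
B = pd.DataFrame({"y": ["tall, short", "short, long", "small, tall"]})


def to_set(cell):
    """Split a comma-separated cell into a set of stripped words."""
    return {word.strip() for word in cell.split(",")}


disc_sets = [to_set(cell) for cell in A["disc"]]

# One 0/1 column per row of B: 1 when every word of that row occurs in disc.
flags = pd.DataFrame(
    {phrase.replace(", ", " "): [int(d >= to_set(phrase)) for d in disc_sets]
     for phrase in B["y"]},
    index=A.index,
)

# Keep only the phrases that matched at least one row, then attach them to A.
out = A.join(flags.loc[:, flags.any()])
print(out)
```

If the cells actually hold quoted words or Python lists, only `to_set` should need to change.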

Answer 2

Score: 1


You can use set comparison with numpy broadcasting:

out = A.join(pd.DataFrame((A['disc'].apply(set).to_numpy()[:,None]
                           >= B['y'].apply(set).to_numpy()).astype(int),
                          columns=B['y'].apply(' '.join), index=A.index)
             )

Output:

   x                   disc  tall short  short long  small tall
0  a  [tall, short, medium]           1           0           0
1  b   [small, long, short]           0           1           0
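To see what the broadcast comparison produces before it is wrapped in a DataFrame, here is a minimal sketch. It assumes `disc` and `y` hold Python lists of words, which is what the printed output suggests but the question does not confirm; the names `a_sets`, `b_sets` and `mask` are only for illustration.

```python
import pandas as pd

# Assumed storage: each cell is a Python list of words.
A = pd.DataFrame({"x": ["a", "b"],
                  "disc": [["tall", "short", "medium"],
                           ["small", "long", "short"]]})
B = pd.DataFrame({"y": [["tall", "short"], ["short", "long"], ["small", "tall"]]})

a_sets = A["disc"].apply(set).to_numpy()   # object array, shape (2,)
b_sets = B["y"].apply(set).to_numpy()      # object array, shape (3,)

# (2, 1) against (3,) broadcasts to (2, 3); each cell is a_sets[i] >= b_sets[j],
# i.e. "does row i of A contain every word of row j of B?"
mask = a_sets[:, None] >= b_sets
print(mask.shape)
print(mask.astype(int))
```

Each row of `mask` corresponds to a row of A and each column to a row of B, which is why the answer passes `index=A.index` and builds the column labels from `B['y']`.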

If you want only the matches:

tmp = pd.DataFrame((A['disc'].apply(set).to_numpy()[:,None]
                     >= B['y'].apply(set).to_numpy()),
                    columns=B['y'].apply(' '.join), index=A.index)

out = A.join(tmp.loc[:, tmp.any()].astype(int))

Output:

   x                   disc  tall short  short long
0  a  [tall, short, medium]           1           0
1  b   [small, long, short]           0           1
