July 31, 2023 21:29:35

Pandas, mark a cell in dataframe, when there exists the same number in the frame

Question

I have two pandas DataFrames. The column labels of the first are specific numbers. The second contains those same numbers scattered throughout its cells. The two share a matching column (SN in the example below). I need to mark a cell in the first DataFrame with the string "x" wherever its column's number appears in the second DataFrame.

I've tried writing functions for this. Here is an example:

DF1: 
SN      12  13  14  15  16
1121
1122
1123
1143
1156
DF2:
SN      ID1  ID2  ID3  ID4  ID5
1121    12   15   14   16   NAN
1122    12   16   14   NAN  NAN
1123
1143
1156    NAN  NAN  NAN  14   NAN
Result (hopefully):
DF:
SN      12  13  14  15  16
1121     x           x    x
1122
1123
1143
1156

Answer 1

Score: 1

It seems like you're trying to mark the cells in DF1 with "x" based on the values that match the column names of DF1 and are found in the corresponding rows of DF2.

import pandas as pd
import numpy as np
# Sample Data
data1 = {'SN': [1121, 1122, 1123, 1143, 1156], 12: [None]*5, 13: [None]*5, 14: [None]*5, 15: [None]*5, 16: [None]*5}
data2 = {'SN': [1121, 1122, 1123, 1143, 1156], 'ID1': [12, 12, np.nan, np.nan, np.nan], 'ID2': [15, 16, np.nan, np.nan, np.nan], 'ID3': [14, 14, np.nan, np.nan, np.nan], 'ID4': [16, np.nan, np.nan, np.nan, 14], 'ID5': [np.nan, np.nan, np.nan, np.nan, np.nan]}
DF1 = pd.DataFrame(data1)
DF2 = pd.DataFrame(data2)
# Set 'SN' as index for both dataframes
DF1.set_index('SN', inplace=True)
DF2.set_index('SN', inplace=True)
# For each column label of DF1 (SN is now the index), mark the rows of DF1 where that label appears in the same row of DF2
for col in DF1.columns:
    condition = DF2.isin([col]).any(axis=1)
    DF1.loc[condition, col] = 'x'
# Fill NaN with empty strings
DF1.fillna("", inplace=True)
print(DF1)
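
The loop above changes DF1 in place. As a minimal sketch of the same idea, the helper below builds a fresh frame instead, so the inputs stay untouched. The name `mark_matches` and its `mark` parameter are hypothetical and not part of the answer, and the sketch assumes SN is already the index of both frames.

```python
import numpy as np
import pandas as pd


def mark_matches(labels_df, values_df, mark="x"):
    """Hypothetical helper: return a new frame shaped like labels_df, holding
    `mark` where a column label occurs in the same-index row of values_df."""
    out = pd.DataFrame("", index=labels_df.index, columns=labels_df.columns)
    for col in out.columns:
        # True for each row of values_df that contains this column label
        hit = values_df.isin([col]).any(axis=1)
        # Align to out's index; rows missing from values_df count as no match
        hit = hit.reindex(out.index, fill_value=False)
        out.loc[hit, col] = mark
    return out


# Sample data from the question, with SN as the index
sn = pd.Index([1121, 1122, 1123, 1143, 1156], name="SN")
DF1 = pd.DataFrame(index=sn, columns=[12, 13, 14, 15, 16], dtype=object)
DF2 = pd.DataFrame(
    [[12, 15, 14, 16, np.nan],
     [12, 16, 14, np.nan, np.nan],
     [np.nan] * 5,
     [np.nan] * 5,
     [np.nan, np.nan, np.nan, 14, np.nan]],
    index=sn,
    columns=["ID1", "ID2", "ID3", "ID4", "ID5"],
)

result = mark_matches(DF1, DF2)
print(result)
```

Returning a new frame keeps the function safe to call repeatedly, at the cost of one extra allocation.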

Answer 2

Score: 1

Here is an option:

df1.fillna(1).mul(df1.columns).T.isin(df2.stack().groupby(level=0).agg(list).to_dict()).T.applymap({True: 'X'}.get)

Output:

        12    13    14    15    16
SN                                
1121     X  None     X     X     X
1122     X  None     X  None     X
1123  None  None  None  None  None
1143  None  None  None  None  None
1156  None  None     X  None  None
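
The one-liner is dense, so here is a sketch that splits the same idea into named steps. The intermediate names (`wanted`, `labels`, `hits`) are illustrative only. It makes two substitutions: `np.where` replaces `applymap`, which is deprecated since pandas 2.1 in favor of `DataFrame.map`, and an explicit `dropna()` is added so the result does not depend on whether `stack()` drops missing values. Non-matching cells come out as empty strings rather than `None`.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with SN as the index (assumed layout)
sn = pd.Index([1121, 1122, 1123, 1143, 1156], name="SN")
nan = np.nan
df1 = pd.DataFrame(index=sn, columns=[12, 13, 14, 15, 16], dtype=object)
df2 = pd.DataFrame(
    [[12, 15, 14, 16, nan],
     [12, 16, 14, nan, nan],
     [nan, nan, nan, nan, nan],
     [nan, nan, nan, nan, nan],
     [nan, nan, nan, 14, nan]],
    index=sn,
    columns=["ID1", "ID2", "ID3", "ID4", "ID5"],
)

# Step 1: for each SN, collect the non-missing values found in df2
wanted = df2.stack().dropna().groupby(level=0).agg(list).to_dict()

# Step 2: a frame shaped like df1 in which every cell holds its own column label
labels = pd.DataFrame(
    np.tile(df1.columns.to_numpy(), (len(df1), 1)),
    index=df1.index,
    columns=df1.columns,
)

# Step 3: transpose so the SN values become column names, which is what
# DataFrame.isin expects for dict keys; then transpose back
hits = labels.T.isin(wanted).T

# Step 4: turn the boolean mask into 'X' / empty-string marks
out = pd.DataFrame(
    np.where(hits.to_numpy(), "X", ""),
    index=hits.index,
    columns=hits.columns,
)
print(out)
```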

Answer 3

Score: 0

Assuming SN is the index of both DataFrames, and that the values and the column headers share the same dtype:

s = DF2.stack()
out = DF1.mask(pd.crosstab(s.index.get_level_values(0), s).ge(1)
                 .reindex_like(DF1).fillna(False), 'X').fillna('')

Output:

     12 13 14 15 16
SN                 
1121  X     X  X  X
1122  X     X     X
1123               
1143               
1156        X
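
The dtype assumption matters in practice. The sketch below uses made-up data (the names `values`, `string_labels` and `numeric_labels` are illustrative) to show why: column headers stored as text, for instance after reading a file, would not be expected to match numeric cell values until they are converted.

```python
import numpy as np
import pandas as pd

# Small illustrative frame of numeric values indexed by SN
values = pd.DataFrame(
    {"ID1": [12.0, np.nan], "ID2": [15.0, 14.0]},
    index=pd.Index([1121, 1156], name="SN"),
)

# Headers as text, and the same headers converted to integers
string_labels = pd.Index(["12", "14", "15"])
numeric_labels = string_labels.astype(int)

# Does each label occur anywhere in `values`?
found_as_text = [bool(values.isin([label]).any().any()) for label in string_labels]
found_as_number = [bool(values.isin([label]).any().any()) for label in numeric_labels]

print(found_as_text)
print(found_as_number)
```

If the headers are text, converting them first (for example with `DF1.columns = DF1.columns.astype(int)`) should bring both sides to comparable types.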

Reproducible inputs:

DF1 = pd.DataFrame.from_dict({'index': [1121, 1122, 1123, 1143, 1156],
                              'columns': [12, 13, 14, 15, 16],
                              'data': [[None, None, None, None, None],
                                       [None, None, None, None, None],
                                       [None, None, None, None, None],
                                       [None, None, None, None, None],
                                       [None, None, None, None, None]],
                              'index_names': ['SN'],
                              'column_names': [None]}, orient='tight')
nan = float('nan')
DF2 = pd.DataFrame.from_dict({'index': [1121, 1122, 1123, 1143, 1156],
                              'columns': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5'],
                              'data': [[12.0, 15.0, 14.0, 16.0, nan],
                                       [12.0, 16.0, 14.0, nan, nan],
                                       [nan, nan, nan, nan, nan],
                                       [nan, nan, nan, nan, nan],
                                       [nan, nan, nan, 14.0, nan]],
                              'index_names': ['SN'],
                              'column_names': [None]}, orient='tight')
