Pandas, mark a cell in dataframe, when there exists the same number in the frame
Question
I have two pandas dataframes. The column labels of the first are specific numbers. The second has those same numbers scattered throughout its cells. Both dataframes share a matching column (SN). I need to mark a cell in the first dataframe with the string "x" wherever its column's number appears in the second dataframe.
I've tried writing functions for this. Here is an example:
DF1:
SN 12 13 14 15 16
1121
1122
1123
1143
1156
DF2:
SN    ID1  ID2  ID3  ID4  ID5
1121  12   15   14   16   NAN
1122  12   16   14   NAN  NAN
1123
1143
1156  NAN  NAN  NAN  14   NAN
Result (hopefully):
DF:
SN 12 13 14 15 16
1121 x x x
1122
1123
1143
1156
Answer 1
Score: 1
It seems you want to mark a cell in DF1 with "x" whenever its column label appears in the row of DF2 that has the same SN.
import pandas as pd
import numpy as np
# Sample Data
data1 = {'SN': [1121, 1122, 1123, 1143, 1156], 12: [None]*5, 13: [None]*5, 14: [None]*5, 15: [None]*5, 16: [None]*5}
data2 = {'SN': [1121, 1122, 1123, 1143, 1156], 'ID1': [12, 12, np.nan, np.nan, np.nan], 'ID2': [15, 16, np.nan, np.nan, np.nan], 'ID3': [14, 14, np.nan, np.nan, np.nan], 'ID4': [16, np.nan, np.nan, np.nan, 14], 'ID5': [np.nan, np.nan, np.nan, np.nan, np.nan]}
DF1 = pd.DataFrame(data1)
DF2 = pd.DataFrame(data2)
# Set 'SN' as index for both dataframes
DF1.set_index('SN', inplace=True)
DF2.set_index('SN', inplace=True)
# Loop through the columns of DF1 (SN is now the index) and mark the rows of DF1
# whose matching DF2 row contains that column label
for col in DF1.columns:
    condition = DF2.isin([col]).any(axis=1)
    DF1.loc[condition, col] = 'x'
# Fill NaN with empty strings
DF1.fillna("", inplace=True)
print(DF1)
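To make the loop easier to follow, here is a minimal sketch of what `condition` holds for a single column label. It rebuilds the DF2 sample from the question and checks the label 15 only; the intermediate names `cell_hits` and `row_hits` are introduced here for illustration and are not part of the answer.

```python
import numpy as np
import pandas as pd

# DF2 from the question, with SN as the index
DF2 = pd.DataFrame(
    {
        "ID1": [12, 12, np.nan, np.nan, np.nan],
        "ID2": [15, 16, np.nan, np.nan, np.nan],
        "ID3": [14, 14, np.nan, np.nan, np.nan],
        "ID4": [16, np.nan, np.nan, np.nan, 14],
        "ID5": [np.nan] * 5,
    },
    index=pd.Index([1121, 1122, 1123, 1143, 1156], name="SN"),
)

# Cell-by-cell test: is this value equal to 15?
cell_hits = DF2.isin([15])

# Collapse each row to one boolean: does the row contain 15 anywhere?
row_hits = cell_hits.any(axis=1)

print(row_hits)
```

Only SN 1121 contains 15, so only that row of DF1 would receive an 'x' in column 15. Because `row_hits` is indexed by SN, `DF1.loc[row_hits, 15]` lines up with DF1 as long as both frames use SN as their index.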
Answer 2
Score: 1
Here is an option:
df1.fillna(1).mul(df1.columns).T.isin(df2.stack().groupby(level=0).agg(list).to_dict()).T.applymap({True:'X'}.get)
Output:
        12    13    14    15    16
SN
1121     X  None     X     X     X
1122     X  None     X  None     X
1123  None  None  None  None  None
1143  None  None  None  None  None
1156  None  None     X  None  None
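The one-liner is dense, so here is a rough step-by-step sketch of the same idea. It is not the answer's exact code: the label grid is built with `np.tile` instead of `fillna(1).mul(...)`, the values are cast to `int` (an assumption that DF2 holds whole numbers), and `np.where` replaces `applymap`, which recent pandas releases deprecate in favor of `DataFrame.map`. The names `labels`, `wanted`, `matches` and `marked` are made up for this sketch.

```python
import numpy as np
import pandas as pd

sn = pd.Index([1121, 1122, 1123, 1143, 1156], name="SN")
columns = [12, 13, 14, 15, 16]
df2 = pd.DataFrame(
    [[12, 15, 14, 16, np.nan],
     [12, 16, 14, np.nan, np.nan],
     [np.nan] * 5,
     [np.nan] * 5,
     [np.nan, np.nan, np.nan, 14, np.nan]],
    index=sn,
    columns=["ID1", "ID2", "ID3", "ID4", "ID5"],
)

# Step 1: a frame shaped like df1 in which every cell holds its own column label
labels = pd.DataFrame(np.tile(columns, (len(sn), 1)), index=sn, columns=columns)

# Step 2: for each SN, collect the non-missing values of df2 into a list
wanted = df2.stack().dropna().astype(int).groupby(level=0).agg(list).to_dict()

# Step 3: transpose so that SN is the column axis; isin with a dict then tests
# each SN column against its own list (SNs missing from the dict match nothing)
matches = labels.T.isin(wanted).T

# Step 4: turn the boolean mask into "X" / empty-string markers
marked = pd.DataFrame(np.where(matches, "X", ""), index=sn, columns=columns)
print(marked)
```

The transpose is the key trick: `DataFrame.isin` with a dict matches by column name, so flipping the frame lets each SN be checked against its own list of values.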
Answer 3
Score: 0
Assuming SN is the index of both DataFrames, and that the values in DF2 have the same dtype as the column headers of DF1:
s = DF2.stack()
out = DF1.mask(pd.crosstab(s.index.get_level_values(0), s).ge(1)
                 .reindex_like(DF1).fillna(False), 'X').fillna('')
Output:
      12 13 14 15 16
SN
1121   X     X  X  X
1122   X     X     X
1123
1143
1156         X
Reproducible inputs:
DF1 = pd.DataFrame.from_dict({'index': [1121, 1122, 1123, 1143, 1156],
                              'columns': [12, 13, 14, 15, 16],
                              'data': [[None, None, None, None, None],
                                       [None, None, None, None, None],
                                       [None, None, None, None, None],
                                       [None, None, None, None, None],
                                       [None, None, None, None, None]],
                              'index_names': ['SN'],
                              'column_names': [None]}, orient='tight')
nan = float('nan')
DF2 = pd.DataFrame.from_dict({'index': [1121, 1122, 1123, 1143, 1156],
                              'columns': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5'],
                              'data': [[12.0, 15.0, 14.0, 16.0, nan],
                                       [12.0, 16.0, 14.0, nan, nan],
                                       [nan, nan, nan, nan, nan],
                                       [nan, nan, nan, nan, nan],
                                       [nan, nan, nan, 14.0, nan]],
                              'index_names': ['SN'],
                              'column_names': [None]}, orient='tight')
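The dtype assumption matters in practice: if DF1's headers were read in as strings (for example "12") while DF2 holds numbers, nothing would match. The sketch below uses a small hypothetical three-row sample, not the question's data, and normalizes the headers to integers before marking. It uses a plain loop rather than the crosstab expression so the effect of the conversion is easy to see.

```python
import numpy as np
import pandas as pd

sn = pd.Index([1121, 1122, 1123], name="SN")

# Hypothetical mismatch: string column labels in DF1, float values in DF2
DF1 = pd.DataFrame("", index=sn, columns=["12", "13", "14"])
DF2 = pd.DataFrame(
    {"ID1": [12.0, np.nan, 14.0], "ID2": [14.0, np.nan, np.nan]},
    index=sn,
)

# Normalize the labels so that 12 (int) compares equal to 12.0 (float);
# the string "12" would never equal the number 12.0
DF1.columns = [int(label) for label in DF1.columns]

# Mark every row whose DF2 counterpart contains the column label
for col in DF1.columns:
    DF1.loc[DF2.isin([col]).any(axis=1), col] = "X"

print(DF1)
```

If the headers cannot be converted (non-numeric labels), the conversion raises a `ValueError`, which is probably preferable to silently producing an unmarked frame.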