Pandas, mark a cell in dataframe, when there exists the same number in the frame

Question

I have two pandas DataFrames. The column labels of the first are specific numbers. The second has those same numbers scattered throughout its cells. Both share a matching column, SN. I need to mark a cell in the first DataFrame with the string "x" wherever its column label appears in the second DataFrame.

I've tried writing functions for this. Here is an example:

DF1: 

SN      12  13  14  15  16
1121
1122
1123
1143
1156

DF2:
SN      ID1  ID2  ID3  ID4  ID5
1121     12   15   14   16  NAN
1122     12   16   14  NAN  NAN
1123
1143
1156    NAN  NAN  NAN   14  NAN


Result (hopefully):

DF:
SN      12  13  14  15  16
1121     x           x    x
1122
1123
1143
1156

Answer 1

Score: 1

It seems you want to mark a cell in DF1 with "x" when its column label appears in the corresponding row of DF2.

import pandas as pd
import numpy as np

# Sample Data
data1 = {'SN': [1121, 1122, 1123, 1143, 1156], 12: [None]*5, 13: [None]*5, 14: [None]*5, 15: [None]*5, 16: [None]*5}
data2 = {'SN': [1121, 1122, 1123, 1143, 1156], 'ID1': [12, 12, np.nan, np.nan, np.nan], 'ID2': [15, 16, np.nan, np.nan, np.nan], 'ID3': [14, 14, np.nan, np.nan, np.nan], 'ID4': [16, np.nan, np.nan, np.nan, 14], 'ID5': [np.nan, np.nan, np.nan, np.nan, np.nan]}

DF1 = pd.DataFrame(data1)
DF2 = pd.DataFrame(data2)

# Set 'SN' as index for both dataframes
DF1.set_index('SN', inplace=True)
DF2.set_index('SN', inplace=True)

# For each column label of DF1, mark the rows whose DF2 row contains that label
for col in DF1.columns:
    condition = DF2.isin([col]).any(axis=1)
    DF1.loc[condition, col] = 'x'

# Fill NaN with empty strings
DF1.fillna("", inplace=True)

print(DF1)
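
The loop above modifies DF1 in place. A minimal sketch of the same idea wrapped in a function that leaves both inputs untouched might look like the following. The name `mark_matches` and its `marker` parameter are hypothetical, and the sketch assumes SN is already the index of both frames.

```python
import pandas as pd

nan = float("nan")
sn = pd.Index([1121, 1122, 1123, 1143, 1156], name="SN")
DF1 = pd.DataFrame(index=sn, columns=[12, 13, 14, 15, 16])
DF2 = pd.DataFrame(
    [[12, 15, 14, 16, nan],
     [12, 16, 14, nan, nan],
     [nan] * 5,
     [nan] * 5,
     [nan, nan, nan, 14, nan]],
    index=sn, columns=["ID1", "ID2", "ID3", "ID4", "ID5"],
)

def mark_matches(df1, df2, marker="x"):
    """Hypothetical helper: return a new frame shaped like df1 with `marker`
    wherever a column label of df1 occurs in the df2 row with the same index."""
    out = pd.DataFrame("", index=df1.index, columns=df1.columns)
    for col in df1.columns:
        # True for the df2 rows that contain this column label in any cell
        hit = df2.eq(col).any(axis=1)
        # Align on df1's index so rows missing from df2 stay unmarked
        hit = hit.reindex(df1.index, fill_value=False)
        out.loc[hit, col] = marker
    return out

result = mark_matches(DF1, DF2)
print(result)
```

Building the result from an empty-string frame avoids the final `fillna` step, and the `reindex` call keeps the function usable when DF2 lacks some SN values.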

Answer 2

Score: 1

Here is an option:

df1.fillna(1).mul(df1.columns).T.isin(df2.stack().groupby(level=0).agg(list).to_dict()).T.applymap({True:'X'}.get)

Output:

        12    13    14    15    16
SN                                
1121     X  None     X     X     X
1122     X  None     X  None     X
1123  None  None  None  None  None
1143  None  None  None  None  None
1156  None  None     X  None  None
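
The one-liner packs several steps together. The sketch below unpacks what appears to be the same logic, under some stated substitutions: the label frame is built with `np.tile` rather than `fillna(1).mul(...)`, the per-SN lists come from a dict comprehension rather than `stack().groupby()`, and the final marking uses `np.where` rather than `applymap` (which newer pandas versions deprecate in favor of `DataFrame.map`). It is an illustration, not the answer's exact code, and the intermediate names are made up.

```python
import numpy as np
import pandas as pd

nan = float("nan")
sn = pd.Index([1121, 1122, 1123, 1143, 1156], name="SN")
df1 = pd.DataFrame(index=sn, columns=[12, 13, 14, 15, 16])
df2 = pd.DataFrame(
    [[12, 15, 14, 16, nan],
     [12, 16, 14, nan, nan],
     [nan] * 5,
     [nan] * 5,
     [nan, nan, nan, 14, nan]],
    index=sn, columns=["ID1", "ID2", "ID3", "ID4", "ID5"],
)

# Step 1: for every SN, collect the non-missing values found in its df2 row
values_by_sn = {key: row.dropna().tolist() for key, row in df2.iterrows()}

# Step 2: a frame shaped like df1 in which every cell holds its own column label
labels = pd.DataFrame(
    np.tile(df1.columns.to_numpy(), (len(df1), 1)),
    index=df1.index, columns=df1.columns,
)

# Step 3: transpose so the SN values become column names; DataFrame.isin with
# a dict then tests each column against the list stored under that key
hit = labels.T.isin(values_by_sn).T

# Step 4: turn the boolean mask into markers
marked = pd.DataFrame(np.where(hit, "X", ""), index=df1.index, columns=df1.columns)
print(marked)
```

The transpose is needed because `DataFrame.isin` matches dict keys against column names, while here the keys are row labels.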

Answer 3

Score: 0

Assuming SN is the index of both DataFrames and the values have the same dtype as the column headers:

s = DF2.stack()
out = DF1.mask(pd.crosstab(s.index.get_level_values(0), s).ge(1)
                 .reindex_like(DF1).fillna(False), 'X').fillna('')

Output:

     12 13 14 15 16
SN                 
1121  X     X  X  X
1122  X     X     X
1123               
1143               
1156        X      

Reproducible inputs:

DF1 = pd.DataFrame.from_dict({'index': [1121, 1122, 1123, 1143, 1156],
                              'columns': [12, 13, 14, 15, 16],
                              'data': [[None, None, None, None, None],
                                       [None, None, None, None, None],
                                       [None, None, None, None, None],
                                       [None, None, None, None, None],
                                       [None, None, None, None, None]],
                              'index_names': ['SN'],
                              'column_names': [None]}, orient='tight')

nan = float('nan')
DF2 = pd.DataFrame.from_dict({'index': [1121, 1122, 1123, 1143, 1156],
                              'columns': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5'],
                              'data': [[12.0, 15.0, 14.0, 16.0, nan],
                                       [12.0, 16.0, 14.0, nan, nan],
                                       [nan, nan, nan, nan, nan],
                                       [nan, nan, nan, nan, nan],
                                       [nan, nan, nan, 14.0, nan]],
                              'index_names': ['SN'],
                              'column_names': [None]}, orient='tight')
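
The dtype assumption stated at the top of this answer matters: if the values in DF2 arrive as strings (for instance from a CSV read as text) while the column headers of DF1 are integers, no match will be found. A minimal sketch of normalizing both sides first is shown below. The frames `DF1_str` and `DF2_str` are hypothetical inputs invented for illustration.

```python
import pandas as pd

sn = pd.Index([1121, 1156], name="SN")

# Hypothetical inputs where everything was read as text
DF1_str = pd.DataFrame(index=sn, columns=["12", "14", "15"])
DF2_str = pd.DataFrame({"ID1": ["12", None], "ID2": ["15", "14"]}, index=sn)

# Convert every DF2 column to numbers; entries that cannot be parsed become NaN
DF2_num = DF2_str.apply(pd.to_numeric, errors="coerce")

# Convert the DF1 column labels to integers so they compare equal to the values
DF1_num = DF1_str.copy()
DF1_num.columns = DF1_num.columns.astype(int)

print(DF2_num.dtypes)
print(DF1_num.columns)
```

After this step, the column labels and the cell values are both numeric, so any of the approaches above can compare them directly.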

Posted by huangapple on July 31, 2023, 21:29:35
Link: https://go.coder-hub.com/76804132.html