如何合并两个数据框并进行数据透视。

huangapple go评论72阅读模式
英文:

How to join 2 dataframes and pivot them

问题

如何合并两个数据框并对它们进行透视

我有一个名为dfICU的数据框,其中包含医院中ICU单位的列表

ICU
A1
A2
A3
B1
B2Closed
B2Covid
B7
C1West
C2South
C3
。
。
。
P53Child

另一个数据框dfPts包含患者信息

PtsID  VisitID   ICU        Frequency
934    15        A3         4
934    15        C2South    2
934    62       B2Covid    5
934    62        A2         6
882    35        C2South    7
882    35        C3         2
882    35        A2         9
882    91        P53Child   5
105    44        C2South    2
105    80        B7         8

我试图将它们合并成一个单一的透视数据框,如果dfPts中不存在该ICU单位,则显示0频率

类似于这样

PtsID  VisitID   A1   A2   A3    B1   B2Closed   B2Covid   B7   C1West   C2South   C3   ....  P53Child
934    15        0    0    4     0    0          0         0    0        2         0          0
934    62        0    6    0     0    0          5         0    0        0         0          0
882    35        0    0    0     0    0          0         0    0        7         2          0
882    91        0    0    0     0    0          0         0    0        0         0          5
105    44        0    0    0     0    0          0         0    0        2         0          0
105    80        0    0    0     0    0          0         8    0        0         0          0

我首先对dfPts进行透视,但这并没有添加所有dfICU中的ICU单位,因为某些ICUs对于所有患者都为0

到目前为止,我已经做了以下工作,但之后不知道该怎么做

df = dfPts.set_index(['PtsID','VisitID']).pivot(columns='ICU')['Frequency']
df[np.isnan(df)] = 0

如何实现这一目标?

英文:

how to join 2 dataframes and pivot them

I have this dfICU dataframe that has list of ICU units in a hospital

ICU
A1
A2
A3
B1
B2Closed
B2Covid
B7
C1West
C2South
C3
.
.
.
P53Child

the other dataframe dfPts has Patients info

PtsID  VisitID   ICU        Frequency
934    15        A3         4
934    15        C2South    2
934    62       B2Covid    5
934    62        A2         6
882    35        C2South    7
882    35        C3         2
882    35        A2         9
882    91        P53Child   5
105    44        C2South    2
105    80        B7         8

I am trying to put them both in a single pivoted dataframe so if the ICU unit does not exit in the dfPts it shows 0 Frequency

Something like this

PtsID  VisitID   A1   A2   A3    B1   B2Closed   B2Covid   B7   C1West   C2South   C3   ....  P53Child
934    15        0    0    4     0    0          0         0    0        2         0          0
934    62        0    6    0     0    0          5         0    0        0         0          0
882    35        0    0    0     0    0          0         0    0        7         2          0
882    91        0    0    0     0    0          0         0    0        0         0          5
105    44        0    0    0     0    0          0         0    0        2         0          0
105    80        0    0    0     0    0          0         8    0        0         0          0

I start by pivoting the dfPts but that did not add all ICU units in dfICU because some ICUs are 0 for all patients

here is what i have done so far and did not know what to do after

df = dfPts.set_index(['PtsID','VisitID']).pivot(columns='ICU')['Frequency']
df[np.isnan(df)] = 0

How to do that?

答案1

得分: 2

将数据框进行透视,然后重新索引以确保所有的ICU都出现在列标题中,然后用0填充缺失的ICU的值。

结果

ICU  PtsID  VisitID   A1   A2   A3   B1  B2Closed  B2Covid   B7  C1West  C2South   C3  P53Child
0      105       44  0.0  0.0  0.0  0.0       0.0      0.0  0.0     0.0      2.0  0.0       0.0
1      105       80  0.0  0.0  0.0  0.0       0.0      0.0  8.0     0.0      0.0  0.0       0.0
2      882       35  0.0  9.0  0.0  0.0       0.0      0.0  0.0     0.0      7.0  2.0       0.0
3      882       91  0.0  0.0  0.0  0.0       0.0      0.0  0.0     0.0      0.0  0.0       5.0
4      934       15  0.0  0.0  4.0  0.0       0.0      0.0  0.0     0.0      2.0  0.0       0.0
5      934       62  0.0  6.0  0.0  0.0       0.0      5.0  0.0     0.0      0.0  0.0       0.0
英文:

Pivot the dataframe then reindex to ensure all the icus are present in the column headers, then fill the values in missing icus with 0

icus = dfICU['ICU'].unique()
(
    dfPts
    .pivot(index=['PtsID', 'VisitID'], columns='ICU', values='Frequency')
    .reindex(columns=icus)
    .fillna(0).reset_index()
)

Result

ICU  PtsID  VisitID   A1   A2   A3   B1  B2Closed  B2Covid   B7  C1West  C2South   C3  P53Child
0      105       44  0.0  0.0  0.0  0.0       0.0      0.0  0.0     0.0      2.0  0.0       0.0
1      105       80  0.0  0.0  0.0  0.0       0.0      0.0  8.0     0.0      0.0  0.0       0.0
2      882       35  0.0  9.0  0.0  0.0       0.0      0.0  0.0     0.0      7.0  2.0       0.0
3      882       91  0.0  0.0  0.0  0.0       0.0      0.0  0.0     0.0      0.0  0.0       5.0
4      934       15  0.0  0.0  4.0  0.0       0.0      0.0  0.0     0.0      2.0  0.0       0.0
5      934       62  0.0  6.0  0.0  0.0       0.0      5.0  0.0     0.0      0.0  0.0       0.0

答案2

得分: 1

由于dfICU只包含已经存在于dfPtsICU列中的值的列表,因此您只需要使用dfPts来创建您想要的数据透视表。

在创建数据透视表之前,请确保Frequency是数值类型:

# 获取列的信息
dfPts.info()

然后创建数据透视表:

# 创建数据透视表
pivot_table = pd.pivot_table(dfPts, index=['PtsID', 'VisitID'], 
                             columns='ICU', values='Frequency',
                             aggfunc='sum')  # 您可以更改聚合函数
                             #.reset_index()  # 如果您想要索引作为列,请取消注释

# 检查数据透视表
pivot_table.head(10)

如果发现dfICU中的一些ICU在数据透视表的列中不存在,您可以像这样添加它们并将值设置为0:

# 创建缺失列的列表
ICU_list = list(dfICU.ICU.unique())
missing_cols = [col for col in ICU_list 
                if col not in pivot_table.columns]

# 将缺失的列添加到数据透视表并设置值为0
for col in missing_cols:
    pivot_table[col] = 0

# 检查数据透视表
pivot_table.head()
英文:

Since dfICU only contains a list of values that are already present in the column ICU in dfPts, you only need to use dfPts for the pivot table you want.

Before creating the pivot table, make sure Frequency is numeric:

# get info of columns
dfPts.info()

Then create pivot table:

# create a pivot table
pivot_table = (pd.pivot_table(dfPts, index=['PtsID','VisitID'], 
               columns='ICU', values='Frequency',
               aggfunc='sum') # you can change the aggregation function 
               #.reset_index() # uncomment if you want indices as columns 
               )

# check pivot table
pivot_table.head(10)

If you find that some ICUs from dfICU are missing in the pivot table columns, you can add them with 0 values like this:

# create list of missing columns
ICU_list = list(dfICU.ICU.unique())
missing_cols = [col for col in ICU_list 
                if col not in pivot_table.columns]

# add missing columns to pivot table
for col in missing_cols:
    pivot_table[col] = 0

# check pivot table
pivot_table.head()

答案3

得分: 1

另一种可能的解决方案:

(pd.concat([dfPts, dfICU])
 .pivot(index=['PtsID', 'VisitID'], columns='ICU', values='Frequency')
 .dropna(how='all').reset_index().fillna(0))

输出:

ICU  PtsID  VisitID   A1   A2   A3   B1  B2Closed  B2Covid   B7  C1West  \
0    105.0     44.0  0.0  0.0  0.0  0.0       0.0      0.0  0.0     0.0   
1    105.0     80.0  0.0  0.0  0.0  0.0       0.0      0.0  8.0     0.0
2    882.0     35.0  0.0  9.0  0.0  0.0       0.0      0.0  0.0     0.0
3    882.0     91.0  0.0  0.0  0.0  0.0       0.0      0.0  0.0     0.0
4    934.0     15.0  0.0  0.0  4.0  0.0       0.0      0.0  0.0     0.0
5    934.0     62.0  0.0  6.0  0.0  0.0       0.0      5.0  0.0     0.0

ICU  C2South   C3  P53Child
0        2.0  0.0       0.0
1        0.0  0.0       0.0
2        7.0  2.0       0.0
3        0.0  0.0       5.0
4        2.0  0.0       0.0
5        0.0  0.0       0.0
英文:

Another possible solution:

(pd.concat([dfPts, dfICU])
 .pivot(index=['PtsID', 'VisitID'], columns='ICU', values='Frequency')
 .dropna(how='all').reset_index().fillna(0))

Output:

ICU  PtsID  VisitID   A1   A2   A3   B1  B2Closed  B2Covid   B7  C1West  \
0    105.0     44.0  0.0  0.0  0.0  0.0       0.0      0.0  0.0     0.0   
1    105.0     80.0  0.0  0.0  0.0  0.0       0.0      0.0  8.0     0.0   
2    882.0     35.0  0.0  9.0  0.0  0.0       0.0      0.0  0.0     0.0   
3    882.0     91.0  0.0  0.0  0.0  0.0       0.0      0.0  0.0     0.0   
4    934.0     15.0  0.0  0.0  4.0  0.0       0.0      0.0  0.0     0.0   
5    934.0     62.0  0.0  6.0  0.0  0.0       0.0      5.0  0.0     0.0   

ICU  C2South   C3  P53Child  
0        2.0  0.0       0.0  
1        0.0  0.0       0.0  
2        7.0  2.0       0.0  
3        0.0  0.0       5.0  
4        2.0  0.0       0.0  
5        0.0  0.0       0.0 

huangapple
  • 本文由 发表于 2023年7月10日 12:18:07
  • 转载请务必保留本文链接:https://go.coder-hub.com/76650659.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定