How to join 2 dataframes and pivot them
Question
I have a dataframe, dfICU, that lists the ICU units in a hospital:
ICU
A1
A2
A3
B1
B2Closed
B2Covid
B7
C1West
C2South
C3
.
.
.
P53Child
The other dataframe, dfPts, holds patient information:
PtsID VisitID ICU Frequency
934 15 A3 4
934 15 C2South 2
934 62 B2Covid 5
934 62 A2 6
882 35 C2South 7
882 35 C3 2
882 35 A2 9
882 91 P53Child 5
105 44 C2South 2
105 80 B7 8
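To make the example reproducible, here is a minimal sketch that builds both dataframes from the data shown above. It assumes the ICU units elided with "..." can be left out, so dfICU holds only the 11 units actually listed.

```python
import pandas as pd

# ICU units listed in the question; the units elided with "..." are left out
dfICU = pd.DataFrame({'ICU': ['A1', 'A2', 'A3', 'B1', 'B2Closed', 'B2Covid',
                              'B7', 'C1West', 'C2South', 'C3', 'P53Child']})

# One row per (PtsID, VisitID, ICU) with its Frequency
dfPts = pd.DataFrame({
    'PtsID':     [934, 934, 934, 934, 882, 882, 882, 882, 105, 105],
    'VisitID':   [15, 15, 62, 62, 35, 35, 35, 91, 44, 80],
    'ICU':       ['A3', 'C2South', 'B2Covid', 'A2', 'C2South', 'C3', 'A2',
                  'P53Child', 'C2South', 'B7'],
    'Frequency': [4, 2, 5, 6, 7, 2, 9, 5, 2, 8],
})

print(dfPts)
```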
I am trying to combine them into a single pivoted dataframe, so that if an ICU unit does not exist in dfPts, it shows a Frequency of 0.
Something like this:
PtsID VisitID A1 A2 A3 B1 B2Closed B2Covid B7 C1West C2South C3 .... P53Child
934 15 0 0 4 0 0 0 0 0 2 0 0
934 62 0 6 0 0 0 5 0 0 0 0 0
882 35 0 9 0 0 0 0 0 0 7 2 0
882 91 0 0 0 0 0 0 0 0 0 0 5
105 44 0 0 0 0 0 0 0 0 2 0 0
105 80 0 0 0 0 0 0 8 0 0 0 0
I started by pivoting dfPts, but that did not add all the ICU units in dfICU, because some ICUs are 0 for all patients.
Here is what I have done so far; I don't know what to do next:
import numpy as np

df = dfPts.set_index(['PtsID','VisitID']).pivot(columns='ICU')['Frequency']
df[np.isnan(df)] = 0  # replace the NaNs left by the pivot with 0
How can I do that?
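A quick way to see exactly which columns the attempt is missing: pivot only creates a column for an ICU value that actually occurs in dfPts. This minimal sketch reuses the question's sample data (the elided ICUs are left out) and uses fillna(0), which has the same effect here as the np.isnan assignment.

```python
import pandas as pd

# Sample data from the question (ICUs elided with "..." are left out)
dfICU = pd.DataFrame({'ICU': ['A1', 'A2', 'A3', 'B1', 'B2Closed', 'B2Covid',
                              'B7', 'C1West', 'C2South', 'C3', 'P53Child']})
dfPts = pd.DataFrame({
    'PtsID':     [934, 934, 934, 934, 882, 882, 882, 882, 105, 105],
    'VisitID':   [15, 15, 62, 62, 35, 35, 35, 91, 44, 80],
    'ICU':       ['A3', 'C2South', 'B2Covid', 'A2', 'C2South', 'C3', 'A2',
                  'P53Child', 'C2South', 'B7'],
    'Frequency': [4, 2, 5, 6, 7, 2, 9, 5, 2, 8],
})

# The attempt from the question, with fillna(0) in place of np.isnan
df = dfPts.set_index(['PtsID', 'VisitID']).pivot(columns='ICU')['Frequency']
df = df.fillna(0)

# ICUs from dfICU that never became columns
missing = [icu for icu in dfICU['ICU'] if icu not in df.columns]
print(missing)
```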
Answer 1
Score: 2
Pivot the dataframe, then reindex to ensure that all the ICUs are present in the column headers, then fill the values of the missing ICUs with 0:
icus = dfICU['ICU'].unique()
(
dfPts
.pivot(index=['PtsID', 'VisitID'], columns='ICU', values='Frequency')
.reindex(columns=icus)
.fillna(0).reset_index()
)
Result
ICU PtsID VisitID A1 A2 A3 B1 B2Closed B2Covid B7 C1West C2South C3 P53Child
0 105 44 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0
1 105 80 0.0 0.0 0.0 0.0 0.0 0.0 8.0 0.0 0.0 0.0 0.0
2 882 35 0.0 9.0 0.0 0.0 0.0 0.0 0.0 0.0 7.0 2.0 0.0
3 882 91 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0
4 934 15 0.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0
5 934 62 0.0 6.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 0.0 0.0
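The chain above can be checked end to end with the question's sample data. This sketch assumes pandas 1.1 or later (for a list-valued index in pivot) and adds an astype(int) step that is not part of the answer, so the counts come out as integers like the desired output instead of 0.0-style floats.

```python
import pandas as pd

# Sample data from the question (ICUs elided with "..." are left out)
dfICU = pd.DataFrame({'ICU': ['A1', 'A2', 'A3', 'B1', 'B2Closed', 'B2Covid',
                              'B7', 'C1West', 'C2South', 'C3', 'P53Child']})
dfPts = pd.DataFrame({
    'PtsID':     [934, 934, 934, 934, 882, 882, 882, 882, 105, 105],
    'VisitID':   [15, 15, 62, 62, 35, 35, 35, 91, 44, 80],
    'ICU':       ['A3', 'C2South', 'B2Covid', 'A2', 'C2South', 'C3', 'A2',
                  'P53Child', 'C2South', 'B7'],
    'Frequency': [4, 2, 5, 6, 7, 2, 9, 5, 2, 8],
})

icus = dfICU['ICU'].unique()
result = (
    dfPts
    .pivot(index=['PtsID', 'VisitID'], columns='ICU', values='Frequency')
    .reindex(columns=icus)  # add ICUs that never occur in dfPts, as NaN
    .fillna(0)              # every NaN becomes 0
    .astype(int)            # extra step: integer counts
    .reset_index()
)
print(result)
```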
Answer 2
Score: 1
If dfICU only contains values that are already present in the ICU column of dfPts, you only need dfPts to build the pivot table you want.
Before creating the pivot table, make sure Frequency is numeric:
# get info of columns
dfPts.info()
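If info() shows that Frequency is not numeric (for example, because it was read from a CSV as text), pd.to_numeric can convert it. In this minimal sketch the string values and the "n/a" entry are hypothetical; errors='coerce' turns anything unparseable into NaN instead of raising.

```python
import pandas as pd

# Hypothetical data: Frequency stored as text, including one bad value
dfPts = pd.DataFrame({
    'PtsID': [934, 934, 882],
    'VisitID': [15, 15, 35],
    'ICU': ['A3', 'C2South', 'C2South'],
    'Frequency': ['4', '2', 'n/a'],
})
print(pd.api.types.is_numeric_dtype(dfPts['Frequency']))

# Convert to numbers; unparseable entries become NaN
dfPts['Frequency'] = pd.to_numeric(dfPts['Frequency'], errors='coerce')
print(pd.api.types.is_numeric_dtype(dfPts['Frequency']))
```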
Then create the pivot table:
import pandas as pd

# create a pivot table; fill_value=0 writes 0 where a visit has no entry for an ICU
pivot_table = (pd.pivot_table(dfPts, index=['PtsID','VisitID'],
                              columns='ICU', values='Frequency',
                              aggfunc='sum',  # you can change the aggregation function
                              fill_value=0)
               #.reset_index()  # uncomment if you want indices as columns
              )
# check pivot table
pivot_table.head(10)
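A small sketch of why aggfunc matters: pd.pivot_table aggregates duplicate (PtsID, VisitID, ICU) rows, while DataFrame.pivot refuses them with a ValueError. The duplicate A3 row below is hypothetical data, not from the question.

```python
import pandas as pd

# Hypothetical data: visit (934, 15) has two A3 rows
dfPts = pd.DataFrame({
    'PtsID': [934, 934, 934, 882],
    'VisitID': [15, 15, 15, 35],
    'ICU': ['A3', 'A3', 'C2South', 'C2South'],
    'Frequency': [4, 1, 2, 7],
})

# DataFrame.pivot cannot put two values into one cell and raises ValueError
try:
    dfPts.pivot(index=['PtsID', 'VisitID'], columns='ICU', values='Frequency')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table sums the duplicates and writes 0 where a visit has no entry
pivot_table = pd.pivot_table(dfPts, index=['PtsID', 'VisitID'],
                             columns='ICU', values='Frequency',
                             aggfunc='sum', fill_value=0)
print(pivot_failed)
print(pivot_table)
```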
If you find that some ICUs from dfICU are missing from the pivot table columns, you can add them with 0 values like this:
# create list of missing columns
ICU_list = list(dfICU.ICU.unique())
missing_cols = [col for col in ICU_list
if col not in pivot_table.columns]
# add missing columns to pivot table
for col in missing_cols:
pivot_table[col] = 0
# check pivot table
pivot_table.head()
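As an alternative to the loop, DataFrame.reindex with fill_value=0 can add the missing ICUs and put the columns in dfICU order in one call. This minimal sketch uses a trimmed-down, hypothetical subset of the question's data. One difference from the loop: reindex also drops any ICU that appears in dfPts but not in dfICU.

```python
import pandas as pd

# Hypothetical subset of the question's data
dfICU = pd.DataFrame({'ICU': ['A1', 'A2', 'A3', 'B1', 'C2South']})
dfPts = pd.DataFrame({
    'PtsID': [934, 934, 882],
    'VisitID': [15, 15, 35],
    'ICU': ['A3', 'C2South', 'A2'],
    'Frequency': [4, 2, 9],
})

pivot_table = pd.pivot_table(dfPts, index=['PtsID', 'VisitID'],
                             columns='ICU', values='Frequency',
                             aggfunc='sum', fill_value=0)

# Add missing ICUs as 0-filled columns and order columns like dfICU
ICU_list = list(dfICU.ICU.unique())
pivot_table = pivot_table.reindex(columns=ICU_list, fill_value=0)
print(pivot_table)
```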
Answer 3
Score: 1
Another possible solution:
(pd.concat([dfPts, dfICU])
.pivot(index=['PtsID', 'VisitID'], columns='ICU', values='Frequency')
.dropna(how='all').reset_index().fillna(0))
Output:
ICU PtsID VisitID A1 A2 A3 B1 B2Closed B2Covid B7 C1West \
0 105.0 44.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 105.0 80.0 0.0 0.0 0.0 0.0 0.0 0.0 8.0 0.0
2 882.0 35.0 0.0 9.0 0.0 0.0 0.0 0.0 0.0 0.0
3 882.0 91.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 934.0 15.0 0.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0
5 934.0 62.0 0.0 6.0 0.0 0.0 0.0 5.0 0.0 0.0
ICU C2South C3 P53Child
0 2.0 0.0 0.0
1 0.0 0.0 0.0
2 7.0 2.0 0.0
3 0.0 0.0 5.0
4 2.0 0.0 0.0
5 0.0 0.0 0.0
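The concat approach can be run end to end on the question's data. Because the dfICU rows bring NaN into PtsID and VisitID, those columns come out as floats (105.0, 44.0 in the output above). The final astype(int) in this sketch is an extra step, not part of the answer, that casts everything back to integers. It assumes pandas 1.1 or later.

```python
import pandas as pd

# Sample data from the question (ICUs elided with "..." are left out)
dfICU = pd.DataFrame({'ICU': ['A1', 'A2', 'A3', 'B1', 'B2Closed', 'B2Covid',
                              'B7', 'C1West', 'C2South', 'C3', 'P53Child']})
dfPts = pd.DataFrame({
    'PtsID':     [934, 934, 934, 934, 882, 882, 882, 882, 105, 105],
    'VisitID':   [15, 15, 62, 62, 35, 35, 35, 91, 44, 80],
    'ICU':       ['A3', 'C2South', 'B2Covid', 'A2', 'C2South', 'C3', 'A2',
                  'P53Child', 'C2South', 'B7'],
    'Frequency': [4, 2, 5, 6, 7, 2, 9, 5, 2, 8],
})

result = (pd.concat([dfPts, dfICU])  # dfICU rows get NaN PtsID/VisitID/Frequency
          .pivot(index=['PtsID', 'VisitID'], columns='ICU', values='Frequency')
          .dropna(how='all')         # drop the all-NaN row contributed by dfICU
          .reset_index().fillna(0))

# Extra step: ids and counts back to integers
result = result.astype(int)
print(result)
```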