Python: creating plot based on observation dates (not as a time series)
Question
I have the following dataset, df:
id  medication_date
1   2000-01-01
1   2000-01-04
1   2000-01-06
2   2000-04-01
2   2000-04-02
2   2000-04-03
I would like to first reshape the dataset into days since the first observation per patient:
id  day1  day2  day3  day4
1   yes   no    no    yes
2   yes   yes   yes   no
The goal is ultimately to create a plot from the above table: the columns are the days, with a cell in black if yes and in white if not.
Any help is really appreciated.
Answer 1
Score: 2
Transform the sparse Series (only the 'yes' medication dates) into a dense Series by adding the missing days (the 'no' medication days), then reset the Series index so that each patient's first date becomes position 0 (2000-01-01 -> 0, 2000-04-01 -> 0). Finally, reshape the dataframe.
import pandas as pd

def f(sr):
    # Create the full daily range between the first and last date
    dti = pd.date_range(sr.min(), sr.max(), freq='D')
    # Fill the Series with 'yes' (observed dates) or 'no' (missing dates)
    return (pd.Series('yes', index=sr.tolist())
              .reindex(dti, fill_value='no')
              .reset_index(drop=True))

# Parse the dates, then build one row per id with columns day1..dayN
df['medication_date'] = pd.to_datetime(df['medication_date'])
out = (df.groupby('id')['medication_date'].apply(f).unstack(fill_value='no')
         .rename(columns=lambda x: f'day{x+1}').reset_index())
Output:
>>> out
   id day1 day2 day3 day4 day5 day6
0   1  yes   no   no  yes   no  yes
1   2  yes  yes  yes   no   no   no
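For comparison, the same table can probably also be built without a custom function, by computing each row's day number directly and cross-tabulating. This is only a sketch under stated assumptions: the sample data is copied from the question, and the names `first`, `counts`, `all_days` and `out_alt` are made up for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "medication_date": ["2000-01-01", "2000-01-04", "2000-01-06",
                        "2000-04-01", "2000-04-02", "2000-04-03"],
})
df["medication_date"] = pd.to_datetime(df["medication_date"])

# Day number relative to each patient's first observation (first day = 1).
first = df.groupby("id")["medication_date"].transform("min")
df["day"] = (df["medication_date"] - first).dt.days + 1

# Count observations per (id, day), then add the days nobody was observed on.
all_days = list(range(1, int(df["day"].max()) + 1))
counts = pd.crosstab(df["id"], df["day"]).reindex(columns=all_days, fill_value=0)

# Same layout as `out`: one row per id, columns day1..dayN holding 'yes'/'no'.
out_alt = (pd.DataFrame(np.where(counts > 0, "yes", "no"),
                        index=counts.index, columns=counts.columns)
             .add_prefix("day")
             .reset_index())
print(out_alt)
```

Because `crosstab` counts rows, this variant should also tolerate a patient having the same date listed twice, whereas reindexing a Series with duplicate index labels raises an error.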
Update
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# Two-color map: 0 ('no') -> white, 1 ('yes') -> black
colors = ["white", "black"]
cmap = LinearSegmentedColormap.from_list('Custom', colors, len(colors))
# One row per id, one column per day
plt.matshow(out.set_index('id').eq('yes').astype(int), cmap=cmap)
plt.show()
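If labelled axes are wanted, one possible variation is to draw the same yes/no matrix with `imshow` and set the tick labels by hand. This is a sketch, not the answer's method: the helper name `plot_medication_grid` and the small `example` table are hypothetical, and the built-in `gray_r` colormap is used here in place of the custom two-color map.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_medication_grid(table):
    """Draw a yes/no table (an 'id' column plus day columns) as a black/white grid."""
    # True where medication was observed, False otherwise.
    matrix = table.set_index("id").eq("yes")

    fig, ax = plt.subplots()
    # With the reversed grey colormap, 0 (False) is white and 1 (True) is black.
    ax.imshow(matrix.to_numpy().astype(int), cmap="gray_r",
              vmin=0, vmax=1, aspect="auto", interpolation="nearest")

    # Label the columns with the day names and the rows with the patient ids.
    ax.set_xticks(range(matrix.shape[1]))
    ax.set_xticklabels(matrix.columns)
    ax.set_yticks(range(matrix.shape[0]))
    ax.set_yticklabels(matrix.index)
    ax.set_xlabel("day since first observation")
    ax.set_ylabel("id")
    return fig, ax


# Small table in the same shape as `out` (values taken from the question).
example = pd.DataFrame({
    "id": [1, 2],
    "day1": ["yes", "yes"],
    "day2": ["no", "yes"],
    "day3": ["no", "yes"],
    "day4": ["yes", "no"],
})
fig, ax = plot_medication_grid(example)
```

Call `plt.show()` to display the figure, or `fig.savefig(...)` to write it to a file. Fixing `vmin=0` and `vmax=1` keeps the colors stable even when a table contains only 'yes' or only 'no' values.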