Python: creating a plot based on observation dates (not as a time series)

Question

I have the following dataset:

df
id medication_date 
1  2000-01-01
1  2000-01-04
1  2000-01-06
2  2000-04-01
2  2000-04-02
2  2000-04-03

I would like to first reshape the data set into days after the first observation per patient:

id day1 day2 day3 day4 
1  yes  no   no   yes 
2  yes  yes  yes  no

The goal is ultimately to create a plot from the table above, with the days as columns and each cell black if yes and white if not.

Any help is really appreciated.

Answer 1

Score: 2

Transform the sparse Series (dates with medication, 'yes') into a dense Series by adding the missing days ('no'), then reset the Series index so that each patient's first date becomes position 0 (2000-01-01 -> 0, 2000-04-01 -> 0). Finally, reshape the dataframe.

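import pandas as pd

def f(sr):
    # Build the full daily range between the patient's first and last dates
    dti = pd.date_range(sr.min(), sr.max(), freq='D')
    # Mark observed dates 'yes', fill the missing dates with 'no',
    # then drop the dates so positions count days from the first observation
    return (pd.Series('yes', index=sr.tolist())
              .reindex(dti, fill_value='no')
              .reset_index(drop=True))

df['medication_date'] = pd.to_datetime(df['medication_date'])
# One row per patient; patients with shorter spans are padded with 'no'
out = (df.groupby('id')['medication_date'].apply(f).unstack(fill_value='no')
         .rename(columns=lambda x: f'day{x+1}').reset_index())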

Output:

>>> out
   id day1 day2 day3 day4 day5 day6
0   1  yes   no   no  yes   no  yes
1   2  yes  yes  yes   no   no   no
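
For comparison, the same table can probably be built without a per-group function. The sketch below is an alternative, not the method used in this answer: it computes each row's offset in days from the patient's first date and cross-tabulates it. The sample frame is copied from the question; the names `first`, `counts`, and `wide` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "medication_date": ["2000-01-01", "2000-01-04", "2000-01-06",
                        "2000-04-01", "2000-04-02", "2000-04-03"],
})
df["medication_date"] = pd.to_datetime(df["medication_date"])

# Days elapsed since each patient's first observation (0 for the first date).
first = df.groupby("id")["medication_date"].transform("min")
df["day"] = (df["medication_date"] - first).dt.days

# Count observations per (id, day); combinations that never occur become 0.
counts = pd.crosstab(df["id"], df["day"])

# Ensure every day from 0 up to the longest span has a column.
counts = counts.reindex(columns=range(counts.columns.max() + 1), fill_value=0)

# Turn counts into 'yes'/'no' and label the columns day1, day2, ...
wide = pd.DataFrame(
    np.where(counts > 0, "yes", "no"),
    index=counts.index,
    columns=[f"day{d + 1}" for d in counts.columns],
).reset_index()
print(wide)
```

One difference worth noting: because this variant counts rows, a date repeated for the same patient still yields 'yes', whereas the reindex-based function assumes each patient's dates are unique.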

Update

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# Two-level colormap: 0 ('no') -> white, 1 ('yes') -> black
colors = ["white", "black"]
cmap = LinearSegmentedColormap.from_list('Custom', colors, len(colors))
# Rows are patients, columns are days since the first observation
plt.matshow(out.set_index('id').eq('yes').astype(int), cmap=cmap)
plt.show()
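
If labelled axes are wanted, a variant along the following lines might work. It is only a sketch: it swaps the custom colormap for matplotlib's built-in 'binary' colormap (white for low values, black for high), rebuilds the `out` table by hand from the output shown earlier, and saves to a file name, `medication_days.png`, that is purely hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Wide table with the values from the output shown earlier.
out = pd.DataFrame({
    "id": [1, 2],
    "day1": ["yes", "yes"],
    "day2": ["no", "yes"],
    "day3": ["no", "yes"],
    "day4": ["yes", "no"],
    "day5": ["no", "no"],
    "day6": ["yes", "no"],
})

# Boolean matrix: one row per patient, one column per day.
table = out.set_index("id")
mask = table.eq("yes").to_numpy()

fig, ax = plt.subplots()
# The built-in 'binary' colormap draws 0 as white and 1 as black.
ax.imshow(mask.astype(int), cmap="binary", vmin=0, vmax=1, aspect="auto")

# Label columns with the day names and rows with the patient ids.
ax.set_xticks(np.arange(mask.shape[1]))
ax.set_xticklabels(table.columns)
ax.set_yticks(np.arange(mask.shape[0]))
ax.set_yticklabels(table.index)
ax.set_xlabel("day since first observation")
ax.set_ylabel("patient id")

fig.savefig("medication_days.png")  # hypothetical output file name
```

Replacing the last line with `plt.show()` displays the figure interactively instead of writing it to disk.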

huangapple
  • Published on March 3, 2023, 22:07:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75628106.html