Python: creating plot based on observation dates (not as a time series)

Question

I have the following dataset:

    df

    id  medication_date
    1   2000-01-01
    1   2000-01-04
    1   2000-01-06
    2   2000-04-01
    2   2000-04-02
    2   2000-04-03
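To make the example reproducible, the dataset above could be built as follows. This is only a sketch: it assumes the column is stored as (or converted to) datetime values, which the question does not state.

```python
import pandas as pd

# Sample data copied from the question; the dates are parsed to datetime64
# (an assumption, since the original dtype is not shown).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "medication_date": pd.to_datetime([
        "2000-01-01", "2000-01-04", "2000-01-06",
        "2000-04-01", "2000-04-02", "2000-04-03",
    ]),
})
print(df)
```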

I would like to first reshape the data set into days after the first observation per patient:

    id  day1  day2  day3  day4
    1   yes   no    no    yes
    2   yes   yes   yes   no

The goal is ultimately to create a plot from the above table: the columns are the days, with a cell drawn in black if "yes" and in white if not.

Any help is really appreciated.

Answer 1

Score: 2

Transform each patient's sparse Series (only the 'yes' medication days) into a dense Series by adding the missing days ('no' medication), then reset the Series index so that every patient's first date maps to position 0 (2000-01-01 -> 0, 2000-04-01 -> 0). Finally, reshape your dataframe.

    import pandas as pd

    def f(sr):
        # Create the full daily range between the first and last observation
        dti = pd.date_range(sr.min(), sr.max(), freq='D')
        # Mark observed dates 'yes', fill the missing dates with 'no',
        # then drop the dates so that position 0 is the first observation
        return (pd.Series('yes', index=sr.tolist())
                  .reindex(dti, fill_value='no')
                  .reset_index(drop=True))

    df['medication_date'] = pd.to_datetime(df['medication_date'])
    # One row per id; patients with a shorter range are padded with 'no'
    out = (df.groupby('id')['medication_date'].apply(f).unstack(fill_value='no')
             .rename(columns=lambda x: f'day{x+1}').reset_index())

Output:

    >>> out
       id day1 day2 day3 day4 day5 day6
    0   1  yes   no   no  yes   no  yes
    1   2  yes  yes  yes   no   no   no
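As a possible alternative to the per-group function (not the answerer's method), the same grid can be sketched by computing each row's day offset from the patient's first date and cross-tabulating. The names `first`, `day`, `n_days` and `grid` are introduced here for illustration only, and the result is a boolean table rather than 'yes'/'no' strings.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "medication_date": pd.to_datetime([
        "2000-01-01", "2000-01-04", "2000-01-06",
        "2000-04-01", "2000-04-02", "2000-04-03",
    ]),
})

# Day number relative to each patient's first observation (first day -> 1).
first = df.groupby("id")["medication_date"].transform("min")
day = (df["medication_date"] - first).dt.days + 1

# Count observations per (id, day), add the days nobody was observed on,
# and turn the counts into True/False.
n_days = int(day.max())
grid = (pd.crosstab(df["id"], day)
          .reindex(columns=range(1, n_days + 1), fill_value=0)
          .gt(0))
grid.columns = [f"day{d}" for d in grid.columns]
print(grid)
```

Unlike the reindex-based function, this variant does not fail when a patient has two records on the same date, because the counts are simply collapsed to True.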

Update (plotting the table)

    import matplotlib.pyplot as plt
    from matplotlib.colors import LinearSegmentedColormap

    # Two-color map: 0 -> white ('no'), 1 -> black ('yes')
    colors = ["white", "black"]
    cmap = LinearSegmentedColormap.from_list('Custom', colors, len(colors))
    # Convert 'yes'/'no' to 1/0 and draw the matrix (rows: id, columns: days)
    plt.matshow(out.set_index('id').eq('yes').astype(int), cmap=cmap)
    plt.show()
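If labeled axes are wanted, a variant along these lines may help. It is a sketch under assumptions: `plot_grid` is a hypothetical helper name, the `out` table is re-created by hand from the output shown above, and `ListedColormap` with fixed `vmin`/`vmax` is swapped in so that a table containing only 'yes' (or only 'no') still maps to the intended color.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import ListedColormap

# Stand-in for the reshaped table `out` shown in the answer's output.
out = pd.DataFrame({
    "id": [1, 2],
    "day1": ["yes", "yes"],
    "day2": ["no", "yes"],
    "day3": ["no", "yes"],
    "day4": ["yes", "no"],
    "day5": ["no", "no"],
    "day6": ["yes", "no"],
})

def plot_grid(out):
    """Hypothetical helper: one row per id, one column per day, black = 'yes'."""
    mat = out.set_index("id").eq("yes").astype(int)
    fig, ax = plt.subplots()
    # vmin/vmax pin 0 to white and 1 to black regardless of the data present.
    ax.imshow(mat.to_numpy(), cmap=ListedColormap(["white", "black"]),
              vmin=0, vmax=1)
    ax.set_xticks(range(mat.shape[1]))
    ax.set_xticklabels(mat.columns)
    ax.set_yticks(range(mat.shape[0]))
    ax.set_yticklabels(mat.index)
    ax.set_xlabel("day since first observation")
    ax.set_ylabel("id")
    return fig, ax

fig, ax = plot_grid(out)
```

Calling `plt.show()` afterwards displays the figure, and `fig.savefig(...)` writes it to a file instead.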

huangapple
  • Posted on March 3, 2023 at 22:07:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75628106.html