Python: sorting time interval data into two-day chunks based on an index event
# Question

I have the following data:

    df =
    id  date_medication  medication  index_date
    1   2000-01-01       A           2000-01-04
    1   2000-01-02       A           2000-01-04
    1   2000-01-05       B           2000-01-04
    1   2000-01-06       B           2000-01-04
    2   2000-01-01       A           2000-01-05
    2   2000-01-03       B           2000-01-05
    2   2000-01-06       A           2000-01-05
    2   2000-01-10       B           2000-01-05

I would like to transform the data into two-day chunks around the index event (IE), that is, create new columns representing the time intervals, such as:

    df =
    id  -4  -2  0   2  4  6
    1   A   A   IE  B  0  0
    2   A   B   IE  A  A  B
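For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the sample data from the question. The column names and values are copied from the question; the `offset` variable is only an illustrative name for the day difference that both answers start from.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "date_medication": pd.to_datetime([
        "2000-01-01", "2000-01-02", "2000-01-05", "2000-01-06",
        "2000-01-01", "2000-01-03", "2000-01-06", "2000-01-10",
    ]),
    "medication": list("AABBABAB"),
    "index_date": pd.to_datetime(["2000-01-04"] * 4 + ["2000-01-05"] * 4),
})

# Day offset of each medication date from the index date (illustrative name)
offset = (df["date_medication"] - df["index_date"]).dt.days
print(offset.tolist())  # [-3, -2, 1, 2, -4, -2, 1, 5]
```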
# Answer 1

**Score**: 2

Use:

```python
import pandas as pd

# convert columns to datetimes
df['date_medication'] = pd.to_datetime(df['date_medication'])
df['index_date'] = pd.to_datetime(df['index_date'])

# floor the day offset from the index date to a multiple of 2 (2-day chunks)
s = df['date_medication'].sub(df['index_date']).dt.days // 2 * 2

# shift chunks at or after the index date by 2, so label 0 stays free for 'IE'
s.loc[s.ge(0)] += 2

# pivot the chunk labels into columns
df1 = df.assign(g=s).pivot(index='id', columns='g', values='medication')

# add the 0 column marking the index event
df1.loc[:, 0] = 'IE'

# add any missing 2-day columns, fill the gaps with 0 and restore id as a column
df1 = (df1.rename_axis(columns=None)
          .reindex(columns=range(df1.columns.min(), df1.columns.max() + 2, 2), fill_value=0)
          .fillna(0)
          .reset_index())
print(df1)
```

Output:

       id -4 -2   0  2  4  6
    0   1  A  A  IE  B  B  0
    1   2  A  B  IE  A  0  B

Details:

    print(s)
    0   -4
    1   -2
    2    2
    3    4
    4   -4
    5   -2
    6    2
    7    6
    dtype: int64
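The chunk arithmetic in this answer can be checked by hand with plain integers. The following is a small sketch, not part of the original answer; `chunk_label` is a hypothetical helper name that mirrors the `// 2 * 2` step followed by the `+= 2` shift for non-negative values.

```python
def chunk_label(offset_days: int, width: int = 2) -> int:
    """Map a day offset from the index date to its chunk label (hypothetical helper)."""
    # Floor division rounds toward negative infinity, so -3 becomes -4, not -2
    label = offset_days // width * width
    # Chunks at or after the index date move up by one width, leaving 0 for 'IE'
    if label >= 0:
        label += width
    return label


for d in (-4, -3, -2, -1, 0, 1, 2, 5):
    print(d, "->", chunk_label(d))
```

Under this rule an offset of 0 (a medication taken on the index date itself) lands in chunk 2, since column 0 is reserved for the `IE` marker.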
# Answer 2

**Score**: 2

You can use:

```python
import numpy as np
import pandas as pd

# Compute the offset in days from the index date
# (both date columns must already be datetimes, e.g. via pd.to_datetime)
days = df['date_medication'].sub(df['index_date']).dt.days

# Create the desired bin edges (multiples of 2) and the labels
bins = np.arange(days.min() - days.min() % 2, days.max() + days.max() % 2 + 1, 2)
lbls = bins[bins != 0]  # Exclude 0, which is reserved for 'IE'

# right=False gives left-closed bins: [-4, -2), [-2, 0), [0, 2), ...
# Note: the last edge is exclusive, so an even maximum offset falls outside the bins
df['interval'] = pd.cut(days, bins, labels=lbls, include_lowest=True, right=False)

# Reshape your dataframe
out = (df.pivot(index='id', columns='interval', values='medication')
         .reindex(bins, fill_value='IE', axis=1).fillna(0)
         .rename_axis(columns=None).reset_index())
```

Output:

    >>> out
       id -4 -2   0  2  4  6
    0   1  A  A  IE  B  B  0
    1   2  A  B  IE  A  0  B