Python: sorting time interval data into two-day chunks based on an index event

Question

I have the following data:

```
df =
id  date_medication  medication  index_date
1   2000-01-01       A           2000-01-04
1   2000-01-02       A           2000-01-04
1   2000-01-05       B           2000-01-04
1   2000-01-06       B           2000-01-04
2   2000-01-01       A           2000-01-05
2   2000-01-03       B           2000-01-05
2   2000-01-06       A           2000-01-05
2   2000-01-10       B           2000-01-05
```

I would like to transform the data into two-day chunks around the index event (IE), that is, create new columns representing the time intervals, such as:

```
df =
id  -4  -2  0   2  4  6
1   A   A   IE  B  0  0
2   A   B   IE  A  A  B
```
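For anyone who wants to reproduce the problem, here is a minimal sketch that builds the sample data as a pandas DataFrame. The values are copied from the question; the name `offset_days` is only an illustrative helper and does not appear in the original post.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed up front so
# they can be subtracted from each other.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "date_medication": pd.to_datetime([
        "2000-01-01", "2000-01-02", "2000-01-05", "2000-01-06",
        "2000-01-01", "2000-01-03", "2000-01-06", "2000-01-10",
    ]),
    "medication": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "index_date": pd.to_datetime(["2000-01-04"] * 4 + ["2000-01-05"] * 4),
})

# Signed offset, in whole days, between each medication date and the index event
# (offset_days is a hypothetical name used only for this illustration)
offset_days = (df["date_medication"] - df["index_date"]).dt.days
print(offset_days.tolist())  # [-3, -2, 1, 2, -4, -2, 1, 5]
```

Negative offsets are medications taken before the index event, positive offsets are medications taken after it.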


# Answer 1
**Score**: 2

Use:

```python
import pandas as pd

# convert the columns to datetimes
df['date_medication'] = pd.to_datetime(df['date_medication'])
df['index_date'] = pd.to_datetime(df['index_date'])

# floor the day offsets to 2-day chunks
s = df['date_medication'].sub(df['index_date']).dt.days // 2 * 2
# shift chunks greater than or equal to 0 by 2 days, leaving 0 free for the index event
s.loc[s.ge(0)] += 2

# pivot the chunks into columns
df1 = df.assign(g=s).pivot(index='id', columns='g', values='medication')
# add the 0 column for the index event
df1.loc[:, 0] = 'IE'
# reindex so that every 2-day chunk is present, filling the gaps with 0
df1 = (df1.rename_axis(columns=None)
         .reindex(columns=range(df1.columns.min(), df1.columns.max() + 2, 2), fill_value=0)
         .fillna(0)
         .reset_index())
print(df1)
```

```
   id -4 -2   0  2  4  6
0   1  A  A  IE  B  B  0
1   2  A  B  IE  A  0  B
```

Details:

```python
print(s)
```

```
0   -4
1   -2
2    2
3    4
4   -4
5   -2
6    2
7    6
dtype: int64
```
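To see why the floor-division step produces these labels, here is a minimal sketch on a handful of made-up offsets. The `offsets` series and its values are illustrative assumptions, not data from the question, and `Series.where` is used in place of the `.loc` assignment purely for compactness.

```python
import pandas as pd

# Hypothetical day offsets relative to the index event (not from the question)
offsets = pd.Series([-4, -3, -2, -1, 0, 1, 2, 3, 5])

# Floor division rounds toward negative infinity, so -3 // 2 == -2 and -1 // 2 == -1
bins = offsets // 2 * 2
# Keep negative bins as they are; shift the non-negative ones by 2 so that
# the label 0 stays free for the "IE" column
bins = bins.where(bins < 0, bins + 2)

print(bins.tolist())  # [-4, -4, -2, -2, 2, 2, 4, 4, 6]
```

Note that under this scheme an offset of 0 (a medication on the index date itself) lands in chunk 2, and that `pivot` raises an error if two rows of the same `id` fall into the same chunk.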

# Answer 2
**Score**: 2

You can use:

```python
import numpy as np
import pandas as pd

# Compute the delta in days (both date columns must already be datetimes)
days = df['date_medication'].sub(df['index_date']).dt.days

# Create the desired bins and labels
bins = np.arange(days.min() - days.min() % 2, days.max() + days.max() % 2 + 1, 2)
lbls = bins[bins != 0]  # Exclude 0, which is reserved for the index event
df['interval'] = pd.cut(days, bins, labels=lbls, include_lowest=True, right=False)

# Reshape the dataframe; the missing 0 column is created and filled with 'IE'
out = (df.pivot(index='id', columns='interval', values='medication')
         .reindex(bins, fill_value='IE', axis=1).fillna(0)
         .rename_axis(columns=None).reset_index())
```

Output:

```
>>> out
   id -4 -2   0  2  4  6
0   1  A  A  IE  B  B  0
1   2  A  B  IE  A  0  B
```
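The binning step of this approach can be checked in isolation. The sketch below assumes the bin edges that the `np.arange` expression yields for the sample data (-4 to 6 in steps of 2) and a few of the day offsets derived from the question; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# A few day offsets derived from the question's sample data
days = pd.Series([-3, -2, 1, 2, 5])

edges = np.arange(-4, 7, 2)   # [-4, -2, 0, 2, 4, 6]
labels = edges[edges != 0]    # [-4, -2, 2, 4, 6], one label per interval

# right=False makes each interval closed on the left: [-4, -2), [-2, 0), [0, 2), ...
interval = pd.cut(days, edges, labels=labels, right=False)

print([int(x) for x in interval])  # [-4, -2, 2, 4, 6]
```

Because the label 0 is dropped, the interval [0, 2) is labelled 2, [2, 4) is labelled 4, and so on, which matches the shift by 2 used in the first answer.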

huangapple
  • Posted on March 9, 2023 at 18:17:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75683209.html