Identify the day diff between 2 dates in a column and flag the pattern in pandas
Question
import pandas as pd

# Create the input dataframe
df_in = pd.DataFrame([["A","2023-02-04"],["A","2023-02-05"],["A","2023-02-06"],
                      ["B","2023-02-06"],["B","2023-02-13"],["B","2023-02-20"],
                      ["C","2023-02-07"],["C","2023-02-10"],["C","2023-02-12"],
                      ["D","2023-02-14"],["D","2023-02-17"],["D","2023-02-20"],
                      ["E","2023-02-18"]], columns=["id","date"])

# Collect output rows in a list (DataFrame.append was removed in pandas 2.0)
rows = []

# Initialize the "previous row" state
prev_id = None
prev_date = None

# Loop over the dataframe
for index, row in df_in.iterrows():
    current_id = row["id"]
    current_date = pd.to_datetime(row["date"])

    # First row of a new id (this also covers the very first row, where prev_id is None)
    if current_id != prev_id:
        day_diff = "_"
        flag = "No_pattern"
    else:
        # Gap in days to the previous row of the same id, e.g. 1 -> "1_days_diff"
        day_diff = (current_date - prev_date).days
        flag = f"{day_diff}_days_diff"

    # Add the result for this row
    rows.append({"id": current_id, "date": current_date,
                 "day_difference": day_diff, "Flag": flag})

    # Update the "previous row" state
    prev_id = current_id
    prev_date = current_date

# Build df_out once from the collected rows and print it
df_out = pd.DataFrame(rows, columns=["id", "date", "day_difference", "Flag"])
print(df_out)
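This code computes day_difference and Flag from the input dataframe df_in and stores the result in the df_out dataframe. Note that its Flag only describes the gap to the previous row of the same id (the first row of each id gets No_pattern); it does not classify the pattern of the whole id, which is what the question below asks for.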
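If you want to keep a row-by-row loop but get the per-id Flag the question asks for, one option is a second pass: collect each id's gaps during the loop, then label every id once all of its rows have been seen. This is a minimal sketch, assuming rows of the same id are contiguous and sorted by date (as in the sample); `gaps_by_id` and `label` are hypothetical names introduced for illustration, and `itertuples` is swapped in for `iterrows` only because it is simpler here.

```python
import pandas as pd

df_in = pd.DataFrame(
    [["A", "2023-02-04"], ["A", "2023-02-05"], ["A", "2023-02-06"],
     ["B", "2023-02-06"], ["B", "2023-02-13"], ["B", "2023-02-20"],
     ["C", "2023-02-07"], ["C", "2023-02-10"], ["C", "2023-02-12"],
     ["D", "2023-02-14"], ["D", "2023-02-17"], ["D", "2023-02-20"],
     ["E", "2023-02-18"]],
    columns=["id", "date"],
)

rows = []
gaps_by_id = {}  # hypothetical helper: id -> list of day gaps within that id
prev_id = None
prev_date = None

# First pass: per-row day difference, as in the loop above
for row in df_in.itertuples(index=False):
    current_date = pd.to_datetime(row.date)
    if row.id != prev_id:
        day_diff = "_"  # first row of this id has no previous row
        gaps_by_id.setdefault(row.id, [])
    else:
        day_diff = (current_date - prev_date).days
        gaps_by_id[row.id].append(day_diff)
    rows.append({"id": row.id, "date": current_date, "day_difference": day_diff})
    prev_id, prev_date = row.id, current_date


def label(gaps):
    """Hypothetical helper: one Flag per id, based on all of its gaps."""
    if not gaps:
        return "Single_day"            # only one row for this id
    if len(set(gaps)) == 1:
        return f"{gaps[0]}_days_diff"  # every gap has the same size
    return "No_pattern"                # gaps of different sizes


# Second pass: map each id to its label
df_out = pd.DataFrame(rows)
df_out["Flag"] = df_out["id"].map({i: label(g) for i, g in gaps_by_id.items()})
print(df_out)
```

The design choice is that the Flag cannot be decided while looking at a single row, so the loop only records gaps and the labelling happens after the loop.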
I have a dataframe:
df_in = pd.DataFrame([["A","2023-02-04"],["A","2023-02-05"],["A","2023-02-06"],["B","2023-02-06"],["B","2023-02-13"],["B","2023-02-20"],
["C","2023-02-07"],["C","2023-02-10"],["C","2023-02-12"],["D","2023-02-14"],["D","2023-02-17"],["D","2023-02-20"],
["E","2023-02-18"]],columns=["id","date"])
id date
A 2023-02-04
A 2023-02-05
A 2023-02-06
B 2023-02-06
B 2023-02-13
B 2023-02-20
C 2023-02-07
C 2023-02-10
C 2023-02-12
D 2023-02-14
D 2023-02-17
D 2023-02-20
E 2023-02-18
I want to derive 2 new columns from the dataframe. The 1st column, day_difference, gives the day difference between a row and the previous row of the same id. The 2nd column, Flag, gives the pattern at id level. For example, id A has daily data, so its Flag is 1_days_diff; id B has a 7-day pattern, so 7_days_diff; id C has no pattern, so No_pattern; id E has only a single row, so Single_day.
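The rule described above can be stated as a small function over one id's dates. This is a minimal pure-Python sketch, assuming the dates of an id are already sorted; `flag_for_dates` is a hypothetical helper name, not something from pandas.

```python
from datetime import date


def flag_for_dates(dates):
    """Return the Flag for one id, given its sorted dates (hypothetical helper)."""
    # Day gap between each row and the previous row of the same id
    gaps = [(cur - prev).days for prev, cur in zip(dates, dates[1:])]
    if not gaps:
        return "Single_day"            # only one row, so there is no gap at all
    if len(set(gaps)) == 1:
        return f"{gaps[0]}_days_diff"  # every gap has the same size
    return "No_pattern"                # gaps of different sizes


print(flag_for_dates([date(2023, 2, 4), date(2023, 2, 5), date(2023, 2, 6)]))   # id A: 1_days_diff
print(flag_for_dates([date(2023, 2, 7), date(2023, 2, 10), date(2023, 2, 12)]))  # id C: No_pattern
print(flag_for_dates([date(2023, 2, 18)]))                                      # id E: Single_day
```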
Expected output:
df_out = pd.DataFrame([["A","2023-02-04","_","1_days_diff"],["A","2023-02-05",1,"1_days_diff"],["A","2023-02-06",1,"1_days_diff"],
["B","2023-02-06","_","7_days_diff"],["B","2023-02-13",7,"7_days_diff"],["B","2023-02-20",7,"7_days_diff"],
["C","2023-02-07","_","No_pattern"],["C","2023-02-10",3,"No_pattern"],["C","2023-02-12",2,"No_pattern"],
["D","2023-02-14","_","3_days_diff"],["D","2023-02-17",3,"3_days_diff"],["D","2023-02-20",3,"3_days_diff"],
["E","2023-02-18",1,"Single_day"]],columns=["id","date","day_difference","Flag"])
id date day_difference Flag
A 2023-02-04 _ 1_days_diff
A 2023-02-05 1 1_days_diff
A 2023-02-06 1 1_days_diff
B 2023-02-06 _ 7_days_diff
B 2023-02-13 7 7_days_diff
B 2023-02-20 7 7_days_diff
C 2023-02-07 _ No_pattern
C 2023-02-10 3 No_pattern
C 2023-02-12 2 No_pattern
D 2023-02-14 _ 3_days_diff
D 2023-02-17 3 3_days_diff
D 2023-02-20 3 3_days_diff
E 2023-02-18 1 Single_day
How to do it in pandas?
Answer 1
Score: 1
Use:
import numpy as np
import pandas as pd

#convert values to datetimes
df_in['date'] = pd.to_datetime(df_in['date'])
#get differences per group in days
df_in['day_difference'] = df_in.groupby('id')['date'].diff().dt.days
#get number of unique values and first non missing value per id
df = df_in.groupby('id')['day_difference'].agg(['nunique','first'])
s = df_in['id'].map(df['nunique'])
#create column filled by first non NaNs values
s1 = df_in['id'].map(df['first']).fillna(0).astype(int).astype(str).add('_days_diff')
#set flag column with default value `No_pattern`:
#exactly one unique difference -> that difference, no difference at all (single row) -> Single_day
df_in['flag'] = np.select([s.eq(1), s.eq(0)],
                          [s1, 'Single_day'],
                          'No_pattern')
print(df_in)
id date day_difference flag
0 A 2023-02-04 NaN 1_days_diff
1 A 2023-02-05 1.0 1_days_diff
2 A 2023-02-06 1.0 1_days_diff
3 B 2023-02-06 NaN 7_days_diff
4 B 2023-02-13 7.0 7_days_diff
5 B 2023-02-20 7.0 7_days_diff
6 C 2023-02-07 NaN No_pattern
7 C 2023-02-10 3.0 No_pattern
8 C 2023-02-12 2.0 No_pattern
9 D 2023-02-14 NaN 3_days_diff
10 D 2023-02-17 3.0 3_days_diff
11 D 2023-02-20 3.0 3_days_diff
12 E 2023-02-18 NaN Single_day
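To see why `nunique` drives the three cases, it can help to print the intermediate per-id table before `np.select`. This sketch rebuilds the sample frame and stops at that step; the name `summary` is used instead of `df` only for readability. With this data, `nunique` should be 1 for A, B and D (one repeated gap), 2 for C, and 0 for E, whose only difference is NaN and is not counted; `first` is the first non-missing gap of each id.

```python
import pandas as pd

df_in = pd.DataFrame(
    [["A", "2023-02-04"], ["A", "2023-02-05"], ["A", "2023-02-06"],
     ["B", "2023-02-06"], ["B", "2023-02-13"], ["B", "2023-02-20"],
     ["C", "2023-02-07"], ["C", "2023-02-10"], ["C", "2023-02-12"],
     ["D", "2023-02-14"], ["D", "2023-02-17"], ["D", "2023-02-20"],
     ["E", "2023-02-18"]],
    columns=["id", "date"],
)

df_in["date"] = pd.to_datetime(df_in["date"])
# NaN on the first row of every id, because there is no previous row
df_in["day_difference"] = df_in.groupby("id")["date"].diff().dt.days

# One row per id: number of distinct gaps (NaN excluded) and first non-missing gap
summary = df_in.groupby("id")["day_difference"].agg(["nunique", "first"])
print(summary)
```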
Answer 2
Score: 1
I would use a custom function with groupby.apply:
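import pandas as pd

# Per-group function: day differences plus one Flag for the whole id
def pattern(g):
    diff = pd.to_datetime(g['date']).diff().dt.days
    return pd.DataFrame({'day_difference': diff,
                         'Flag': 'Single_day' if len(g)==1 else
                                 (f'{int(diff.iloc[-1])}_days_diff'
                                  if len(diff.dropna().unique())==1
                                  else 'No_pattern')
                        })

# group_keys=False keeps the original index, so the result joins back row by row
out = df_in.join(df_in.groupby('id', group_keys=False).apply(pattern))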
Output:
id date day_difference Flag
0 A 2023-02-04 NaN 1_days_diff
1 A 2023-02-05 1.0 1_days_diff
2 A 2023-02-06 1.0 1_days_diff
3 B 2023-02-06 NaN 7_days_diff
4 B 2023-02-13 7.0 7_days_diff
5 B 2023-02-20 7.0 7_days_diff
6 C 2023-02-07 NaN No_pattern
7 C 2023-02-10 3.0 No_pattern
8 C 2023-02-12 2.0 No_pattern
9 D 2023-02-14 NaN 3_days_diff
10 D 2023-02-17 3.0 3_days_diff
11 D 2023-02-20 3.0 3_days_diff
12 E 2023-02-18 NaN Single_day
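The same three-way rule can also be written without `groupby.apply`, broadcasting per-id statistics back to every row with `groupby.transform` so that no DataFrame is built per group. This is a sketch of an alternative, not the answer's method; it assumes the sample `df_in` and treats an id with zero non-missing gaps as Single_day and an id with exactly one distinct gap as that gap.

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame(
    [["A", "2023-02-04"], ["A", "2023-02-05"], ["A", "2023-02-06"],
     ["B", "2023-02-06"], ["B", "2023-02-13"], ["B", "2023-02-20"],
     ["C", "2023-02-07"], ["C", "2023-02-10"], ["C", "2023-02-12"],
     ["D", "2023-02-14"], ["D", "2023-02-17"], ["D", "2023-02-20"],
     ["E", "2023-02-18"]],
    columns=["id", "date"],
)

out = df_in.copy()
# Gap in days to the previous row of the same id (NaN on each id's first row)
out["day_difference"] = pd.to_datetime(out["date"]).groupby(out["id"]).diff().dt.days

g = out.groupby("id")["day_difference"]
n_gaps = g.transform("count")      # non-missing gaps per id, repeated on every row
n_unique = g.transform("nunique")  # distinct gaps per id
first_gap = g.transform("first")   # first non-missing gap per id (NaN for single rows)

gap_label = first_gap.fillna(0).astype(int).astype(str) + "_days_diff"
out["Flag"] = np.select(
    [n_gaps.eq(0), n_unique.eq(1)],  # checked in order: single row first
    ["Single_day", gap_label],
    default="No_pattern",
)
print(out)
```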