Identify the day diff between 2 dates in a column and flag the pattern in pandas
Question
I have a dataframe:
import pandas as pd

df_in = pd.DataFrame([["A","2023-02-04"],["A","2023-02-05"],["A","2023-02-06"],["B","2023-02-06"],["B","2023-02-13"],["B","2023-02-20"],
                      ["C","2023-02-07"],["C","2023-02-10"],["C","2023-02-12"],["D","2023-02-14"],["D","2023-02-17"],["D","2023-02-20"],
                      ["E","2023-02-18"]], columns=["id","date"])
id	   date
A	2023-02-04
A	2023-02-05
A	2023-02-06
B	2023-02-06
B	2023-02-13
B	2023-02-20
C	2023-02-07
C	2023-02-10
C	2023-02-12
D	2023-02-14
D	2023-02-17
D	2023-02-20
E	2023-02-18
I want to derive two new columns from the dataframe. The first column, day_difference, gives the difference in days between a row and the previous row within the same id. The second column, Flag, describes the pattern for each id. For example, id A has daily data, so it is flagged as daily (1_days_diff in the expected output); id B has a constant 7-day difference, so it gets 7_days_diff; id C has no consistent pattern, so it gets No_pattern; id E has only a single row, so it gets Single_day.
Expected output:
df_out = pd.DataFrame([["A","2023-02-04","_","1_days_diff"],["A","2023-02-05",1,"1_days_diff"],["A","2023-02-06",1,"1_days_diff"],
                       ["B","2023-02-06","_","7_days_diff"],["B","2023-02-13",7,"7_days_diff"],["B","2023-02-20",7,"7_days_diff"],
                       ["C","2023-02-07","_","No_pattern"],["C","2023-02-10",3,"No_pattern"],["C","2023-02-12",2,"No_pattern"],
                       ["D","2023-02-14","_","3_days_diff"],["D","2023-02-17",3,"3_days_diff"],["D","2023-02-20",3,"3_days_diff"],
                       ["E","2023-02-18",1,"Single_day"]], columns=["id","date","day_difference","Flag"])
id	   date	    day_difference	Flag
A	2023-02-04	  _	          1_days_diff
A	2023-02-05	  1	          1_days_diff
A	2023-02-06	  1	          1_days_diff
B	2023-02-06	  _	          7_days_diff
B	2023-02-13	  7	          7_days_diff
B	2023-02-20	  7	          7_days_diff
C	2023-02-07	  _	          No_pattern
C	2023-02-10	  3	          No_pattern
C	2023-02-12	  2	          No_pattern
D	2023-02-14	  _	          3_days_diff
D	2023-02-17	  3	          3_days_diff
D	2023-02-20	  3	          3_days_diff
E	2023-02-18	  1	          Single_day
How to do it in pandas?
Answer 1
Score: 1
Use:
import numpy as np

# convert values to datetimes
df_in['date'] = pd.to_datetime(df_in['date'])
# get differences per group in days
df_in['day_difference'] = df_in.groupby('id')['date'].diff().dt.days
# get the number of unique differences and the first non-missing difference per id
df = df_in.groupby('id')['day_difference'].agg(['nunique','first'])

s = df_in['id'].map(df['nunique'])
# build the label from the first non-NaN difference
s1 = df_in['id'].map(df['first']).fillna(0).astype(int).astype(str).add('_days_diff')
# one unique difference -> regular pattern, no difference -> single row, otherwise `No_pattern`
df_in['flag'] = np.select([s.eq(1), s.eq(0)],
                          [s1, 'Single_day'],
                          'No_pattern')
print(df_in)
   id       date  day_difference         flag
0   A 2023-02-04             NaN  1_days_diff
1   A 2023-02-05             1.0  1_days_diff
2   A 2023-02-06             1.0  1_days_diff
3   B 2023-02-06             NaN  7_days_diff
4   B 2023-02-13             7.0  7_days_diff
5   B 2023-02-20             7.0  7_days_diff
6   C 2023-02-07             NaN   No_pattern
7   C 2023-02-10             3.0   No_pattern
8   C 2023-02-12             2.0   No_pattern
9   D 2023-02-14             NaN  3_days_diff
10  D 2023-02-17             3.0  3_days_diff
11  D 2023-02-20             3.0  3_days_diff
12  E 2023-02-18             NaN   Single_day
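The output above shows NaN and floats where the question asks for "_" and whole numbers. One possible way to get closer to the requested layout is sketched below, using a few made-up rows; the column name day_difference_display is a hypothetical name chosen for this illustration, not part of the answer.

```python
import pandas as pd

# Small illustrative frame (a subset-like sample, not the full data from the question)
df = pd.DataFrame({"id": ["A", "A", "B"],
                   "date": pd.to_datetime(["2023-02-04", "2023-02-05", "2023-02-18"])})

# Day gap to the previous row within each id (NaN for the first row of an id)
diff = df.groupby("id")["date"].diff().dt.days

# Nullable integer dtype keeps whole numbers (1 instead of 1.0) next to missing values
df["day_difference"] = diff.astype("Int64")

# Display-only column (hypothetical name): "_" wherever there is no previous row
df["day_difference_display"] = df["day_difference"].astype(object).where(diff.notna(), "_")

print(df)
```

Keeping the numeric Int64 column alongside the display column means later calculations do not have to deal with the "_" placeholder.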
Answer 2
Score: 1
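I would use a custom function with groupby.apply:
def pattern(g):
    # day gaps between consecutive rows of one id (NaN for the first row)
    diff = pd.to_datetime(g['date']).diff().dt.days
    # one row -> 'Single_day'; exactly one distinct gap -> '<gap>_days_diff';
    # anything else -> 'No_pattern'
    return pd.DataFrame({'day_difference': diff,
                         'Flag': 'Single_day' if len(g)==1 else
                         (f'{int(diff.iloc[-1])}_days_diff'
                          if len(diff.dropna().unique())==1
                          else 'No_pattern')
                        })

# apply per id and join the new columns back on the original index
out = df_in.join(df_in.groupby('id', group_keys=False).apply(pattern))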
Output:
   id        date  day_difference         Flag
0   A  2023-02-04             NaN  1_days_diff
1   A  2023-02-05             1.0  1_days_diff
2   A  2023-02-06             1.0  1_days_diff
3   B  2023-02-06             NaN  7_days_diff
4   B  2023-02-13             7.0  7_days_diff
5   B  2023-02-20             7.0  7_days_diff
6   C  2023-02-07             NaN   No_pattern
7   C  2023-02-10             3.0   No_pattern
8   C  2023-02-12             2.0   No_pattern
9   D  2023-02-14             NaN  3_days_diff
10  D  2023-02-17             3.0  3_days_diff
11  D  2023-02-20             3.0  3_days_diff
12  E  2023-02-18             NaN   Single_day
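Both approaches take the difference to the previous row, so they appear to rely on the rows already being ordered by date within each id, as in the sample data. If that ordering is not guaranteed, sorting first is one option. A minimal sketch with made-up, deliberately unsorted rows (these rows are an assumption for illustration, not data from the question):

```python
import pandas as pd

# Hypothetical unsorted input
df = pd.DataFrame({"id": ["B", "A", "B", "A", "B"],
                   "date": ["2023-02-20", "2023-02-05", "2023-02-06",
                            "2023-02-04", "2023-02-13"]})
df["date"] = pd.to_datetime(df["date"])

# Order rows by id and date so that diff() compares consecutive dates
df = df.sort_values(["id", "date"], ignore_index=True)

# Day gap to the previous row within each id
df["day_difference"] = df.groupby("id")["date"].diff().dt.days

print(df)
```

Without the sort, the same rows would produce negative or misleading gaps, and the flag logic would then report No_pattern for an id that is in fact regular.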