Identify the day diff between 2 dates in a column and flag the pattern in pandas

Question

I have a dataframe:

import pandas as pd

df_in = pd.DataFrame([["A","2023-02-04"],["A","2023-02-05"],["A","2023-02-06"],["B","2023-02-06"],["B","2023-02-13"],["B","2023-02-20"],
                      ["C","2023-02-07"],["C","2023-02-10"],["C","2023-02-12"],["D","2023-02-14"],["D","2023-02-17"],["D","2023-02-20"],
                      ["E","2023-02-18"]],columns=["id","date"])
id	   date
A	2023-02-04
A	2023-02-05
A	2023-02-06
B	2023-02-06
B	2023-02-13
B	2023-02-20
C	2023-02-07
C	2023-02-10
C	2023-02-12
D	2023-02-14
D	2023-02-17
D	2023-02-20
E	2023-02-18

I want to derive two new columns from the dataframe. The first column, day_difference, gives the difference in days between a row and the previous row within the same id. The second column, Flag, describes the pattern at the id level. For example, id A has daily data, so its flag is 1_days_diff; id B has a 7-day gap between rows, so its flag is 7_days_diff; id C has no consistent gap, so its flag is No_pattern; id E has only a single row, so its flag is Single_day.

Expected output:

df_out = pd.DataFrame([["A","2023-02-04","_","1_days_diff"],["A","2023-02-05",1,"1_days_diff"],["A","2023-02-06",1,"1_days_diff"],
                       ["B","2023-02-06","_","7_days_diff"],["B","2023-02-13",7,"7_days_diff"],["B","2023-02-20",7,"7_days_diff"],
                       ["C","2023-02-07","_","No_pattern"],["C","2023-02-10",3,"No_pattern"],["C","2023-02-12",2,"No_pattern"],
                       ["D","2023-02-14","_","3_days_diff"],["D","2023-02-17",3,"3_days_diff"],["D","2023-02-20",3,"3_days_diff"],
                       ["E","2023-02-18","_","Single_day"]],columns=["id","date","day_difference","Flag"])
id	   date	    day_difference	Flag
A	2023-02-04	  _	          1_days_diff
A	2023-02-05	  1	          1_days_diff
A	2023-02-06	  1	          1_days_diff
B	2023-02-06	  _	          7_days_diff
B	2023-02-13	  7	          7_days_diff
B	2023-02-20	  7	          7_days_diff
C	2023-02-07	  _	          No_pattern
C	2023-02-10	  3	          No_pattern
C	2023-02-12	  2	          No_pattern
D	2023-02-14	  _	          3_days_diff
D	2023-02-17	  3	          3_days_diff
D	2023-02-20	  3	          3_days_diff
E	2023-02-18	  _	          Single_day

How can I do this in pandas?

Answer 1

Score: 1

Use:

import numpy as np
import pandas as pd

# convert values to datetimes
df_in['date'] = pd.to_datetime(df_in['date'])

# get differences per group in days
df_in['day_difference'] = df_in.groupby('id')['date'].diff().dt.days

# get the number of unique differences and the first non-missing difference per id
df = df_in.groupby('id')['day_difference'].agg(['nunique', 'first'])

s = df_in['id'].map(df['nunique'])
# create a column filled with the first non-NaN difference
s1 = df_in['id'].map(df['first']).fillna(0).astype(int).astype(str).add('_days_diff')

# set the flag column with default value `No_pattern`
# (the condition list was lost from the post; it is reconstructed here from the
# choices: one unique difference -> s1, no difference at all -> 'Single_day')
df_in['flag'] = np.select([s.eq(1), s.eq(0)], [s1, 'Single_day'], 'No_pattern')
print(df_in)

   id       date  day_difference         flag
0   A 2023-02-04             NaN  1_days_diff
1   A 2023-02-05             1.0  1_days_diff
2   A 2023-02-06             1.0  1_days_diff
3   B 2023-02-06             NaN  7_days_diff
4   B 2023-02-13             7.0  7_days_diff
5   B 2023-02-20             7.0  7_days_diff
6   C 2023-02-07             NaN   No_pattern
7   C 2023-02-10             3.0   No_pattern
8   C 2023-02-12             2.0   No_pattern
9   D 2023-02-14             NaN  3_days_diff
10  D 2023-02-17             3.0  3_days_diff
11  D 2023-02-20             3.0  3_days_diff
12  E 2023-02-18             NaN   Single_day
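
To see why the number of unique differences is enough to pick the flag, it can help to look at the per-id summary on its own. The following is a minimal sketch, not part of the answer: it rebuilds the sample frame from the question and maps each summary row to a flag with a plain function. The names summary and flag_for are hypothetical and introduced only for illustration.

```python
import pandas as pd

# Sample data from the question.
df_in = pd.DataFrame(
    [["A", "2023-02-04"], ["A", "2023-02-05"], ["A", "2023-02-06"],
     ["B", "2023-02-06"], ["B", "2023-02-13"], ["B", "2023-02-20"],
     ["C", "2023-02-07"], ["C", "2023-02-10"], ["C", "2023-02-12"],
     ["D", "2023-02-14"], ["D", "2023-02-17"], ["D", "2023-02-20"],
     ["E", "2023-02-18"]],
    columns=["id", "date"],
)
df_in["date"] = pd.to_datetime(df_in["date"])
df_in["day_difference"] = df_in.groupby("id")["date"].diff().dt.days

# Per-id summary: number of distinct gaps (NaN is ignored) and the first gap.
summary = df_in.groupby("id")["day_difference"].agg(["nunique", "first"])


def flag_for(row):
    """Hypothetical helper: turn one summary row into a flag string."""
    if row["nunique"] == 0:      # no gap at all: the id has a single row
        return "Single_day"
    if row["nunique"] == 1:      # every gap is the same number of days
        return f"{int(row['first'])}_days_diff"
    return "No_pattern"          # two or more different gaps


flags = summary.apply(flag_for, axis=1)   # one flag per id
df_in["flag"] = df_in["id"].map(flags)    # broadcast back to every row

print(summary)
print(df_in)
```

For the sample data the nunique column is 1 for A, B and D, 2 for C and 0 for E, which is what separates the three cases. A row-wise apply over the small summary table is slower than np.select on large inputs, but it runs once per id rather than once per row.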

Answer 2

Score: 1

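I would use a custom function with groupby.apply:

import pandas as pd

def pattern(g):
    # day differences between consecutive rows of one id
    diff = pd.to_datetime(g['date']).diff().dt.days
    # one row -> 'Single_day'; one distinct difference -> '<n>_days_diff';
    # anything else -> 'No_pattern'
    return pd.DataFrame({'day_difference': diff,
                         'Flag': 'Single_day' if len(g)==1 else
                         (f'{int(diff.iloc[-1])}_days_diff'
                          if len(diff.dropna().unique())==1
                          else 'No_pattern')
                        })

# apply the function per id and join the new columns back on the original index
out = df_in.join(df_in.groupby('id', group_keys=False).apply(pattern))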

Output:

   id        date  day_difference         Flag
0   A  2023-02-04             NaN  1_days_diff
1   A  2023-02-05             1.0  1_days_diff
2   A  2023-02-06             1.0  1_days_diff
3   B  2023-02-06             NaN  7_days_diff
4   B  2023-02-13             7.0  7_days_diff
5   B  2023-02-20             7.0  7_days_diff
6   C  2023-02-07             NaN   No_pattern
7   C  2023-02-10             3.0   No_pattern
8   C  2023-02-12             2.0   No_pattern
9   D  2023-02-14             NaN  3_days_diff
10  D  2023-02-17             3.0  3_days_diff
11  D  2023-02-20             3.0  3_days_diff
12  E  2023-02-18             NaN   Single_day
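
Both outputs show NaN where the question's expected output shows "_", and both rely on the rows already being in date order within each id. A minimal sketch of how those two points could be handled, assuming a small made-up frame (the three rows below are illustrative, not from the question):

```python
import pandas as pd

# Illustrative data: id A is deliberately out of date order.
df = pd.DataFrame(
    [["A", "2023-02-05"], ["A", "2023-02-04"], ["E", "2023-02-18"]],
    columns=["id", "date"],
)
df["date"] = pd.to_datetime(df["date"])

# diff() compares each row with the one before it, so sort within id first.
df = df.sort_values(["id", "date"], ignore_index=True)

diff = df.groupby("id")["date"].diff().dt.days

# Replace the missing first-row difference with the "_" placeholder and
# keep the real differences as plain integers.
df["day_difference"] = ["_" if pd.isna(d) else int(d) for d in diff]

print(df)
```

Mixing "_" with integers makes day_difference an object column, so this conversion is best left as a final display step after any numeric work on the differences is done.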

huangapple
  • Posted on March 7, 2023, 17:39:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75660214.html