How to add rows on a dataset based on a date condition?

Question

I'm having trouble with a dataset that I want to display in a Power BI report. In some records the start time and the end time fall on different dates, which makes the data hard to display on a daily basis.

I want to split each such record automatically into one row per day.
For example, I have the following record:

    Date_Start                 Date_End
    18/04/2023 10:53:00 a. m.  20/04/2023 03:51:00 a. m.

Since I use the Date_Start column to build the daily report, this record gives me nothing to display for 19/04/2023, because no record has that date in Date_Start.
So I want to process the record and split it like this:

    Date_Start                 Date_End
    18/04/2023 10:53:00 a. m.  18/04/2023 11:59:59 p. m.
    19/04/2023 00:00:00 a. m.  19/04/2023 11:59:59 p. m.
    20/04/2023 00:00:00 a. m.  20/04/2023 03:51:00 a. m.

I am not sure whether this is possible with Power Query, or perhaps in Python with the pandas or NumPy libraries.

Can you help me with this? I'd appreciate it.
Thanks!

Answer 1

Score: 2

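With Python / [tag:pandas], you can try this:

    # Parse both columns as datetimes (the day comes first in the input)
    df = df.apply(pd.to_datetime, dayfirst=True)

    # For each record, the calendar days from the start date to the end date
    dts = [pd.date_range(s.date(), e.date(), freq="D")
           for s, e in zip(df["Date_Start"], df["Date_End"])]

    # The first day keeps the original start and the last day keeps the
    # original end; every other boundary is midnight or 23:59:59
    starts, ends = zip(
        *[
            (
                df.iloc[i, 0] if j == 0 else ts,
                df.iloc[i, 1] if j == len(dt) - 1 else ts.replace(
                    hour=23, minute=59, second=59)
            )
            for i, dt in enumerate(dts)
            for j, ts in enumerate(dt)
        ]
    )

    # Rebuild the frame and format back to the "a. m." / "p. m." text style
    out = (
        pd.DataFrame([[starts, ends]])
            .explode([0, 1]).explode([0, 1], ignore_index=True)
            .apply(lambda x: x.dt.strftime("%d/%m/%Y %I:%M:%S %p")
            .str.replace("([AP])M", lambda m: f"{m.group(1).lower()}. m.", regex=True))
            .set_axis(df.columns, axis=1)
    )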

**NB**: In a [*12-hour clock*][1], there is no `00:00:00 am` but `12:00:00 am` (*start of the day*).

  [1]: https://en.wikipedia.org/wiki/12-hour_clock
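
The 12-hour clock point can be checked directly with the standard library. This is a minimal sketch; the variable names are illustrative only, and the exact AM/PM text produced by `%p` depends on the active locale.

```python
from datetime import datetime

midnight = datetime(2023, 4, 19, 0, 0, 0)
last_second = datetime(2023, 4, 19, 23, 59, 59)

# %I is the zero-padded 12-hour clock hour, so midnight renders as 12, not 00.
midnight_12h = midnight.strftime("%d/%m/%Y %I:%M:%S %p")
last_second_12h = last_second.strftime("%d/%m/%Y %I:%M:%S %p")

# %H is the 24-hour clock hour, where midnight is 00.
midnight_24h = midnight.strftime("%d/%m/%Y %H:%M:%S")

print(midnight_12h)     # starts with 19/04/2023 12:00:00
print(last_second_12h)  # starts with 19/04/2023 11:59:59
print(midnight_24h)     # 19/04/2023 00:00:00
```

So a row that starts at midnight shows `12:00:00` in the 12-hour format, which is why the output differs from the `00:00:00 a. m.` written in the question.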

Output:

    print(out)

                      Date_Start                   Date_End
    0  18/04/2023 10:53:00 a. m.  18/04/2023 11:59:59 p. m.
    1  19/04/2023 12:00:00 a. m.  19/04/2023 11:59:59 p. m.
    2  20/04/2023 12:00:00 a. m.  20/04/2023 03:51:00 a. m.

Input used:

    #pip install pandas
    import pandas as pd

    df = pd.DataFrame(
        {"Date_Start": ["18/04/2023 10:53:00 a. m."],
         "Date_End": ["20/04/2023 03:51:00 a. m."]}
    )

Answer 2

Score: 1

Power Query method

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        // List every calendar day covered by the record (as text), then expand to one row per day
        #"Added Custom" = Table.AddColumn(Source, "Custom", each List.Transform({Number.IntegerDivide(Number.From([Date_Start]), 1)..Number.IntegerDivide(Number.From([Date_End]), 1)}, each Text.From(Date.From(_)))),
        #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
        // Index runs over the whole expanded table, so the first/last-row checks below assume Table1 holds a single record
        #"Added Index" = Table.AddIndexColumn(#"Expanded Custom", "Index", 0, 1, Int64.Type),
        #"Added Custom1" = Table.AddColumn(#"Added Index", "Date_Start.", each if [Index]=0 then [Custom] & " " & Text.From(DateTime.Time([Date_Start])) else [Custom] & " " & "00:00 AM"),
        #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Date_End.", each if [Index]=List.Max(#"Added Index"[Index]) then [Custom] & " " & Text.From(DateTime.Time([Date_End])) else [Custom] & " " & "11:59 PM"),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Date_Start", "Date_End", "Custom", "Index"}),
        #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Date_Start.", type datetime}, {"Date_End.", type datetime}})
    in
        #"Changed Type"
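
For comparison, the same list-then-expand sequence can be sketched in pandas. This is only an illustration under stated assumptions, not part of either answer: both columns are assumed to already hold datetime values with start <= end, the second sample record and the `Record` and `Day` column names are made up for the example, and each row is clipped to its own day instead of relying on a table-wide index, so several records are handled at once.

```python
import pandas as pd

# Sample data: the record from the question plus a hypothetical single-day record.
df = pd.DataFrame({
    "Date_Start": pd.to_datetime(["2023-04-18 10:53:00", "2023-04-21 08:00:00"]),
    "Date_End": pd.to_datetime(["2023-04-20 03:51:00", "2023-04-21 09:30:00"]),
})

# Step 1: list the calendar days (at midnight) covered by each record.
days = pd.Series(
    [list(pd.date_range(s.normalize(), e.normalize(), freq="D"))
     for s, e in zip(df["Date_Start"], df["Date_End"])],
    index=df.index,
)

# Step 2: expand to one row per day, keeping the original row label as "Record".
expanded = (
    df.assign(Day=days)
      .explode("Day")
      .rename_axis("Record")
      .reset_index()
)
expanded["Day"] = pd.to_datetime(expanded["Day"])

# Step 3: clip every row to its own day.
day_start = expanded["Day"]
day_end = day_start + pd.Timedelta(hours=23, minutes=59, seconds=59)

out = pd.DataFrame({
    "Record": expanded["Record"],
    # Keep the real start only where it is later than midnight of that day.
    "Date_Start": expanded["Date_Start"].where(expanded["Date_Start"] > day_start, day_start),
    # Keep the real end only where it is earlier than 23:59:59 of that day.
    "Date_End": expanded["Date_End"].where(expanded["Date_End"] < day_end, day_end),
})

print(out)
```

Keeping the columns as datetimes rather than formatted text leaves the display format to the report; the `Record` column makes it possible to trace each split row back to its source row.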

huangapple
  • Posted on 2023-06-08 04:28:07
  • Article link: https://go.coder-hub.com/76426902.html