How to add rows on a dataset based on a date condition?
Question
I'm having trouble with a dataset that I want to display in a Power BI report. Some records have a start time and an end time on different dates, which makes them hard to display on a daily basis.
I want to split each such record automatically, one row per day.
For example, I have the following record:
Date_Start | Date_End |
---|---|
18/04/2023 10:53:00 a. m. | 20/04/2023 03:51:00 a. m. |
Since I use the `Date_Start` column to build the daily report, this record never appears under 19/04/2023, because no row has that date in `Date_Start`.
So I want to process the record and split it like this:
Date_Start | Date_End |
---|---|
18/04/2023 10:53:00 a. m. | 18/04/2023 11:59:59 p. m. |
19/04/2023 00:00:00 a. m. | 19/04/2023 11:59:59 p. m. |
20/04/2023 00:00:00 a. m. | 20/04/2023 03:51:00 a. m. |
I am not sure whether this is possible with Power Query, or perhaps with Python using the pandas or NumPy libraries.
Could you help me with this? I'd appreciate it.
Thanks!
Answer 1
Score: 2
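With Python / pandas, you can try this:

    # Parse both columns as datetimes (the day comes first in the source strings).
    df = df.apply(pd.to_datetime, dayfirst=True)

    # For each record, build the range of calendar days it spans.
    dts = [pd.date_range(s.date(), e.date(), freq="D")
           for s, e in zip(df["Date_Start"], df["Date_End"])]

    # The first day keeps the original start and the last day keeps the
    # original end; every other boundary is midnight or 23:59:59.
    starts, ends = zip(
        *[
            (
                df.iloc[i, 0] if j == 0 else ts,
                df.iloc[i, 1] if j == len(dt) - 1 else ts.replace(
                    hour=23, minute=59, second=59)
            )
            for i, dt in enumerate(dts)
            for j, ts in enumerate(dt)
        ]
    )

    # Rebuild a two-column frame and format it back to the "a. m." / "p. m." style.
    out = (
        pd.DataFrame([[starts, ends]])
        .explode([0, 1]).explode([0, 1], ignore_index=True)
        .apply(lambda x: x.dt.strftime("%d/%m/%Y %I:%M:%S %p")
               .str.replace("([AP])M", lambda m: f"{m.group(1).lower()}. m.", regex=True))
        .set_axis(df.columns, axis=1)
    )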
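If the automatic parsing in the first line has trouble with the `a. m.` / `p. m.` suffix, one option is to normalize the strings explicitly before parsing. The following is only a sketch using the standard library: the helper name `parse_spanish_ampm` is made up for illustration, and it assumes the default C locale so that `%p` matches `AM` / `PM`.

```python
import re
from datetime import datetime


def parse_spanish_ampm(text: str) -> datetime:
    """Parse 'dd/mm/YYYY hh:MM:SS a. m.' style strings (hypothetical helper)."""
    # Collapse 'a. m.' / 'p. m.' into 'AM' / 'PM' so that %p can read it.
    normalized = re.sub(
        r"\b([ap])\.\s*m\.",
        lambda m: m.group(1).upper() + "M",
        text.strip(),
        flags=re.IGNORECASE,
    )
    return datetime.strptime(normalized, "%d/%m/%Y %I:%M:%S %p")


print(parse_spanish_ampm("18/04/2023 10:53:00 a. m."))
print(parse_spanish_ampm("18/04/2023 11:59:59 p. m."))
```

The same function could be applied to each column with `Series.map` before running the rest of the code.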
NB: In a [12-hour clock](https://en.wikipedia.org/wiki/12-hour_clock), there is no `00:00:00 am`; the start of the day is `12:00:00 am`.
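A quick way to see this is to format midnight with both the 24-hour field (`%H`) and the 12-hour field (`%I`). This small check uses only the standard library and assumes the default C locale for the `%p` marker.

```python
from datetime import datetime

midnight = datetime(2023, 4, 19, 0, 0, 0)

# %H is the 24-hour field, %I is the 12-hour field.
as_24h = midnight.strftime("%H:%M:%S")
as_12h = midnight.strftime("%I:%M:%S %p")

print(as_24h)  # 24-hour rendering of midnight
print(as_12h)  # 12-hour rendering of the same instant
```

This is why the split rows start at `12:00:00 a. m.` rather than the `00:00:00 a. m.` written in the question.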
Output:

    print(out)

                      Date_Start                   Date_End
    0  18/04/2023 10:53:00 a. m.  18/04/2023 11:59:59 p. m.
    1  19/04/2023 12:00:00 a. m.  19/04/2023 11:59:59 p. m.
    2  20/04/2023 12:00:00 a. m.  20/04/2023 03:51:00 a. m.
Input used:

    # pip install pandas
    import pandas as pd

    df = pd.DataFrame(
        {"Date_Start": ["18/04/2023 10:53:00 a. m."],
         "Date_End": ["20/04/2023 03:51:00 a. m."]}
    )
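For readers who want to see the splitting rule on its own, here is a rough sketch without pandas. It is not the code from this answer: `split_by_day` is a hypothetical helper, the second record in `records` is invented purely to show that each record is split independently, and the 23:59:59 day boundary follows the question's expected output.

```python
from datetime import datetime, time, timedelta


def split_by_day(start: datetime, end: datetime):
    """Return (start, end) pairs, one per calendar day (hypothetical helper)."""
    if end < start:
        raise ValueError("end must not be before start")
    pieces = []
    current = start
    while current.date() < end.date():
        # Close the current day at 23:59:59, as in the question's expected output.
        day_end = datetime.combine(current.date(), time(23, 59, 59))
        pieces.append((current, day_end))
        # Continue from midnight of the next day.
        current = datetime.combine(current.date() + timedelta(days=1), time(0, 0, 0))
    # The last piece keeps the original end time.
    pieces.append((current, end))
    return pieces


records = [
    (datetime(2023, 4, 18, 10, 53), datetime(2023, 4, 20, 3, 51)),
    (datetime(2023, 4, 21, 8, 0), datetime(2023, 4, 21, 9, 30)),  # made-up same-day record
]
rows = [piece for start, end in records for piece in split_by_day(start, end)]
for piece_start, piece_end in rows:
    print(piece_start, "|", piece_end)
```

A record that starts and ends on the same day passes through unchanged, so the function can be applied to every row.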
Answer 2
Score: 1
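Power Query method. Note that the `[Index]` checks compare against the first and last rows of the whole expanded table, so as written it assumes a single source record: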

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Added Custom" = Table.AddColumn(Source, "Custom", each List.Transform({Number.IntegerDivide(Number.From([Date_Start]), 1)..Number.IntegerDivide(Number.From([Date_End]), 1)}, each Text.From(Date.From(_)))),
        #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
        #"Added Index" = Table.AddIndexColumn(#"Expanded Custom", "Index", 0, 1, Int64.Type),
        #"Added Custom1" = Table.AddColumn(#"Added Index", "Date_Start.", each if [Index]=0 then [Custom] & " " & Text.From(DateTime.Time([Date_Start])) else [Custom] & " " & "00:00 AM"),
        #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Date_End.", each if [Index]=List.Max(#"Added Index"[Index]) then [Custom] & " " & Text.From(DateTime.Time([Date_End])) else [Custom] & " " & "11:59 PM"),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Date_Start", "Date_End", "Custom", "Index"}),
        #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Date_Start.", type datetime}, {"Date_End.", type datetime}})
    in
        #"Changed Type"