June 8, 2023 04:28:07

How to add rows on a dataset based on a date condition?

Question

I'm having trouble with a dataset I want to display in a Power BI report: some records have a start time and an end time on different dates, which makes it hard to display the data on a daily basis.

I want to split such records automatically, one per day.
For example, I have the following record:

Date_Start	Date_End
18/04/2023 10:53:00 a. m.	20/04/2023 03:51:00 a. m.

Since I am using the Date_Start column to build the daily report, this record gives me nothing to display for 19/04/2023, because no record has that date in Date_Start.
So I want to process the record and split it like this:

Date_Start	Date_End
18/04/2023 10:53:00 a. m.	18/04/2023 11:59:59 p. m.
19/04/2023 00:00:00 a. m.	19/04/2023 11:59:59 p. m.
20/04/2023 00:00:00 a. m.	20/04/2023 03:51:00 a. m.

I am not sure whether this is possible with Power Query, or perhaps with Python using the Pandas or NumPy libraries.

Can you help me with this? I'd appreciate it.
Thanks!

Answer 1

Score: 2

With Python / pandas, you can try this:

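    # Assumes `import pandas as pd` and the `df` defined under "Input used".
    # Parse both columns as day-first datetimes.
    df = df.apply(pd.to_datetime, dayfirst=True)

    # For each record, one midnight timestamp per calendar day it touches.
    dts = [pd.date_range(s.date(), e.date(), freq="D")
           for s, e in zip(df["Date_Start"], df["Date_End"])]

    # The first day keeps the original start and the last day the original end;
    # every other boundary falls on 00:00:00 or 23:59:59.
    starts, ends = zip(
        *[
            (
                df.iloc[i, 0] if j == 0 else ts,
                df.iloc[i, 1] if j == len(dt) - 1 else ts.replace(
                    hour=23, minute=59, second=59)
            )
            for i, dt in enumerate(dts)
            for j, ts in enumerate(dt)
        ]
    )
    # One row per day, formatted back to the "a. m." / "p. m." style of the input.
    out = (
        pd.DataFrame([[starts, ends]])
            .explode([0, 1]).explode([0, 1], ignore_index=True)
            .apply(lambda x: x.dt.strftime("%d/%m/%Y %I:%M:%S %p")
            .str.replace("([AP])M", lambda m: f"{m.group(1).lower()}. m.", regex=True))
            .set_axis(df.columns, axis=1)
    )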

**Note**: In a [*12-hour clock*][1], there is no `00:00:00 am` but `12:00:00 am` (*start of the day*).

Output:

    print(out)
    
                      Date_Start                   Date_End
    0  18/04/2023 10:53:00 a. m.  18/04/2023 11:59:59 p. m.
    1  19/04/2023 12:00:00 a. m.  19/04/2023 11:59:59 p. m.
    2  20/04/2023 12:00:00 a. m.  20/04/2023 03:51:00 a. m.
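
The `12:00:00 a. m.` values in rows 1 and 2 follow from the 12-hour formatting. As a small illustration using only the standard library (the variable names here are made up for the example), midnight is `00` in the 24-hour field `%H` but `12` in the 12-hour field `%I`:

```python
from datetime import datetime

midnight = datetime(2023, 4, 19, 0, 0, 0)

# %H is the 24-hour field, %I the 12-hour field, %p the locale's AM/PM marker.
as_24h = midnight.strftime("%H:%M:%S")
as_12h = midnight.strftime("%I:%M:%S %p")

print(as_24h)  # 00:00:00
print(as_12h)  # starts with 12:00:00; the AM/PM text depends on the locale
```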

*Input used:*

    #pip install pandas
    import pandas as pd
    
    df = pd.DataFrame(
        {"Date_Start": ["18/04/2023 10:53:00 a. m."],
         "Date_End": ["20/04/2023 03:51:00 a. m."]}
    )
  [1]: https://en.wikipedia.org/wiki/12-hour_clock
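
If pandas is not available, the same per-day split can be sketched with the standard library alone. This is only an illustration, not part of the answer above: `split_by_day` is a hypothetical helper name, and it assumes both values are already `datetime` objects with whole-second precision.

```python
from datetime import datetime, time, timedelta


def split_by_day(start, end):
    """Split the interval [start, end] into one (start, end) pair per calendar day."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    pieces = []
    current = start
    while current.date() < end.date():
        # Close the current day at 23:59:59, then resume at the next midnight.
        day_end = datetime.combine(current.date(), time(23, 59, 59))
        pieces.append((current, day_end))
        current = datetime.combine(current.date() + timedelta(days=1), time.min)
    # The last piece keeps the original end time.
    pieces.append((current, end))
    return pieces


record = (datetime(2023, 4, 18, 10, 53), datetime(2023, 4, 20, 3, 51))
pieces = split_by_day(*record)
for piece_start, piece_end in pieces:
    print(piece_start, piece_end)
```

For the record from the question this yields three pieces, one each for 18, 19 and 20 April. A record that starts and ends on the same day comes back unchanged as a single piece.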

Answer 2

Score: 1

Power Query method:

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "Custom", each List.Transform({Number.IntegerDivide(Number.From([Date_Start]), 1)..Number.IntegerDivide(Number.From([Date_End]), 1)}, each Text.From(Date.From(_)))),
    #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
    #"Added Index" = Table.AddIndexColumn(#"Expanded Custom", "Index", 0, 1, Int64.Type),
    #"Added Custom1" = Table.AddColumn(#"Added Index", "Date_Start.", each if [Index]=0 then [Custom] & " " & Text.From(DateTime.Time([Date_Start])) else [Custom] & " " & "00:00 AM"),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Date_End.", each if [Index]=List.Max(#"Added Index"[Index]) then [Custom] & " " & Text.From(DateTime.Time([Date_End])) else [Custom] & " " & "11:59 PM"),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Date_Start", "Date_End", "Custom", "Index"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Date_Start.", type datetime}, {"Date_End.", type datetime}})
    in  #"Changed Type"
