如何获取月份的差异以复制Excel的行为

huangapple go评论53阅读模式
英文:

How to get difference of months replicating Excel behavior

问题

以下是翻译好的部分:

我想要**减去**两个日期列并获取它们的**差异******为单位而不是天但一直没有成功

数据
```python
import polars as pl
from datetime import datetime

test_df = pl.DataFrame(
    {
        "dt_end": [datetime(2022, 1, 1).date(), datetime(2022, 1, 2).date()],
        "dt_start": [datetime(2021, 5, 7).date(), datetime(2020, 7, 8).date()],
    }
)

这会给出天数的差异:

test_df.with_columns(
    diff_months = pl.col('dt_end') - pl.col('dt_start')
)

输出:

dt_end      dt_start    diff_months
date        date        duration[ms]
2022-01-01  2021-05-07  239d
2022-01-02  2020-07-08  543d

我尝试了下面的代码来获取月份,但是不起作用

test_df.with_columns(
    diff_months = pl.duration(months = pl.col('dt_end') - pl.col('dt_start'))
)

更新

我还尝试了下面的代码,但它也给我提供了与Excel不同的值。

test_df.with_columns(
    diff_months = (pl.col('dt_end').dt.year() - pl.col('dt_start').dt.year()) * 12 +
                    (pl.col('dt_end').dt.month().cast(pl.Int32) - pl.col('dt_start').dt.month().cast(pl.Int32)) 
)

输出:

dt_end      dt_start    diff_months
date        date        i32
2022-01-01  2021-05-07  8
2022-01-02  2020-07-08  18

更新 2:

我认为这是以30.42作为每月平均天数的手动方法:

test_df.with_columns(
    diff_months = pl.col('dt_end') - pl.col('dt_start')
).with_columns(
    diff_months = (pl.col('diff_months').dt.days()/30.42).floor().cast(pl.Int8)
)

输出

dt_end      dt_start    diff_months
date        date        i8
2022-01-01  2021-05-07  7
2022-01-02  2020-07-08  17

更新 3:

实际数据的快照,带有期望的 Excel 值参考:

# 数据
snapshot_df = pl.DataFrame(
    {
        "Reporting Month": [datetime(2000, 7, 20).date(), datetime(2000, 8, 20).date(), datetime(2000, 9, 20).date()],
        "Origination date": [datetime(1999, 12, 19).date(), datetime(1999, 12, 19).date(), datetime(1999, 12, 19).date()],
        "Excel_Reference_Age": [7,8,9]
    }
)

比较

snapshot_df.with_columns(
    diff_months = pl.col('Reporting Month') - pl.col('Origination date')
).with_columns(
    diff_months = (pl.col('diff_months').dt.days()/30.42).floor().cast(pl.Int16),

    diff_months_1 = (pl.col('Reporting Month').dt.year() - pl.col('Origination date').dt.year()) * 12 +
                    (pl.col('Reporting Month').dt.month().cast(pl.Int32) - pl.col('Origination date').dt.month().cast(pl.Int32)) ,

    diff_months_2 = pl.date_range(pl.col("Origination date"), pl.col("Reporting Month"), "1mo").list.lengths()
).select('Excel_Reference_Age','diff_months','diff_months_1','diff_months_2)

输出

Excel_Reference_Age    diff_months    diff_months_1    diff_months_2
i64                    i16            i32              u32
7                      7              7                8
8                      8              8                9
9                      9              9                10

<details>
<summary>英文:</summary>

I would like to **subtract** `two date columns` and get their **difference** in **months** `unit` instead of days but haven&#39;t been able to.

Data:

import polars as pl
from datetime import datetime

test_df = pl.DataFrame(
{
"dt_end": [datetime(2022, 1, 1).date(), datetime(2022, 1, 2).date()],
"dt_start": [datetime(2021, 5, 7).date(), datetime(2020, 7, 8).date()],
}
)


This gives difference in Days

test_df.with_columns(
diff_months = pl.col('dt_end') - pl.col('dt_start')
)


output:

dt_end dt_start diff_months
date date duration[ms]
2022-01-01 2021-05-07 239d
2022-01-02 2020-07-08 543d


I have tried below code for **months** but that **doesn&#39;t work**

test_df.with_columns(
diff_months = pl.duration(months = pl.col('dt_end') - pl.col('dt_start'))
)


**UPDATE**

Have also tried below code but that also gives me a different value from the excel.

test_df.with_columns(
diff_months = (pl.col('dt_end').dt.year() - pl.col('dt_start').dt.year()) * 12 +
(pl.col('dt_end').dt.month().cast(pl.Int32) - pl.col('dt_start').dt.month().cast(pl.Int32))
)

output:

dt_end dt_start diff_months
date date i32
2022-01-01 2021-05-07 8
2022-01-02 2020-07-08 18


**UPDATE 2:**

I think this is the **manual** way of doing it by using **30.42** as the **avg** number of days in month

test_df.with_columns(
diff_months = pl.col('dt_end') - pl.col('dt_start')
).with_columns(
diff_months = (pl.col('diff_months').dt.days()/30.42).floor().cast(pl.Int8)
)

output

dt_end dt_start diff_months
date date i8
2022-01-01 2021-05-07 7
2022-01-02 2020-07-08 17


**UPDATE 3:** Snapshot of actual data with desired Reference excel value

Data

snapshot_df = pl.DataFrame(
{
"Reporting Month": [datetime(2000, 7, 20).date(), datetime(2000, 8, 20).date(), datetime(2000, 9, 20).date()],
"Origination date": [datetime(1999, 12, 19).date(), datetime(1999, 12, 19).date(), datetime(1999, 12, 19).date()],
"Excel_Reference_Age": [7,8,9]
}
)


comparison

snapshot_df.with_columns(
diff_months = pl.col('Reporting Month') - pl.col('Origination date')
).with_columns(
diff_months = (pl.col('diff_months').dt.days()/30.42).floor().cast(pl.Int16),

diff_months_1 = (pl.col(&#39;Reporting Month&#39;).dt.year() - pl.col(&#39;Origination date&#39;).dt.year()) * 12 +
                (pl.col(&#39;Reporting Month&#39;).dt.month().cast(pl.Int32) - pl.col(&#39;Origination date&#39;).dt.month().cast(pl.Int32)) ,

diff_months_2 = pl.date_range(pl.col(&quot;Origination date&quot;), pl.col(&quot;Reporting Month&quot;), &quot;1mo&quot;).list.lengths()

).select('Excel_Reference_Age','diff_months','diff_months_1','diff_months_2')


output

Excel_Reference_Age diff_months diff_months_1 diff_months_2
i64 i16 i32 u32
7 7 7 8
8 8 8 9
9 9 9 10


</details>


# 答案1
**得分**: 1

你的更新#1没问题,只需在`dt.days`对于`end`尚未达到`start`的情况下也进行+1调整:

```python
test_df.with_columns(
    diff_months=12 * (pl.col('dt_end').dt.year() - pl.col('dt_start').dt.year())
    + (
        pl.col('dt_end').dt.month().cast(pl.Int32)
        - pl.col('dt_start').dt.month().cast(pl.Int32)
    )
    - (pl.col('dt_end').dt.day() < pl.col('dt_start').dt.day()).cast(pl.Int32)
)
英文:

Your update #1 is just fine, you just also need to adjust by 1 if dt.days for end has not reached start:

test_df.with_columns(
    diff_months=12 * (pl.col(&#39;dt_end&#39;).dt.year() - pl.col(&#39;dt_start&#39;).dt.year())
    + (
        pl.col(&#39;dt_end&#39;).dt.month().cast(pl.Int32)
        - pl.col(&#39;dt_start&#39;).dt.month().cast(pl.Int32)
    )
    - (pl.col(&#39;dt_end&#39;).dt.day() &lt; pl.col(&#39;dt_start&#39;).dt.day()).cast(pl.Int32)
)

huangapple
  • 本文由 发表于 2023年7月13日 17:17:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/76677780.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定