2023年5月25日 02:42:53go评论66阅读模式

英文:

Julia: Return Minimum Date in DataFrame

问题

The question is fairly simple. How do I return the minimum purchase date for each customer using Tidier?

#params
f = "path"

df = CSV.File(f) |> DataFrame
df = @chain df begin
    @select(SHOPIFY_ORDER_ID, CUSTOMER_ID, SHIPMONTH, GROSS_REVENUE, Country)
    @rename(order_id = SHOPIFY_ORDER_ID,
            customer_id = CUSTOMER_ID,
            date = SHIPMONTH,
            revenue = GROSS_REVENUE,
            country = Country)
    @filter(country != "CA")
    @filter(!ismissing(date))
    @filter(revenue != 0.0)
end 


# logic to calculate summary stats
df_sum = @chain df begin
    @group_by(customer_id)
    @mutate(
        cohort = min(date)
    )
end

min(df[!, :date])

For df_sum I receive the following error:

ERROR: ArgumentError: argument is not a permutation Stacktrace:  [1]
invperm(a::Vector{Int64})    @ Base .\combinatorics.jl:282  [2]
groupby(df::DataFrame, cols::Cols{Tuple{Symbol}}; sort::Bool, skipmissing::Bool)    @ DataFrames
C:path\\.julia\packages\DataFrames\LteEl\src\groupeddataframe\groupeddataframe.jl:264
[3] top-level scope    @ path.jl:453

When attempting to identify the min date in the data.frame I receive the error:

ERROR: MethodError: no method matching min(::Vector{Union{Missing, Dates.Date}})
Closest candidates are:
  min(::Any, ::Missing)    @ Base missing.jl:134
  min(::Any, ::Any)    @ Base operators.jl:481  
  min(::Any, ::Any, ::Any, ::Any...)    @ Base operators.jl:578   ...

Stacktrace:  [1] top-level scope    @ c:\path\script.jl:28

Which indicates to me that min doesn't work where there is a Missing data type, but I'm not sure how to solve from there.

英文:

The question is fairly simple. How do I return the minimum purchase date for each customer using Tidier?

using Tidier, DataFrames, Plots, CSV


#params
f = &quot;path&quot;

df = CSV.File(f) |&gt; DataFrame
df = @chain df begin
    @select(SHOPIFY_ORDER_ID, CUSTOMER_ID, SHIPMONTH, GROSS_REVENUE, Country)
    @rename(order_id = SHOPIFY_ORDER_ID,
            customer_id = CUSTOMER_ID,
            date = SHIPMONTH,
            revenue = GROSS_REVENUE,
            country = Country)
    @filter(country != &quot;CA&quot;)
    @filter(!ismissing(date))
    @filter(revenue != 0.0)
end 


# logic to calculate summary stats
df_sum = @chain df begin
    @group_by(customer_id)
    @mutate(
        cohort = min(date)
    )
end

min(df[!, :date])

for df_sum I receive the following error:

> ERROR: ArgumentError: argument is not a permutation Stacktrace: [1]
> invperm(a::Vector{Int64}) @ Base .\combinatorics.jl:282 [2]
> groupby(df::DataFrame, cols::Cols{Tuple{Symbol}}; sort::Bool,
> skipmissing::Bool) @ DataFrames
> C:path\.julia\packages\DataFrames\LteEl\src\groupeddataframe\groupeddataframe.jl:264
> [3] top-level scope @
> path.jl:453

When attemtping to identify the min date in the data.frame I receive the error:

> ERROR: MethodError: no method matching min(::Vector{Union{Missing,
> Dates.Date}})
>
> Closest candidates are: min(::Any, ::Missing) @ Base
> missing.jl:134 min(::Any, ::Any) @ Base operators.jl:481
> min(::Any, ::Any, ::Any, ::Any...) @ Base operators.jl:578 ...
>
> Stacktrace: [1] top-level scope @
> c:\path\script.jl:28

Which indicates to me that min doesn't work where there is a Missing data type, but I'm not sure how to solve from there.

答案1

得分: 1

你可能需要使用 minimum 而不是 min。我看不到你的数据。如果你有缺失值，那么 minimum 应该仍然可以正常工作，但如果你想要 maximum，你需要先使用 skipmissing。

英文:

You probably need to use minimum instead of min. I do not see your data. If you have missing values then minimum should still just work, but if you wanted maximum you would need to skipmissing first.

答案2

得分: 1

Elaborating on Bogumił Kamiński's answer, you could try the following code:

df_sum = @chain df begin
    @group_by(customer_id)
    @mutate(
        minimum_date = minimum(skipmissing(date))
    )
end

The other thing to consider is whether you want to add a column to your existing dataset, or whether you simply want to return the minimum date only for each customer.

Here are two alternative approaches:

The first one will return only the customer_id and the minimum date for each customer.

df_sum = @chain df begin
    @group_by(customer_id)
    @summarize(
        minimum_date = minimum(skipmissing(date))
    )
end

In case you want to return the whole row, here's the second approach:

df_sum = @chain df begin
    @group_by(customer_id)
    @filter(
        date == minimum(skipmissing(date))
    )
    @ungroup
end

Without having access to the original dataset, it's hard to confirm if these will work for you. If these don't work, please let us know!

英文:

Elaborating on Bogumił Kamiński’s answer, you could try the following code:

df_sum = @chain df begin
    @group_by(customer_id)
    @mutate(
        minimum_date = minimum(skipmissing(date))
    )
end

The other thing to consider is whether you want to add a column to your existing dataset, or whether you simply want to return the minimum date only for each customer.

Here are two alternative approaches:

The first one will return only the customer_id and the minimum date for each customer.

df_sum = @chain df begin
    @group_by(customer_id)
    @summarize(
        minimum_date = minimum(skipmissing(date))
    )
end

In case you want to return the whole row, here’s the second approach:

df_sum = @chain df begin
    @group_by(customer_id)
    @filter(
        date == minimum(skipmissing(date))
    )
    @ungroup
end

Without having access to the original dataset, it’s hard to confirm if these will work for you. If these don’t work, please let us know!

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

返回DataFrame中的最小日期。

问题

答案1

答案2

获取Julia中容器类型而不是元素类型的惯用方法是什么？

如何在Julia中创建具有时间相关性的噪声？

如何在IO上设置带超时的读取

在Julia中的BigFloat

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论