Julia: Remove or add key from GroupedDataFrame

Question

Imagine I have a df on which I need to perform operations based on grouped columns, but I need to perform them based on two different groupings.

Given columns A, B, and C, I need to apply operation x to df grouped by A and B, and operation y to df grouped only by B. Do I need to group the data frame twice?

Case 1

using DataFrames

df = DataFrame(rand(160, 3), :auto)
rename!(df, [:A, :B, :Z])

# Randomly assign each row the value 1 or 2 in both key columns
@. df.B = ifelse(rand() < 0.5, 1, 2)
@. df.A = ifelse(rand() < 0.5, 1, 2)

# I group here by A and B
gd = groupby(df, [:A, :B])

#=
	My operations with df grouped by A and B.
	... But now I need to operate on groups defined only by B
=#

How do I remove key A? Is there something like one of these?

gd.removegroup([:A])
gd.removekey([:A])
gd.ungroup([:A])

Case 2

df = DataFrame(rand(160, 3), :auto)
rename!(df, [:A, :B, :Z])

# Randomly assign each row the value 1 or 2 in both key columns
@. df.B = ifelse(rand() < 0.5, 1, 2)
@. df.A = ifelse(rand() < 0.5, 1, 2)

# I group here by B
gd = groupby(df, [:B])

#=
	My operations with df grouped by B.
	... But now I need to operate on groups defined by B and A
=#

How do I add key A? Is there something like one of these?

groupby(gd, [:A])  # ❌ this does not work
gd.addkey([:A])
gd.addgroup([:A])

Answer 1

Score: 0

> Do I need to group the dataframe twice?

Yes. Grouping twice takes the same amount of code as adding or removing a grouping key would. For example:

gd1 = groupby(df, [:A, :B])
gd2 = groupby(df, :B)
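
If only an existing grouped data frame is at hand, the same regrouping can be sketched from it. This is a minimal example, assuming the DataFrames.jl functions `parent` (the source data frame of a `GroupedDataFrame`) and `groupcols` (its grouping column names); the small `df` and the names `gd_b` and `gd_ba` are made up for illustration.

```julia
using DataFrames

# Small illustrative data frame (not the one from the question)
df = DataFrame(A = [1, 1, 2, 2], B = [1, 2, 1, 2], Z = [0.1, 0.2, 0.3, 0.4])

gd = groupby(df, [:A, :B])

# "Remove" key A: regroup the parent by the remaining grouping columns
gd_b = groupby(parent(gd), setdiff(groupcols(gd), [:A]))

# "Add" key A: regroup the parent by the old keys plus A
gd_ba = groupby(parent(gd_b), [groupcols(gd_b); :A])
```

Both calls still build a new `GroupedDataFrame`; nothing is modified in place, which matches the advice to simply group twice.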

Since a grouped data frame is a view of the source df, if you mutate gd1 the changes will be reflected in gd2 automatically.
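
The view behaviour can be checked with a small sketch. The data below is made up for illustration; the example writes to the non-grouping column `:Z` through one group of `gd1` and then reads the value back through `df` and `gd2`.

```julia
using DataFrames

# Small illustrative data frame (not the one from the question)
df = DataFrame(A = [1, 1, 2, 2], B = [1, 2, 1, 2], Z = [0.1, 0.2, 0.3, 0.4])

gd1 = groupby(df, [:A, :B])
gd2 = groupby(df, :B)

# Each group is a SubDataFrame, i.e. a view into df.
g = gd1[(A = 1, B = 2)]

# Write in place to the non-grouping column Z through the view
g[:, :Z] .= 100.0

# The new value is visible in the source data frame...
z_in_df = df.Z[2]

# ...and in the matching group of the other grouping
z_in_gd2 = gd2[(B = 2,)].Z
```

Only `:Z` is modified here, so both groupings remain valid.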

The only thing you need to keep in mind is that when mutating df you should not mutate columns :A and :B, as mutating grouping columns could invalidate the groupings.
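
If a grouping column does have to change, one cautious pattern is to discard the old grouping and call `groupby` again afterwards. The sketch below uses made-up data and makes no claim about what a stale grouped data frame would return; it only shows the regrouping step.

```julia
using DataFrames

# Small illustrative data frame (not the one from the question)
df = DataFrame(A = [1, 1, 2, 2], B = [1, 2, 1, 2], Z = [0.1, 0.2, 0.3, 0.4])

gd2 = groupby(df, :B)

# Changing a grouping column: gd2 was computed from the old values of B,
# so it should not be relied on after this line.
df.B[1] = 2

# Regroup so the grouping matches the current contents of df
gd2 = groupby(df, :B)

# Number of rows that now fall into the group B == 2
n_b2 = nrow(gd2[(B = 2,)])
```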

Posted by huangapple on May 13, 2023 at 13:38:58
When reposting, please keep the link to this article: https://go.coder-hub.com/76241249.html