Share out a value on multiple entries with DataFrames in Julia

Question

I'm new to the Julia programming language and would like to group scores by country in a DataFrame like this:

 Row │ Name       Score    Country       
     │ String15   Float64  String15   
─────┼────────────────────────────────
   1 │ Oliver         5.0  France
   2 │ Patrick        3.0  Spain
   3 │ Jules          2.0  France
   4 │ Steven         3.5  USA
   5 │ Karl           4.0  France
   6 │ Alexander      3.0  France/USA
   7 │ Julian         1.0  Spain/USA

I have grouped my data by Country with

combine(groupby(db_test, :Country), :Score=>sum)

and I get:

 Row │ Country     Score_sum 
     │ String15    Float64   
─────┼───────────────────────
   1 │ France           11.0
   2 │ Spain             3.0
   3 │ USA               3.5
   4 │ France/USA        3.0
   5 │ Spain/USA         1.0

But I would like to split the scores of France/USA and Spain/USA among France, Spain and USA, to obtain this:

 Row │ Country     Score_sum 
     │ String15    Float64   
─────┼───────────────────────
   1 │ France           12.5
   2 │ Spain             3.5
   3 │ USA               5.5

How can I share the value of a column according to the number and the combination of countries in another column?
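
The expected totals above are consistent with one particular rule: each combined entry is divided equally among the countries it names. That rule is an assumption read off the numbers, not something stated explicitly. A quick arithmetic check in plain Julia:

```julia
# Assumption: a combined entry such as "France/USA" is divided equally
# among the countries it names.
france = 5.0 + 2.0 + 4.0 + 3.0 / 2   # Oliver, Jules, Karl, half of Alexander
spain  = 3.0 + 1.0 / 2               # Patrick, half of Julian
usa    = 3.5 + 3.0 / 2 + 1.0 / 2     # Steven, half of Alexander, half of Julian

println((france, spain, usa))        # (12.5, 3.5, 5.5)
```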

Answer 1

Score: 1


Here is the full code for this. I do it step by step to make it easy to see what is going on:

julia> using CSV, DataFrames

julia> data = """score,country
5.0,France
3.0,Spain
2.0,France
3.5,USA
4.0,France
3.0,France/USA
1.0,Spain/USA"""
"score,country\n5.0,France\n3.0,Spain\n2.0,France\n3.5,USA\n4.0,France\n3.0,France/USA\n1.0,Spain/USA"

julia> df = CSV.read(IOBuffer(data), DataFrame)
7×2 DataFrame
 Row │ score    country
     │ Float64  String15
─────┼─────────────────────
   1 │     5.0  France
   2 │     3.0  Spain
   3 │     2.0  France
   4 │     3.5  USA
   5 │     4.0  France
   6 │     3.0  France/USA
   7 │     1.0  Spain/USA

julia> df.countrys = split.(df.country, "/")
7-element Vector{Vector{SubString{String15}}}:
 ["France"]
 ["Spain"]
 ["France"]
 ["USA"]
 ["France"]
 ["France", "USA"]
 ["Spain", "USA"]

julia> df.scores = df.score ./ length.(df.countrys)
7-element Vector{Float64}:
 5.0
 3.0
 2.0
 3.5
 4.0
 1.5
 0.5

julia> df2 = flatten(df, :countrys)
9×4 DataFrame
 Row │ score    country     countrys   scores
     │ Float64  String15    SubStrin…  Float64
─────┼─────────────────────────────────────────
   1 │     5.0  France      France         5.0
   2 │     3.0  Spain       Spain          3.0
   3 │     2.0  France      France         2.0
   4 │     3.5  USA         USA            3.5
   5 │     4.0  France      France         4.0
   6 │     3.0  France/USA  France         1.5
   7 │     3.0  France/USA  USA            1.5
   8 │     1.0  Spain/USA   Spain          0.5
   9 │     1.0  Spain/USA   USA            0.5

julia> combine(groupby(df2, :countrys), :scores=>sum)
3×2 DataFrame
 Row │ countrys   scores_sum
     │ SubStrin…  Float64
─────┼───────────────────────
   1 │ France           12.5
   2 │ Spain             3.5
   3 │ USA               5.5
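
If the same split is needed more than once, the steps can be wrapped in a small helper that leaves the input DataFrame untouched. This is only a sketch: the function name `share_scores`, the `sep` keyword, and the intermediate column names `member` and `share` are hypothetical choices, and it assumes the columns are called `score` and `country` as in the CSV data above.

```julia
using DataFrames

# Hypothetical helper: split each row's score equally among the
# countries listed in its `country` field, then sum per country.
function share_scores(df::AbstractDataFrame; sep = "/")
    # transform returns a new DataFrame, so `df` itself is not modified.
    tmp = transform(df, :country => ByRow(c -> String.(split(c, sep))) => :member)
    # Equal share per member country (assumed rule).
    tmp.share = tmp.score ./ length.(tmp.member)
    # One row per (original row, member country) pair.
    long = flatten(tmp, :member)
    return combine(groupby(long, :member), :share => sum => :Score_sum)
end

# Same sample data, built directly instead of parsed from CSV.
df = DataFrame(
    score = [5.0, 3.0, 2.0, 3.5, 4.0, 3.0, 1.0],
    country = ["France", "Spain", "France", "USA", "France", "France/USA", "Spain/USA"],
)

res = share_scores(df)
```

On the sample data, `res` should hold 12.5 for France, 3.5 for Spain and 5.5 for USA, the same totals as the step-by-step version.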

  • Posted by huangapple on 2023-05-13 18:32:28
  • Please keep this link when reposting: https://go.coder-hub.com/76242257.html