条件分数化

huangapple go评论71阅读模式
英文:

Conditional fractioning

问题

ID Country Sales fraction
1 奥地利 6 0.666
1 奥地利 6 0.666
1 比利时 6 0.333
2 比利时 10 0.5
2 捷克 10 0.5
3 丹麦 3 1
3 德国 3 1
英文:

Suppose I have the dataset

ID Country Sales
1 Austria 6
1 Austria 6
1 Belgium 6
2 Belgium 10
2 Czech 10
3 Denmark 3
3 Germany 3

I want to another variable of sales which depends on countries and their ID ie in fractions.

ID Country Sales fraction
1 Austria 6 0.666
1 Austria 6 0.666
1 Belgium 6 0.333
2 Belgium 10 0.5
2 Czech 10 0.5
3 Denmark 3 1
3 Denmark 3 1

Any help would be appreciated!

答案1

得分: 1

library(dplyr)
your_data |>
  mutate(country_total = sum(Sales), .by = c(ID, Country)) |>
  mutate(fraction = country_total / sum(Sales), .by = ID)
#   ID Country Sales country_total  fraction
# 1  1 Austria     6            12 0.6666667
# 2  1 Austria     6            12 0.6666667
# 3  1 Belgium     6             6 0.3333333
# 4  2 Belgium    10            10 0.5000000
# 5  2   Czech    10            10 0.5000000
# 6  3 Denmark     3             3 0.5000000
# 7  3 Germany     3             3 0.5000000

使用此示例数据:

your_data = read.table(text = 'ID 	Country 	Sales
1 	Austria 	6
1 	Austria 	6
1 	Belgium 	6
2 	Belgium 	10
2 	Czech 	10
3 	Denmark 	3
3 	Germany 	3', header = T)
英文:
library(dplyr)
your_data |>
  mutate(country_total = sum(Sales), .by = c(ID, Country)) |>
  mutate(fraction = country_total / sum(Sales), .by = ID)
#   ID Country Sales country_total  fraction
# 1  1 Austria     6            12 0.6666667
# 2  1 Austria     6            12 0.6666667
# 3  1 Belgium     6             6 0.3333333
# 4  2 Belgium    10            10 0.5000000
# 5  2   Czech    10            10 0.5000000
# 6  3 Denmark     3             3 0.5000000
# 7  3 Germany     3             3 0.5000000

Using this sample data:

your_data = read.table(text = 'ID 	Country 	Sales
1 	Austria 	6
1 	Austria 	6
1 	Belgium 	6
2 	Belgium 	10
2 	Czech 	10
3 	Denmark 	3
3 	Germany 	3', header = T)

答案2

得分: 1

Here is the translated content:

"或者,我们可以使用 add_count 来获得相同的结果

library(dplyr)

df %>% add_count(ID,name = 'n') %>% add_count(ID,Country, name = 'gn') %>% 
mutate(new=gn/n) %>% select(-c(n,gn))

# 输出

# A tibble: 7 × 4
     ID Country Sales   new
  <dbl> <chr>   <dbl> <dbl>
1     1 Austria     6 0.667
2     1 Austria     6 0.667
3     1 Belgium     6 0.333
4     2 Belgium    10 0.5  
5     2 Czech      10 0.5  
6     3 Denmark     3 0.5  
7     3 Germany     3 0.5  
```"

<details>
<summary>英文:</summary>

Alternatively we could use `add_count` to get the same result

````r
library(dplyr)

df %&gt;% add_count(ID,name = &#39;n&#39;) %&gt;% add_count(ID,Country, name = &#39;gn&#39;) %&gt;% 
mutate(new=gn/n) %&gt;% select(-c(n,gn))

# output

# A tibble: 7 &#215; 4
     ID Country Sales   new
  &lt;dbl&gt; &lt;chr&gt;   &lt;dbl&gt; &lt;dbl&gt;
1     1 Austria     6 0.667
2     1 Austria     6 0.667
3     1 Belgium     6 0.333
4     2 Belgium    10 0.5  
5     2 Czech      10 0.5  
6     3 Denmark     3 0.5  
7     3 Germany     3 0.5  

答案3

得分: 0

ID Country Sales  fraction
1  1 Austria     6 0.6666667
2  1 Austria     6 0.6666667
3  1 Belgium     6 0.3333333
4  2 Belgium    10 0.5000000
5  2   Czech    10 0.5000000
6  3 Denmark     3 0.5000000
7  3 Germany     3 0.5000000
英文:

Base

&gt; df$fraction=ave(df$Sales,list(df$ID,df$Country),FUN=sum)/ave(df$Sales,df$ID,FUN=sum)

  ID Country Sales  fraction
1  1 Austria     6 0.6666667
2  1 Austria     6 0.6666667
3  1 Belgium     6 0.3333333
4  2 Belgium    10 0.5000000
5  2   Czech    10 0.5000000
6  3 Denmark     3 0.5000000
7  3 Germany     3 0.5000000

huangapple
  • 本文由 发表于 2023年6月29日 21:48:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/76581662.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定