Using match on multiple criteria to generate a value in R

Question

I currently have the following data format:

```r
# Monthly mean for each station (months 1-12, stations A and B)
df = data.frame(c(rep("A", 12), rep("B", 12)), rep(1:12, 2), seq(-12, 11))
colnames(df) = c("station", "month", "mean")
df
# Daily observations: days 27-31 of month 1 and days 1-5 of month 2, per station
df_master = data.frame(c(rep("A", 10), rep("B", 10)), rep(c(27:31, 1:5), 2), rep(c(rep(1, 5), rep(2, 5)), 2), rep(seq(-4,5), 2))
colnames(df_master) = c("station", "day", "month", "value")
df_master
```

Effectively, `df` holds a monthly mean for each station. I want to compute a new variable in `df_master` giving the difference between each daily observation and the corresponding monthly mean. I have managed to do this with an overall mean that includes all the data, but the mean values differ between stations, so I would like the new variable to be station specific.

I have tried the following code to match the monthly value, but it currently does not distinguish between stations:

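```r
library(dplyr)  # provides %>% and mutate()

# Looks up each row's mean by month only, ignoring station
df_master$mean = df$mean[match(df_master$month, df$month)]
df_master = df_master %>% mutate(diff = value - mean)
```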
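`match()` returns only the position of the first match. A lookup by month alone therefore always lands on station A's rows of `df`. One way to keep the `match()` approach is to match on a single key that combines station and month.

This is a minimal base-R sketch that assumes the example data from the question. The `key_master` and `key_df` names and the `"_"` separator are illustrative choices, not part of the original post:

```r
# Example data from the question, with column names set directly
df <- data.frame(station = c(rep("A", 12), rep("B", 12)),
                 month   = rep(1:12, 2),
                 mean    = seq(-12, 11))
df_master <- data.frame(station = c(rep("A", 10), rep("B", 10)),
                        day     = rep(c(27:31, 1:5), 2),
                        month   = rep(c(rep(1, 5), rep(2, 5)), 2),
                        value   = rep(seq(-4, 5), 2))

# Month-only lookup: station B's rows still point at rows 1-2 of df (station A)
month_only <- match(df_master$month, df$month)
print(month_only[df_master$station == "B"])  # 1 1 1 1 1 2 2 2 2 2

# Composite key such as "A_1" or "B_2", built the same way for both tables
key_master <- paste(df_master$station, df_master$month, sep = "_")
key_df     <- paste(df$station, df$month, sep = "_")

df_master$mean <- df$mean[match(key_master, key_df)]
df_master$diff <- df_master$value - df_master$mean
print(df_master)
```

The separator should be a character that cannot occur in the station names. That way, two different station/month pairs can never produce the same key.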

How can I take this further so that the mean is matched per station, i.e. on both station and month?

Answer 1

Score: 2

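If you convert them to data.tables, you can add the difference column with an update join. This joins `df_master` with `df` on rows where both `station` and `month` are equal. Inside the join, `i.mean` refers to the `mean` column of `df` (the table passed as `i`), and `:=` adds the new column to `df_master` by reference.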

```r
library(data.table)
setDT(df_master)
setDT(df)
df_master[df, on = .(station, month),
          diff_monthmean := value - i.mean]
df_master
#     station day month value diff_monthmean
#  1:       A  27     1    -4              8
#  2:       A  28     1    -3              9
#  3:       A  29     1    -2             10
#  4:       A  30     1    -1             11
#  5:       A  31     1     0             12
#  6:       A   1     2     1             12
#  7:       A   2     2     2             13
#  8:       A   3     2     3             14
#  9:       A   4     2     4             15
# 10:       A   5     2     5             16
# 11:       B  27     1    -4             -4
# 12:       B  28     1    -3             -3
# 13:       B  29     1    -2             -2
# 14:       B  30     1    -1             -1
# 15:       B  31     1     0              0
# 16:       B   1     2     1              0
# 17:       B   2     2     2              1
# 18:       B   3     2     3              2
# 19:       B   4     2     4              3
# 20:       B   5     2     5              4
```
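An update join only touches rows of `df_master` that find a match in `df`. Any other row keeps `NA` in the new column, which makes missing station/month means easy to spot.

This small sketch assumes a hypothetical station `"C"` that has no entry in the table of means. The `means` and `daily` tables are made up for illustration:

```r
library(data.table)

# Monthly means for stations A and B only
means <- data.table(station = c(rep("A", 12), rep("B", 12)),
                    month   = rep(1:12, 2),
                    mean    = seq(-12, 11))
# Hypothetical daily rows; station "C" has no monthly mean
daily <- data.table(station = c("A", "B", "C"),
                    month   = c(1L, 1L, 1L),
                    value   = c(-4, -4, -4))

# Update join: only rows of `daily` with a matching (station, month) are assigned
daily[means, on = .(station, month), diff_monthmean := value - i.mean]
print(daily)  # diff_monthmean is 8, -4 and NA
```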

Answer 2

Score: 2

With dplyr, using a left join:

```r
library(dplyr)
left_join(df_master, df, by = c('station', 'month')) %>%
  mutate(monthdiff = value - mean) %>%
  select(-mean)
```
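`left_join()` returns one row per matching pair. If the table of means ever held two rows for the same station and month, the matching daily rows would be duplicated.

This sketch runs the same pipeline with two cheap sanity checks and assumes the example data from the question. The `dupes` and `result` names are illustrative:

```r
library(dplyr)

df <- data.frame(station = c(rep("A", 12), rep("B", 12)),
                 month   = rep(1:12, 2),
                 mean    = seq(-12, 11))
df_master <- data.frame(station = c(rep("A", 10), rep("B", 10)),
                        day     = rep(c(27:31, 1:5), 2),
                        month   = rep(c(rep(1, 5), rep(2, 5)), 2),
                        value   = rep(seq(-4, 5), 2))

# Each (station, month) pair should occur once in the lookup table;
# a duplicated pair would make left_join() repeat the matching daily rows
dupes <- df %>% count(station, month) %>% filter(n > 1)
stopifnot(nrow(dupes) == 0)

result <- left_join(df_master, df, by = c("station", "month")) %>%
  mutate(monthdiff = value - mean) %>%
  select(-mean)

# The join should neither add nor drop daily rows
stopifnot(nrow(result) == nrow(df_master))
print(result)
```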

Answer 3

Score: 1

Another option, in base R, could be:

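```r
# merge() sorts its result by station and month, so the looked-up means
# line up with df_master only if df_master is already in that order
transform(df_master,
          diff = value - merge(df_master, df, by = c('station', 'month'), all.x = TRUE)$mean)
```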
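By default (`sort = TRUE`), `merge()` returns its rows sorted by the `by` columns. Subtracting its `mean` column from `value` therefore pairs rows by position, which is only correct while `df_master` is already ordered by station and month.

This small sketch assumes the example data from the question. It reverses the daily rows to show the mismatch, then computes the difference inside the merged result instead. The `rev_master`, `naive` and `merged` names are illustrative:

```r
df <- data.frame(station = c(rep("A", 12), rep("B", 12)),
                 month   = rep(1:12, 2),
                 mean    = seq(-12, 11))
df_master <- data.frame(station = c(rep("A", 10), rep("B", 10)),
                        day     = rep(c(27:31, 1:5), 2),
                        month   = rep(c(rep(1, 5), rep(2, 5)), 2),
                        value   = rep(seq(-4, 5), 2))

# Same daily data, but with the rows in reverse order
rev_master <- df_master[nrow(df_master):1, ]

# Positional subtraction against merge()'s re-sorted output
naive <- transform(rev_master,
                   diff = value - merge(rev_master, df, by = c("station", "month"),
                                        all.x = TRUE)$mean)
print(naive$diff[1])  # 17: station B, month 2 (value 5) paired with station A's month-1 mean (-12)

# Computing the difference inside the merged result keeps every value
# next to its own mean, whatever the input row order
merged <- merge(rev_master, df, by = c("station", "month"), all.x = TRUE)
merged$diff <- merged$value - merged$mean
print(merged)
```

The merged result comes back sorted by station and month, so reorder it afterwards if the original row order matters.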

Or, using `match()` with `interaction()`:

```r
transform(df_master,
          diff = value - df$mean[match(interaction(df_master[c("month", "station")]),
                                       interaction(df[c("month", "station")]))])
```
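`interaction()` turns each month/station pair into a single factor level labelled `"month.station"`, joining the levels with `"."` by default. `match()` then compares those labels as character strings.

This short sketch prints the keys and the row positions being matched, assuming the example data from the question. The `key_master`, `key_df` and `idx` names are illustrative:

```r
df <- data.frame(station = c(rep("A", 12), rep("B", 12)),
                 month   = rep(1:12, 2),
                 mean    = seq(-12, 11))
df_master <- data.frame(station = c(rep("A", 10), rep("B", 10)),
                        day     = rep(c(27:31, 1:5), 2),
                        month   = rep(c(rep(1, 5), rep(2, 5)), 2),
                        value   = rep(seq(-4, 5), 2))

key_master <- interaction(df_master[c("month", "station")])
key_df     <- interaction(df[c("month", "station")])
print(as.character(key_master[c(1, 11)]))  # "1.A" "1.B"

# Row of df holding the mean for each daily row: rows 1 and 2 for station A,
# rows 13 and 14 (station B's months 1 and 2) for station B
idx <- match(key_master, key_df)
print(idx)
df_master$diff <- df_master$value - df$mean[idx]
```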

huangapple
  • Posted on January 7, 2020 at 00:21:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59615524.html