Using match on multiple criteria to generate value in R
Question
I currently have the following data format:
df = data.frame(c(rep("A", 12), rep("B", 12)), rep(1:12, 2), seq(-12, 11))
colnames(df) = c("station", "month", "mean")
df
df_master = data.frame(c(rep("A", 10), rep("B", 10)), rep(c(27:31, 1:5), 2), rep(c(rep(1, 5), rep(2, 5)), 2), rep(seq(-4,5), 2))
colnames(df_master) = c("station", "day", "month", "value")
df_master
Effectively, df holds a monthly average value for each station, and I want to compute a new variable in the df_master data set that gives the difference from the monthly mean for each daily observation. I have managed to do this with an overall average including all the data, but since the mean values vary between stations, I would like to make the new variable station-specific.
I have tried the following code to match the monthly value, but it currently doesn't account for differences between stations:
df_master$mean = df$mean[match(df_master$month, df$month)]
df_master = df_master %>% mutate(diff = value - mean)
How can I progress this further so that the averages are taken per station?
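To illustrate why the month-only lookup is not enough, here is a minimal sketch that rebuilds the example data (naming the columns directly, which is an editorial shortcut) and checks which rows of df the match() call actually selects. Because match() returns the position of the first match only, every month is looked up among station A's rows.

```r
# Rebuild the example data from the question
df <- data.frame(station = rep(c("A", "B"), each = 12),
                 month   = rep(1:12, 2),
                 mean    = seq(-12, 11))
df_master <- data.frame(station = rep(c("A", "B"), each = 10),
                        day     = rep(c(27:31, 1:5), 2),
                        month   = rep(rep(1:2, each = 5), 2),
                        value   = rep(seq(-4, 5), 2))

# match() keeps only the FIRST position where each month occurs in df$month,
# and those positions all fall within station A's block (rows 1 to 12)
idx <- match(df_master$month, df$month)

# Stations that the lookup actually hits, including for station B's rows
stations_hit <- unique(as.character(df$station[idx]))
stations_hit
```

In this sketch stations_hit contains only "A", so station B's daily values would be compared against station A's means.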
Answer 1
Score: 2
If you convert them to data.tables, you can add the difference column with an update join, joining df_master with df on the condition that the values for both station and month are equal.
library(data.table)
setDT(df_master)
setDT(df)
df_master[df, on = .(station, month),
diff_monthmean := value - i.mean]
df_master
# station day month value diff_monthmean
# 1: A 27 1 -4 8
# 2: A 28 1 -3 9
# 3: A 29 1 -2 10
# 4: A 30 1 -1 11
# 5: A 31 1 0 12
# 6: A 1 2 1 12
# 7: A 2 2 2 13
# 8: A 3 2 3 14
# 9: A 4 2 4 15
# 10: A 5 2 5 16
# 11: B 27 1 -4 -4
# 12: B 28 1 -3 -3
# 13: B 29 1 -2 -2
# 14: B 30 1 -1 -1
# 15: B 31 1 0 0
# 16: B 1 2 1 0
# 17: B 2 2 2 1
# 18: B 3 2 3 2
# 19: B 4 2 4 3
# 20: B 5 2 5 4
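One thing worth checking with an update join is what happens to rows that have no matching monthly mean. The sketch below uses small hypothetical tables (daily and means, with a made-up station "C" that has no entry in means) rather than the question's data, and assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data: station "C" has no row in the means table
daily <- data.table(station = c("A", "A", "C"),
                    month   = c(1, 2, 1),
                    value   = c(-4, 1, 3))
means <- data.table(station = c("A", "A"),
                    month   = c(1, 2),
                    mean    = c(-12, -11))

# Update join: only rows of daily that match a row of means are assigned;
# i.mean refers to the mean column of the table in the i position (means)
daily[means, on = .(station, month), diff_monthmean := value - i.mean]
daily
```

Rows of daily without a match are left as NA in diff_monthmean, so counting NAs afterwards is a quick way to spot station and month combinations missing from the means table.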
Answer 2
Score: 2
With dplyr, using a left join:
library(dplyr)
left_join(df_master, df, by = c('station', 'month')) %>%
mutate(monthdiff = value - mean) %>%
select(-mean)
Answer 3
Score: 1
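Another option, in base R, is merge. Note that merge sorts its result by the key columns by default, so this relies on df_master already being ordered by station and month: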
transform(df_master,
diff = value - merge(df_master, df, by = c('station', 'month'), all.x = TRUE)$mean)
Or, using match with interaction:
transform(df_master,
diff = value - df$mean[match(interaction(df_master[c("month", "station")]), interaction(df[c("month", "station")]))])
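As a variation on the match approach, the composite key can also be built with paste instead of interaction. This is a swapped-in alternative rather than the answer's own code; the "|" separator and the names key_master and key_means are arbitrary choices made for this sketch.

```r
# Rebuild the example data from the question
df <- data.frame(station = rep(c("A", "B"), each = 12),
                 month   = rep(1:12, 2),
                 mean    = seq(-12, 11))
df_master <- data.frame(station = rep(c("A", "B"), each = 10),
                        day     = rep(c(27:31, 1:5), 2),
                        month   = rep(rep(1:2, each = 5), 2),
                        value   = rep(seq(-4, 5), 2))

# Combine station and month into a single key on both sides
key_master <- paste(df_master$station, df_master$month, sep = "|")
key_means  <- paste(df$station, df$month, sep = "|")

# Look up each daily row's station-specific monthly mean, then subtract it
df_master$diff <- df_master$value - df$mean[match(key_master, key_means)]
df_master
```

Because the lookup is done row by row with match, the result stays in df_master's original row order regardless of how the data are sorted.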