Problem with left_join: "Join columns must be present"
Question
I'm trying to summarise frequency data (column f) that occur in 'middle' positions, i.e., between the first and the last position. My approach is to filter for these rows, do the summarise, and then rejoin the new data with the data frame from which they were filtered. This works well with the training data:
library(tidyverse)
df %>%
  group_by(rowid) %>%
  # keep only the middle positions (drop the first and last position per rowid):
  filter(position != first(position) & position != last(position)) %>%
  # summarise:
  summarize(across(position),
            middle_position = mean(f, na.rm = TRUE),
            word = str_c(word, collapse = " ")
  ) %>%
  left_join(df, ., by = c("rowid", "position"))
However, when I apply it to my actual data, I get this error message:
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
1. ... %>% left_join(bnc_X, ., by = c("rowid", "position"))
3. dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
▆
1. ├─... %>% left_join(bnc_X, ., by = c("rowid", "position"))
2. ├─dplyr::left_join(bnc_X, ., by = c("rowid", "position"))
3. └─dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
4. └─dplyr:::join_mutate(...)
5. └─dplyr:::join_cols(...)
6. └─dplyr:::standardise_join_by(...)
7. └─dplyr:::check_join_vars(by$x, x_names, error_call = error_call)
8. └─rlang::abort(bullets, call = error_call)
The main problem seems to be the variable position: why is it not recognized? I've spent a good few hours trying to solve the issue but couldn't, and would be grateful for help!
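dplyr raises "Join columns must be present in data" when a column named in by is missing from the table it is checked against. The failing frame in the backtrace is check_join_vars(by$x, x_names, ...), which suggests (an inference from the trace, not verified against the real data) that the left-hand table, bnc_X, is the one without a column called position. Note also that the code shown pipes df, while the trace joins bnc_X, so the real call differs from the one posted. A minimal base-R sketch of a pre-join check follows; missing_join_keys, lhs and rhs are hypothetical names made up for illustration.

```r
# Hypothetical helper: list the `by` columns that are missing from each table
missing_join_keys <- function(x, y, by) {
  list(
    x = setdiff(by, names(x)),  # keys absent from the left-hand table
    y = setdiff(by, names(y))   # keys absent from the right-hand table
  )
}

# Hypothetical tables: the left-hand one calls its column `pos`, not `position`
lhs <- data.frame(rowid = c(1, 1, 2), pos = c(1, 2, 1))
rhs <- data.frame(rowid = c(1, 2), position = c(2, 2), middle_position = c(300, 555))

missing_join_keys(lhs, rhs, by = c("rowid", "position"))
# $x
# [1] "position"
#
# $y
# character(0)
```

On the real data, the equivalent one-liner would be setdiff(c("rowid", "position"), names(bnc_X)); a non-empty result names the column that needs to be created or renamed before the join.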
Data:
df <- data.frame(
  size = c(3, 3, 3,
           3, 3, 3,
           4, 4, 4, 4,
           5, 5, 5, 5, 5,
           3, 3, 3),
  rowid = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5),
  turn = c(rep("How are you?", 3),
           rep("I'm fine.", 3),
           rep("How's the weather?", 4),
           rep("It's really very cold.", 5),
           rep("I love you", 3)),
  word = c("how", "are", "you",
           "i", "'m", "fine",
           "how", "'s", "the", "weather",
           "it", "'s", "really", "very", "cold",
           "i", "love", "you"),
  f = c(400, 300, 250,
        600, 555, 1,
        400, 500, 700, 20,
        390, 500, 177, 200, 35,
        600, 199, 400),
  position = c(1, 2, 3,
               1, 2, 3,
               1, 2, 3, 4,
               1, 2, 3, 4, 5,
               1, 2, 3)
)
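If you want to keep the filter, summarise and rejoin approach from the question, one option is to compute the summaries with a grouped mutate() on the middle rows and then select the join keys explicitly, so position is guaranteed to be in the right-hand table. This is a sketch on a subset of the toy data above, not the asker's or answerer's code; middles and middle_word are names chosen for the example. Two design points: summarise() returning more than one row per group has been deprecated since dplyr 1.1.0 (reframe() is the suggested replacement), and naming the summary column word, as in the question, makes left_join() add word.x and word.y because word is not a join key.

```r
library(dplyr)

# Toy data in the shape of the question's df (a subset of its rows)
df <- data.frame(
  rowid    = c(1, 1, 1, 3, 3, 3, 3),
  word     = c("how", "are", "you", "how", "'s", "the", "weather"),
  f        = c(400, 300, 250, 400, 500, 700, 20),
  position = c(1, 2, 3, 1, 2, 3, 4)
)

# Middle rows only; mutate() keeps one row per input row, so `position` survives
middles <- df %>%
  group_by(rowid) %>%
  filter(position != first(position), position != last(position)) %>%
  mutate(
    middle_position = mean(f, na.rm = TRUE),
    middle_word     = paste(word, collapse = " ")
  ) %>%
  ungroup() %>%
  # keep the join keys plus the new columns, so nothing collides with df's columns
  select(rowid, position, middle_position, middle_word)

# Both tables now contain every column listed in `by`
result <- left_join(df, middles, by = c("rowid", "position"))
result
```

First and last positions come back as NA, as in the output of the answer below.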
Answer 1
Score: 1
This works for me in data.table; no joins are needed.
library(data.table)
# set to data.table
setDT(df)
# get the row indices (.I) of each rowid's middle rows, i.e. all but its first and last row
idx = df[, .(idx = .I[-c(1L, .N)]), by = .(rowid)]$idx
# update these middle rows by reference
df[idx, `:=`(middle_position = mean(f),
             word_middle = paste0(word, collapse = " ")),
   by = .(rowid)]
# := returns invisibly, so print df to see the result
df
size rowid turn word f position middle_position word_middle
1: 3 1 How are you? how 400 1 NA <NA>
2: 3 1 How are you? are 300 2 300.0000 are
3: 3 1 How are you? you 250 3 NA <NA>
4: 3 2 I'm fine. i 600 1 NA <NA>
5: 3 2 I'm fine. 'm 555 2 555.0000 'm
6: 3 2 I'm fine. fine 1 3 NA <NA>
7: 4 3 How's the weather? how 400 1 NA <NA>
8: 4 3 How's the weather? 's 500 2 600.0000 's the
9: 4 3 How's the weather? the 700 3 600.0000 's the
10: 4 3 How's the weather? weather 20 4 NA <NA>
11: 5 4 It's really very cold. it 390 1 NA <NA>
12: 5 4 It's really very cold. 's 500 2 292.3333 's really very
13: 5 4 It's really very cold. really 177 3 292.3333 's really very
14: 5 4 It's really very cold. very 200 4 292.3333 's really very
15: 5 4 It's really very cold. cold 35 5 NA <NA>
16: 3 5 I love you i 600 1 NA <NA>
17: 3 5 I love you love 199 2 199.0000 love
18: 3 5 I love you you 400 3 NA <NA>
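For readers who prefer to stay in dplyr, the same idea as this answer (compute the summaries over each rowid's middle rows and write them back onto those rows, with no join) can be sketched with a grouped mutate(). This is an alternative translation, not part of the answer; is_middle and middle_word are names chosen for the example, and the data are a subset of the question's rows. Because mutate() never drops rows, position and every other original column stay in the result.

```r
library(dplyr)

# Toy data: a subset of the question's rows
df <- data.frame(
  rowid    = c(1, 1, 1, 4, 4, 4, 4, 4),
  word     = c("how", "are", "you", "it", "'s", "really", "very", "cold"),
  f        = c(400, 300, 250, 390, 500, 177, 200, 35),
  position = c(1, 2, 3, 1, 2, 3, 4, 5)
)

result <- df %>%
  group_by(rowid) %>%
  mutate(
    # TRUE for every row except the first and last position of each rowid
    is_middle       = position != first(position) & position != last(position),
    # summaries computed over the middle rows only, kept on those rows, NA elsewhere
    middle_position = if_else(is_middle, mean(f[is_middle]), NA_real_),
    middle_word     = if_else(is_middle, paste(word[is_middle], collapse = " "), NA_character_)
  ) %>%
  ungroup() %>%
  select(-is_middle)

result
```

For rowid 4 this gives 292.3333 and "'s really very" on positions 2 to 4, matching the corresponding rows of the data.table output above.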