英文:
Problem with left_join: "Join columns must be present"
问题
我试图总结出现在“中间”位置的频率数据,即在第一个和最后一个“位置”之间。我对这个任务的方法是筛选这些数据,进行总结,然后将新数据重新连接到它们被筛选出来的数据框中。这在训练数据中效果很好:
library(tidyverse)
df %>%
group_by(rowid) %>%
# 筛选中间位置的频率数据:
filter(position != first(position) & position != last(position)) %>%
# 总结:
summarize(across(position),
middle_position = mean(f, na.rm = TRUE),
word = str_c(word, collapse = " ")
) %>%
left_join(df, ., by = c("rowid", "position"))
然而,当应用于我的实际数据时,我收到以下错误消息:
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rllang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
1. ... %>% left_join(bnc_X, ., by = c("rowid", "position"))
3. dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
▆
1. ├─... %>% left_join(bnc_X, ., by = c("rowid", "position"))
2. ├─dplyr::left_join(bnc_X, ., by = c("rowid", "position"))
3. └─dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
4. └─dplyr:::join_mutate(...)
5. └─dplyr:::join_cols(...)
6. └─dplyr:::standardise_join_by(...)
7. └─dplyr:::check_join_vars(by$x, x_names, error_call = error_call)
8. └─rlang::abort(bullets, call = error_call)
主要问题似乎是变量 position,为什么它没有被识别?我已经花了很多时间尝试解决这个问题,但无法解决,希望得到帮助!
数据:
df <- data.frame(
size = c(3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 3, 3, 3),
rowid = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5),
turn = c(rep("How are you?", 3),
rep("I'm fine.", 3),
rep("How's the weather?", 4),
rep("It's really very cold.", 5),
rep("I love you", 3)),
word = c("how", "are", "you",
"i", "'m", "fine",
"how", "'s", "the", "weather",
"it", "'s", "really", "very", "cold",
"i", "love", "you"),
f = c(400, 300, 250,
600, 555, 1,
400, 500, 700, 20,
390, 500, 177, 200, 35,
600, 199, 400),
position = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3)
)
英文:
I'm trying to summarise frequency data that occur in 'middle' positions, i.e., between the first and the last position. My approach to this task is to filter for these data, do the summarise, and then rejoin the new data with the dataframe from which they were filtered. This works well with the training data:
library(tidyverse)
df %>%
group_by(rowid) %>%
# summarize frequencies for middle postions:
filter(position != first(position) & position != last(position)) %>%
# summarise:
summarize(across(position),
middle_position = mean(f, na.rm = TRUE),
word = str_c(word, collapse=" ")
) %>%
left_join(df, ., by = c("rowid", "position"))
However, applied to my actual data, I get this error message:
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
1. ... %>% left_join(bnc_X, ., by = c("rowid", "position"))
3. dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
▆
1. ├─... %>% left_join(bnc_X, ., by = c("rowid", "position"))
2. ├─dplyr::left_join(bnc_X, ., by = c("rowid", "position"))
3. └─dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
4. └─dplyr:::join_mutate(...)
5. └─dplyr:::join_cols(...)
6. └─dplyr:::standardise_join_by(...)
7. └─dplyr:::check_join_vars(by$x, x_names, error_call = error_call)
8. └─rlang::abort(bullets, call = error_call)
The main problem seems to be the variable position- why is it not recognized? I've spent a good few hours trying to solve the issue but couldn't, and would be grateful for help!
Data:
df <- data.frame(
size = c(3,3,3,
3,3,3,
4,4,4,4,
5,5,5,5,5,
3,3,3),
rowid = c(1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5),
turn = c(rep("How are you?",3),
rep("I'm fine.",3),
rep("How's the weather?",4),
rep("It's really very cold.",5),
rep("I love you",3)),
word = c("how","are","you",
"i","'m","fine",
"how","'s","the","weather",
"it","'s","really", "very","cold",
"i","love","you"),
f = c(400,300,250,
600,555,1,
400,500,700,20,
390,500,177,200,35,
600,199,400),
position = c(1,2,3,
1,2,3,
1,2,3,4,
1,2,3,4,5,
1,2,3)
)
答案1
得分: 1
这段代码在data.table中适用。无需进行连接操作。
library(data.table)
# 转换为 data.table
setDT(df)
# 通过 rowid 获取中间行的 id
idx = df[, .(idx = .I[-c(1L, .N)]), by = .(rowid)]$idx
# 更新这些中间行
df[idx, `:=`(middle_position = mean(f),
word_midddel = paste0(word, collapse = " ")),
by = .(rowid)]
size rowid turn word f position middle_position word_midddel
1: 3 1 How are you? how 400 1 NA <NA>
2: 3 1 How are you? are 300 2 300.0000 are
3: 3 1 How are you? you 250 3 NA <NA>
4: 3 2 I'm fine. i 600 1 NA <NA>
5: 3 2 I'm fine. 'm 555 2 555.0000 'm
6: 3 2 I'm fine. fine 1 3 NA <NA>
7: 4 3 How's the weather? how 400 1 NA <NA>
8: 4 3 How's the weather? 's 500 2 600.0000 's the
9: 4 3 How's the weather? the 700 3 600.0000 's the
10: 4 3 How's the weather? weather 20 4 NA <NA>
11: 5 4 It's really very cold. it 390 1 NA <NA>
12: 5 4 It's really very cold. 's 500 2 292.3333 's really very
13: 5 4 It's really very cold. really 177 3 292.3333 's really very
14: 5 4 It's really very cold. very 200 4 292.3333 's really very
15: 5 4 It's really very cold. cold 35 5 NA <NA>
16: 3 5 I love you i 600 1 NA <NA>
17: 3 5 I love you love 199 2 199.0000 love
18: 3 5 I love you you 400 3 NA <NA>
英文:
This works for me in data.table. No joins needed.
library(data.table)
# set to data.table
setDT(df)
# get id's of middle rows by rowid
idx = df[, .(idx = .I[-c(1L, .N)]), by = .(rowid)]$idx
# update these middle rows
df[idx, `:=`(middle_position = mean(f),
word_midddel = paste0(word, collapse = " ")),
by = .(rowid)]
size rowid turn word f position middle_position word_midddel
1: 3 1 How are you? how 400 1 NA <NA>
2: 3 1 How are you? are 300 2 300.0000 are
3: 3 1 How are you? you 250 3 NA <NA>
4: 3 2 I'm fine. i 600 1 NA <NA>
5: 3 2 I'm fine. 'm 555 2 555.0000 'm
6: 3 2 I'm fine. fine 1 3 NA <NA>
7: 4 3 How's the weather? how 400 1 NA <NA>
8: 4 3 How's the weather? 's 500 2 600.0000 's the
9: 4 3 How's the weather? the 700 3 600.0000 's the
10: 4 3 How's the weather? weather 20 4 NA <NA>
11: 5 4 It's really very cold. it 390 1 NA <NA>
12: 5 4 It's really very cold. 's 500 2 292.3333 's really very
13: 5 4 It's really very cold. really 177 3 292.3333 's really very
14: 5 4 It's really very cold. very 200 4 292.3333 's really very
15: 5 4 It's really very cold. cold 35 5 NA <NA>
16: 3 5 I love you i 600 1 NA <NA>
17: 3 5 I love you love 199 2 199.0000 love
18: 3 5 I love you you 400 3 NA <NA>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论