左连接问题: “连接列必须存在”

huangapple go评论72阅读模式
英文:

Problem with left_join: "Join columns must be present"

问题

我试图总结出现在“中间”位置的频率数据,即在第一个和最后一个“位置”之间。我对这个任务的方法是筛选这些数据,进行总结,然后将新数据重新连接到它们被筛选出来的数据框中。这在训练数据中效果很好:

library(tidyverse)
df %>%
  group_by(rowid) %>%
  # 筛选中间位置的频率数据:
  filter(position != first(position) & position != last(position)) %>%
  # 总结:
  summarize(across(position),
            middle_position = mean(f, na.rm = TRUE),
            word = str_c(word, collapse = " ")
            ) %>%
  left_join(df, ., by = c("rowid", "position"))

然而,当应用于我的实际数据时,我收到以下错误消息:

Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rllang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
 1. ... %>% left_join(bnc_X, ., by = c("rowid", "position"))
 3. dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
    ▆
 1. ├─... %>% left_join(bnc_X, ., by = c("rowid", "position"))
 2. ├─dplyr::left_join(bnc_X, ., by = c("rowid", "position"))
 3. └─dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
 4.   └─dplyr:::join_mutate(...)
 5.     └─dplyr:::join_cols(...)
 6.       └─dplyr:::standardise_join_by(...)
 7.         └─dplyr:::check_join_vars(by$x, x_names, error_call = error_call)
 8.           └─rlang::abort(bullets, call = error_call)

主要问题似乎是变量 position,为什么它没有被识别?我已经花了很多时间尝试解决这个问题,但无法解决,希望得到帮助!

数据:

df <- data.frame(
  size = c(3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 3, 3, 3),
  rowid = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5),
  turn = c(rep("How are you?", 3),
           rep("I'm fine.", 3),
           rep("How's the weather?", 4),
           rep("It's really very cold.", 5),
           rep("I love you", 3)),
  word = c("how", "are", "you",
           "i", "'m", "fine",
           "how", "'s", "the", "weather",
           "it", "'s", "really", "very", "cold",
           "i", "love", "you"),
  f = c(400, 300, 250,
        600, 555, 1,
        400, 500, 700, 20,
        390, 500, 177, 200, 35,
        600, 199, 400),
  position = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3)
)
英文:

I'm trying to summarise frequency data that occur in 'middle' positions, i.e., between the first and the last position. My approach to this task is to filter for these data, do the summarise, and then rejoin the new data with the dataframe from which they were filtered. This works well with the training data:

library(tidyverse)
df %&gt;%
  group_by(rowid) %&gt;%
  # summarize frequencies for middle postions:
  filter(position != first(position) &amp; position != last(position)) %&gt;%
  # summarise:
  summarize(across(position),
            middle_position = mean(f, na.rm = TRUE),
            word = str_c(word, collapse=&quot; &quot;)
            ) %&gt;%
  left_join(df, ., by = c(&quot;rowid&quot;, &quot;position&quot;))

However, applied to my actual data, I get this error message:

Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
Run `rlang::last_error()` to see where the error occurred.
&gt; rlang::last_error()
&lt;error/rlang_error&gt;
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
 1. ... %&gt;% left_join(bnc_X, ., by = c(&quot;rowid&quot;, &quot;position&quot;))
 3. dplyr:::left_join.data.frame(bnc_X, ., by = c(&quot;rowid&quot;, &quot;position&quot;))
Run `rlang::last_trace()` to see the full context.
&gt; rlang::last_trace()
&lt;error/rlang_error&gt;
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
    ▆
 1. ├─... %&gt;% left_join(bnc_X, ., by = c(&quot;rowid&quot;, &quot;position&quot;))
 2. ├─dplyr::left_join(bnc_X, ., by = c(&quot;rowid&quot;, &quot;position&quot;))
 3. └─dplyr:::left_join.data.frame(bnc_X, ., by = c(&quot;rowid&quot;, &quot;position&quot;))
 4.   └─dplyr:::join_mutate(...)
 5.     └─dplyr:::join_cols(...)
 6.       └─dplyr:::standardise_join_by(...)
 7.         └─dplyr:::check_join_vars(by$x, x_names, error_call = error_call)
 8.           └─rlang::abort(bullets, call = error_call)

The main problem seems to be the variable position- why is it not recognized? I've spent a good few hours trying to solve the issue but couldn't, and would be grateful for help!

Data:

df &lt;- data.frame(
  size = c(3,3,3,
              3,3,3,
              4,4,4,4,
              5,5,5,5,5,
              3,3,3),
  rowid = c(1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5),
  turn = c(rep(&quot;How are you?&quot;,3),
           rep(&quot;I&#39;m fine.&quot;,3),
           rep(&quot;How&#39;s the weather?&quot;,4),
           rep(&quot;It&#39;s really very cold.&quot;,5),
           rep(&quot;I love you&quot;,3)),
  word = c(&quot;how&quot;,&quot;are&quot;,&quot;you&quot;,
           &quot;i&quot;,&quot;&#39;m&quot;,&quot;fine&quot;,
           &quot;how&quot;,&quot;&#39;s&quot;,&quot;the&quot;,&quot;weather&quot;,
           &quot;it&quot;,&quot;&#39;s&quot;,&quot;really&quot;, &quot;very&quot;,&quot;cold&quot;,
           &quot;i&quot;,&quot;love&quot;,&quot;you&quot;),
  f = c(400,300,250,
        600,555,1,
        400,500,700,20,
        390,500,177,200,35,
        600,199,400),
  position = c(1,2,3,
               1,2,3,
               1,2,3,4,
               1,2,3,4,5,
               1,2,3)
)

答案1

得分: 1

这段代码在data.table中适用。无需进行连接操作。

library(data.table)
# 转换为 data.table
setDT(df)
# 通过 rowid 获取中间行的 id
idx = df[, .(idx = .I[-c(1L, .N)]), by = .(rowid)]$idx
# 更新这些中间行
df[idx, `:=`(middle_position = mean(f),
             word_midddel = paste0(word, collapse = " ")),
   by = .(rowid)]

    size rowid                   turn    word   f position middle_position   word_midddel
 1:    3     1           How are you?     how 400        1              NA           <NA>
 2:    3     1           How are you?     are 300        2        300.0000            are
 3:    3     1           How are you?     you 250        3              NA           <NA>
 4:    3     2              I'm fine.       i 600        1              NA           <NA>
 5:    3     2              I'm fine.      'm 555        2        555.0000             'm
 6:    3     2              I'm fine.    fine   1        3              NA           <NA>
 7:    4     3     How's the weather?     how 400        1              NA           <NA>
 8:    4     3     How's the weather?      's 500        2        600.0000         's the
 9:    4     3     How's the weather?     the 700        3        600.0000         's the
10:    4     3     How's the weather? weather  20        4              NA           <NA>
11:    5     4 It's really very cold.      it 390        1              NA           <NA>
12:    5     4 It's really very cold.      's 500        2        292.3333 's really very
13:    5     4 It's really very cold.  really 177        3        292.3333 's really very
14:    5     4 It's really very cold.    very 200        4        292.3333 's really very
15:    5     4 It's really very cold.    cold  35        5              NA           <NA>
16:    3     5             I love you       i 600        1              NA           <NA>
17:    3     5             I love you    love 199        2        199.0000           love
18:    3     5             I love you     you 400        3              NA           <NA>
英文:

This works for me in data.table. No joins needed.

library(data.table)
# set to data.table
setDT(df)
# get id&#39;s of middle rows by rowid
idx = df[, .(idx = .I[-c(1L, .N)]), by = .(rowid)]$idx
# update these middle rows
df[idx, `:=`(middle_position = mean(f),
word_midddel = paste0(word, collapse = &quot; &quot;)),
by = .(rowid)]
size rowid                   turn    word   f position middle_position   word_midddel
1:    3     1           How are you?     how 400        1              NA           &lt;NA&gt;
2:    3     1           How are you?     are 300        2        300.0000            are
3:    3     1           How are you?     you 250        3              NA           &lt;NA&gt;
4:    3     2              I&#39;m fine.       i 600        1              NA           &lt;NA&gt;
5:    3     2              I&#39;m fine.      &#39;m 555        2        555.0000             &#39;m
6:    3     2              I&#39;m fine.    fine   1        3              NA           &lt;NA&gt;
7:    4     3     How&#39;s the weather?     how 400        1              NA           &lt;NA&gt;
8:    4     3     How&#39;s the weather?      &#39;s 500        2        600.0000         &#39;s the
9:    4     3     How&#39;s the weather?     the 700        3        600.0000         &#39;s the
10:    4     3     How&#39;s the weather? weather  20        4              NA           &lt;NA&gt;
11:    5     4 It&#39;s really very cold.      it 390        1              NA           &lt;NA&gt;
12:    5     4 It&#39;s really very cold.      &#39;s 500        2        292.3333 &#39;s really very
13:    5     4 It&#39;s really very cold.  really 177        3        292.3333 &#39;s really very
14:    5     4 It&#39;s really very cold.    very 200        4        292.3333 &#39;s really very
15:    5     4 It&#39;s really very cold.    cold  35        5              NA           &lt;NA&gt;
16:    3     5             I love you       i 600        1              NA           &lt;NA&gt;
17:    3     5             I love you    love 199        2        199.0000           love
18:    3     5             I love you     you 400        3              NA           &lt;NA&gt;

huangapple
  • 本文由 发表于 2023年1月9日 17:20:16
  • 转载请务必保留本文链接:https://go.coder-hub.com/75055207.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定