2023年1月9日 17:20:16go评论96阅读模式

英文:

Problem with left_join: "Join columns must be present"

问题

我试图总结出现在“中间”位置的频率数据，即在第一个和最后一个“位置”之间。我对这个任务的方法是筛选这些数据，进行总结，然后将新数据重新连接到它们被筛选出来的数据框中。这在训练数据中效果很好：

library(tidyverse)
df %>%
  group_by(rowid) %>%
  # 筛选中间位置的频率数据：
  filter(position != first(position) & position != last(position)) %>%
  # 总结：
  summarize(across(position),
            middle_position = mean(f, na.rm = TRUE),
            word = str_c(word, collapse = " ")
            ) %>%
  left_join(df, ., by = c("rowid", "position"))

然而，当应用于我的实际数据时，我收到以下错误消息：

Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rllang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
 1. ... %>% left_join(bnc_X, ., by = c("rowid", "position"))
 3. dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
    ▆
 1. ├─... %>% left_join(bnc_X, ., by = c("rowid", "position"))
 2. ├─dplyr::left_join(bnc_X, ., by = c("rowid", "position"))
 3. └─dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
 4.   └─dplyr:::join_mutate(...)
 5.     └─dplyr:::join_cols(...)
 6.       └─dplyr:::standardise_join_by(...)
 7.         └─dplyr:::check_join_vars(by$x, x_names, error_call = error_call)
 8.           └─rlang::abort(bullets, call = error_call)

主要问题似乎是变量 position，为什么它没有被识别？我已经花了很多时间尝试解决这个问题，但无法解决，希望得到帮助！

数据：

df <- data.frame(
  size = c(3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 3, 3, 3),
  rowid = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5),
  turn = c(rep("How are you?", 3),
           rep("I'm fine.", 3),
           rep("How's the weather?", 4),
           rep("It's really very cold.", 5),
           rep("I love you", 3)),
  word = c("how", "are", "you",
           "i", "'m", "fine",
           "how", "'s", "the", "weather",
           "it", "'s", "really", "very", "cold",
           "i", "love", "you"),
  f = c(400, 300, 250,
        600, 555, 1,
        400, 500, 700, 20,
        390, 500, 177, 200, 35,
        600, 199, 400),
  position = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3)
)

英文:

I'm trying to summarise frequency data that occur in 'middle' positions, i.e., between the first and the last position. My approach to this task is to filter for these data, do the summarise, and then rejoin the new data with the dataframe from which they were filtered. This works well with the training data:

library(tidyverse)
df %&gt;%
  group_by(rowid) %&gt;%
  # summarize frequencies for middle postions:
  filter(position != first(position) &amp; position != last(position)) %&gt;%
  # summarise:
  summarize(across(position),
            middle_position = mean(f, na.rm = TRUE),
            word = str_c(word, collapse=&quot; &quot;)
            ) %&gt;%
  left_join(df, ., by = c(&quot;rowid&quot;, &quot;position&quot;))

However, applied to my actual data, I get this error message:

Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
Run `rlang::last_error()` to see where the error occurred.
&gt; rlang::last_error()
&lt;error/rlang_error&gt;
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
 1. ... %&gt;% left_join(bnc_X, ., by = c(&quot;rowid&quot;, &quot;position&quot;))
 3. dplyr:::left_join.data.frame(bnc_X, ., by = c(&quot;rowid&quot;, &quot;position&quot;))
Run `rlang::last_trace()` to see the full context.
&gt; rlang::last_trace()
&lt;error/rlang_error&gt;
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
    ▆
 1. ├─... %&gt;% left_join(bnc_X, ., by = c(&quot;rowid&quot;, &quot;position&quot;))
 2. ├─dplyr::left_join(bnc_X, ., by = c(&quot;rowid&quot;, &quot;position&quot;))
 3. └─dplyr:::left_join.data.frame(bnc_X, ., by = c(&quot;rowid&quot;, &quot;position&quot;))
 4.   └─dplyr:::join_mutate(...)
 5.     └─dplyr:::join_cols(...)
 6.       └─dplyr:::standardise_join_by(...)
 7.         └─dplyr:::check_join_vars(by$x, x_names, error_call = error_call)
 8.           └─rlang::abort(bullets, call = error_call)

The main problem seems to be the variable position- why is it not recognized? I've spent a good few hours trying to solve the issue but couldn't, and would be grateful for help!

Data:

df &lt;- data.frame(
  size = c(3,3,3,
              3,3,3,
              4,4,4,4,
              5,5,5,5,5,
              3,3,3),
  rowid = c(1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5),
  turn = c(rep(&quot;How are you?&quot;,3),
           rep(&quot;I&#39;m fine.&quot;,3),
           rep(&quot;How&#39;s the weather?&quot;,4),
           rep(&quot;It&#39;s really very cold.&quot;,5),
           rep(&quot;I love you&quot;,3)),
  word = c(&quot;how&quot;,&quot;are&quot;,&quot;you&quot;,
           &quot;i&quot;,&quot;&#39;m&quot;,&quot;fine&quot;,
           &quot;how&quot;,&quot;&#39;s&quot;,&quot;the&quot;,&quot;weather&quot;,
           &quot;it&quot;,&quot;&#39;s&quot;,&quot;really&quot;, &quot;very&quot;,&quot;cold&quot;,
           &quot;i&quot;,&quot;love&quot;,&quot;you&quot;),
  f = c(400,300,250,
        600,555,1,
        400,500,700,20,
        390,500,177,200,35,
        600,199,400),
  position = c(1,2,3,
               1,2,3,
               1,2,3,4,
               1,2,3,4,5,
               1,2,3)
)

答案1

得分: 1

这段代码在data.table中适用。无需进行连接操作。

library(data.table)
# 转换为 data.table
setDT(df)
# 通过 rowid 获取中间行的 id
idx = df[, .(idx = .I[-c(1L, .N)]), by = .(rowid)]$idx
# 更新这些中间行
df[idx, `:=`(middle_position = mean(f),
             word_midddel = paste0(word, collapse = " ")),
   by = .(rowid)]
    size rowid                   turn    word   f position middle_position   word_midddel
 1:    3     1           How are you?     how 400        1              NA           <NA>
 2:    3     1           How are you?     are 300        2        300.0000            are
 3:    3     1           How are you?     you 250        3              NA           <NA>
 4:    3     2              I'm fine.       i 600        1              NA           <NA>
 5:    3     2              I'm fine.      'm 555        2        555.0000             'm
 6:    3     2              I'm fine.    fine   1        3              NA           <NA>
 7:    4     3     How's the weather?     how 400        1              NA           <NA>
 8:    4     3     How's the weather?      's 500        2        600.0000         's the
 9:    4     3     How's the weather?     the 700        3        600.0000         's the
10:    4     3     How's the weather? weather  20        4              NA           <NA>
11:    5     4 It's really very cold.      it 390        1              NA           <NA>
12:    5     4 It's really very cold.      's 500        2        292.3333 's really very
13:    5     4 It's really very cold.  really 177        3        292.3333 's really very
14:    5     4 It's really very cold.    very 200        4        292.3333 's really very
15:    5     4 It's really very cold.    cold  35        5              NA           <NA>
16:    3     5             I love you       i 600        1              NA           <NA>
17:    3     5             I love you    love 199        2        199.0000           love
18:    3     5             I love you     you 400        3              NA           <NA>

英文:

This works for me in data.table. No joins needed.

library(data.table)
# set to data.table
setDT(df)
# get id&#39;s of middle rows by rowid
idx = df[, .(idx = .I[-c(1L, .N)]), by = .(rowid)]$idx
# update these middle rows
df[idx, `:=`(middle_position = mean(f),
word_midddel = paste0(word, collapse = &quot; &quot;)),
by = .(rowid)]
size rowid                   turn    word   f position middle_position   word_midddel
1:    3     1           How are you?     how 400        1              NA           &lt;NA&gt;
2:    3     1           How are you?     are 300        2        300.0000            are
3:    3     1           How are you?     you 250        3              NA           &lt;NA&gt;
4:    3     2              I&#39;m fine.       i 600        1              NA           &lt;NA&gt;
5:    3     2              I&#39;m fine.      &#39;m 555        2        555.0000             &#39;m
6:    3     2              I&#39;m fine.    fine   1        3              NA           &lt;NA&gt;
7:    4     3     How&#39;s the weather?     how 400        1              NA           &lt;NA&gt;
8:    4     3     How&#39;s the weather?      &#39;s 500        2        600.0000         &#39;s the
9:    4     3     How&#39;s the weather?     the 700        3        600.0000         &#39;s the
10:    4     3     How&#39;s the weather? weather  20        4              NA           &lt;NA&gt;
11:    5     4 It&#39;s really very cold.      it 390        1              NA           &lt;NA&gt;
12:    5     4 It&#39;s really very cold.      &#39;s 500        2        292.3333 &#39;s really very
13:    5     4 It&#39;s really very cold.  really 177        3        292.3333 &#39;s really very
14:    5     4 It&#39;s really very cold.    very 200        4        292.3333 &#39;s really very
15:    5     4 It&#39;s really very cold.    cold  35        5              NA           &lt;NA&gt;
16:    3     5             I love you       i 600        1              NA           &lt;NA&gt;
17:    3     5             I love you    love 199        2        199.0000           love
18:    3     5             I love you     you 400        3              NA           &lt;NA&gt;

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

左连接问题: “连接列必须存在”

问题

答案1

如何在R中循环遍历GEE模型中的公式列表？

在一个Shiny应用中模块化响应式表达式

Subset a list of matrices by common column names.

`do.call` 使用合格的命令

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。