Question

I have this type of data, with frequency data and position data grouped by rowid:

df
   rowid    word   f position
1      2       i 700        1
2      2      'm 600        2
3      2    fine   1        3
4      3     how 400        1
5      3      's 500        2
6      3     the 700        3
7      3 weather  20        4
8      4      it 390        1
9      4      's 500        2
10     4  really 177        3
11     4    very 200        4
12     4    cold  35        5
13     5       i 700        1
14     5    love 199        2
15     5     you 400        3

The task I'm facing seems simple: in those rowids where there are more than 3 positions, I need to replace the frequencies of all middle positions with their average. The following approach works but seems overly convoluted, so I'm almost certain there is a more straightforward dplyr way to get the desired output:

library(dplyr)
library(stringr)  # for str_c()

df %>%
  group_by(rowid) %>%
  # filter for 'middle' positions:
  filter(position != first(position) & position != last(position)) %>%
  # summarise:
  summarize(across(position),
            # create average frequency:
            f_middle_position = mean(f, na.rm = TRUE),
            # concatenate words:
            word = str_c(word, collapse = " ")
            ) %>%
  filter(!duplicated(f_middle_position)) %>%
  # join with df:
  left_join(df, ., by = c("rowid", "position")) %>%
  # remove rows other than #1, #2, and last:
  group_by(rowid) %>%
  # create row count:
  mutate(rn = row_number()) %>%
  # filter first, second, and last row per group:
  filter(rn %in% c(1, 2, last(rn))) %>%
  # transfer frequencies for middle positions:
  mutate(f = ifelse(is.na(f_middle_position), f, f_middle_position)) %>%
  # make more changes:
  mutate(
    # change position labels:
    position = ifelse(position == first(position), 1,
                      ifelse(position == last(position), 2, 1.5)),
    # update word:
    word = ifelse(is.na(word.y), word.x, word.y)
  ) %>%
  # remove obsolete variables:
  select(-c(f_middle_position, word.y, word.x, rn))
# A tibble: 12 × 4
# Groups:   rowid [4]
   rowid     f position word
   <dbl> <dbl>    <dbl> <chr>
 1     2  700       1   i
 2     2  600       1.5 'm
 3     2    1       2   fine
 4     3  400       1   how
 5     3  600       1.5 's the
 6     3   20       2   weather
 7     4  390       1   it
 8     4  292.      1.5 's really very
 9     4   35       2   cold
10     5  700       1   i
11     5  199       1.5 love
12     5  400       2   you

How can this result be obtained more concisely in dplyr and, preferably, without the left_join, which causes problems with my actual data?

Data:

df <- data.frame(
  rowid = c(2,2,2,3,3,3,3,4,4,4,4,4,5,5,5),
  word = c("i","'m","fine",
           "how","'s","the","weather",
           "it","'s","really", "very","cold",
           "i","love","you"),
  f = c(700,600,1,
        400,500,700,20,
        390,500,177,200,35,
        700,199,400),
  position = c(1,2,3,
               1,2,3,4,
               1,2,3,4,5,
               1,2,3)
)

Answer 1

Score: 1

You can create a group variable pos that marks the first row with 1, the middle with 1.5, and the last with 2. Then group the data by rowid and pos and apply mean() and paste() on f and word respectively.

library(dplyr)

df %>%
  group_by(rowid) %>%
  mutate(pos = case_when(position == 1 ~ 1, position == n() ~ 2, TRUE ~ 1.5)) %>%
  group_by(rowid, pos) %>%
  summarise(f = mean(f), word = paste(word, collapse = ' '), .groups = 'drop')

# # A tibble: 12 × 4
#    rowid   pos     f word
#    <dbl> <dbl> <dbl> <chr>
#  1     2   1    700  i
#  2     2   1.5  600  'm
#  3     2   2      1  fine
#  4     3   1    400  how
#  5     3   1.5  600  's the
#  6     3   2     20  weather
#  7     4   1    390  it
#  8     4   1.5  292. 's really very
#  9     4   2     35  cold
# 10     5   1    700  i
# 11     5   1.5  199  love
# 12     5   2    400  you
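
Note that the case_when() above works because position runs from 1 to n() within every rowid in the sample data. As a minimal sketch, assuming the actual data might have gaps or a different starting value in position (the question does not say either way), the same grouping can be keyed on row order instead. The arrange() step and the na.rm = TRUE argument are additions for that assumed case, not part of the answer above:

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  rowid = c(2,2,2,3,3,3,3,4,4,4,4,4,5,5,5),
  word = c("i","'m","fine",
           "how","'s","the","weather",
           "it","'s","really","very","cold",
           "i","love","you"),
  f = c(700,600,1,
        400,500,700,20,
        390,500,177,200,35,
        700,199,400),
  position = c(1,2,3,
               1,2,3,4,
               1,2,3,4,5,
               1,2,3)
)

res <- df %>%
  # Assumption: rows may arrive unsorted, so order them within each rowid first
  arrange(rowid, position) %>%
  group_by(rowid) %>%
  # Label by row order rather than by the value of position:
  # first row -> 1, last row -> 2, everything in between -> 1.5
  mutate(pos = case_when(row_number() == 1   ~ 1,
                         row_number() == n() ~ 2,
                         TRUE                ~ 1.5)) %>%
  group_by(rowid, pos) %>%
  # Average the frequencies and concatenate the words within each label
  summarise(f = mean(f, na.rm = TRUE),
            word = paste(word, collapse = " "),
            .groups = "drop")

res
```

On the sample data this gives the same 12 rows as the version above; the two only differ when position is not a clean 1..n() sequence.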
