英文:
Replace 'middle' frequencies with averaged frequency values
问题
以下是更简洁的 dplyr
代码,以获得所需的输出,而不使用 left_join
:
library(dplyr)
df %>%
group_by(rowid) %>%
mutate(avg_f = ifelse(position != min(position) & position != max(position),
mean(f, na.rm = TRUE), f)) %>%
slice(c(1, 2, n())) %>%
mutate(position = ifelse(position == min(position), 1,
ifelse(position == max(position), 2, 1.5))) %>%
select(-avg_f)
这将生成与您提供的结果相同的输出,但不涉及 left_join
。
英文:
I have this type of data, with f
requency data and position
data grouped by rowid
:
df
rowid word f position
1 2 i 700 1
2 2 'm 600 2
3 2 fine 1 3
4 3 how 400 1
5 3 's 500 2
6 3 the 700 3
7 3 weather 20 4
8 4 it 390 1
9 4 's 500 2
10 4 really 177 3
11 4 very 200 4
12 4 cold 35 5
13 5 i 700 1
14 5 love 199 2
15 5 you 400 3
The task I'm facing seems simple: in those rowid
s where there are more than 3 positions
, I need to replace the f
requencies of all middle position
s with their average. The following approach works but seems over-convoluted, so I'm almost certain there will be a more straightforward dplyr
way to get the desired output:
df %>%
group_by(rowid) %>%
# filter for 'middle' positions:
filter(position != first(position) & position != last(position)) %>%
# summarise:
summarize(across(position),
# create average frequency:
f_middle_position = mean(f, na.rm = TRUE),
# concatenate words:
word = str_c(word, collapse = " ")
) %>%
filter(!duplicated(f_middle_position)) %>%
# join with df:
left_join(df, ., by = c("rowid", "position")) %>%
# remove rows other than #1,#2, and last:
group_by(rowid) %>%
# create row count:
mutate(rn = row_number()) %>%
# filter first, second, and last row per group:
filter(rn %in% c(1, 2, last(rn))) %>%
# transfer frequencies for middle positions:
mutate(f = ifelse(is.na(f_middle_position), f, f_middle_position)) %>%
# make more changes:
mutate(
# change position labels:
position = ifelse(position == first(position), 1,
ifelse(position == last(position), 2, 1.5)),
# update word:
word = ifelse(is.na(word.y), word.x, word.y)
) %>%
# remove obsolete variables:
select(-c(f_middle_position, word.y, word.x,rn))
A tibble: 12 × 4
# Groups: rowid [4]
rowid f position word
<dbl> <dbl> <dbl> <chr>
1 2 700 1 i
2 2 600 1.5 'm
3 2 1 2 fine
4 3 400 1 how
5 3 600 1.5 's the
6 3 20 2 weather
7 4 390 1 it
8 4 292. 1.5 's really very
9 4 35 2 cold
10 5 700 1 i
11 5 199 1.5 love
12 5 400 2 you
How can this result be obtained in a more concise way in dplyr
and, preferably without the left_join
, which causes problems with my actual data?
Data:
df <- data.frame(
rowid = c(2,2,2,3,3,3,3,4,4,4,4,4,5,5,5),
word = c("i","'m","fine",
"how","'s","the","weather",
"it","'s","really", "very","cold",
"i","love","you"),
f = c(700,600,1,
400,500,700,20,
390,500,177,200,35,
700,199,400),
position = c(1,2,3,
1,2,3,4,
1,2,3,4,5,
1,2,3)
)
答案1
得分: 1
# 创建一个名为 `pos` 的组变量,将第一行标记为 `1`,中间行标记为 `1.5`,最后一行标记为 `2`。
# 然后按照 `rowid` 和 `pos` 进行分组,对 `f` 应用 `mean()` 函数,对 `word` 应用 `paste()` 函数。
library(dplyr)
df %>%
group_by(rowid) %>%
mutate(pos = case_when(position == 1 ~ 1, position == n() ~ 2, TRUE ~ 1.5)) %>%
group_by(rowid, pos) %>%
summarise(f = mean(f), word = paste(word, collapse = ' '), .groups = 'drop')
英文:
You can create a group variable pos
that marks the first row with 1
, the middle with 1.5
, and the last with 2
. Then group the data by rowid
and pos
and apply mean()
and paste()
on f
and word
respectively.
library(dplyr)
df %>%
group_by(rowid) %>%
mutate(pos = case_when(position == 1 ~ 1, position == n() ~ 2, TRUE ~ 1.5)) %>%
group_by(rowid, pos) %>%
summarise(f = mean(f), word = paste(word, collapse = ' '), .groups = 'drop')
# # A tibble: 12 × 4
# rowid pos f word
# <dbl> <dbl> <dbl> <chr>
# 1 2 1 700 i
# 2 2 1.5 600 'm
# 3 2 2 1 fine
# 4 3 1 400 how
# 5 3 1.5 600 's the
# 6 3 2 20 weather
# 7 4 1 390 it
# 8 4 1.5 292. 's really very
# 9 4 2 35 cold
# 10 5 1 700 i
# 11 5 1.5 199 love
# 12 5 2 400 you
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论