R: condense row's category and split the count
Question
In R, I have a data frame:
df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"), count = c(2, 4, 6, 2))
I want the data frame to become:
data.frame(value1 = c("apple", "orange", "banana"), count = c(3, 5, 6))
by eliminating the row "apple,orange" and adding its count, split evenly, to "apple" and "orange".
I've tried to use
df$value1 <- unlist(strsplit(as.character(df$value1), ","))
but I think this is the wrong approach...
Thank you!
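A quick way to see why the attempted assignment cannot work, as a minimal sketch assuming the question's data is stored in `df` (the names `pieces` and `attempt` are only illustrative): splitting on the comma yields five values, one more than the data frame has rows, so the assignment is rejected and the counts are never redistributed.

```r
df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"),
                 count = c(2, 4, 6, 2))

# Splitting flattens the four rows into five separate values
pieces <- unlist(strsplit(as.character(df$value1), ","))
length(pieces)
nrow(df)

# Assigning five values to a four-row column raises an error,
# which is caught here so the script keeps running
attempt <- tryCatch({
  df$value1 <- pieces
  "assigned"
}, error = function(e) "error")
attempt
```

So the item names have to be expanded into new rows together with their share of the count, rather than written back into the existing column.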
Answer 1
Score: 3
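Similar idea to akrun's, but using slightly different functions:
library(dplyr)
library(tidyr)
library(stringr)
# split each count evenly among its items, expand to one row per item, then sum per item
df %>%
  mutate(count = count / (1 + str_count(value1, ","))) %>%
  separate_rows(value1) %>%
  count(value1, wt = count)
# A tibble: 3 × 2
  value1     n
  <chr>  <dbl>
1 apple      3
2 banana     6
3 orange     5
In base R:
# items in each row
a <- strsplit(df$value1, ",")
# each row's count divided by its number of items (commas + 1)
b <- df$count / (nchar(gsub("[^,]", "", df$value1)) + 1)
# repeat each share once per item, then sum by item
stack(tapply(rep(b, lengths(a)), unlist(a), sum))
  values    ind
1      3  apple
2      6 banana
3      5 orange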
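To make the base R version easier to follow, here is a sketch that spells out each intermediate step. It assumes the question's data is stored in `df` with `value1` as a character column (the default in R >= 4.0); the names `items`, `share` and `res` are only illustrative, and the divisor uses `lengths(items)` instead of counting commas, which gives the same numbers for this data.

```r
df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"),
                 count = c(2, 4, 6, 2))

# Items in each row: three single-item rows and one two-item row
items <- strsplit(df$value1, ",")
lengths(items)

# Each row's count divided by its number of items
share <- df$count / lengths(items)
share

# Repeat each share once per item, then sum the shares by item
res <- tapply(rep(share, lengths(items)), unlist(items), sum)
res
```

The combined row contributes 1 to "apple" and 1 to "orange", which is how 2 and 4 become 3 and 5 while "banana" stays at 6.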
Answer 2
Score: 2
We could rescale the count values by dividing each one by the number of items in value1. Assuming the items are separated by commas, that number is the comma count plus 1, and the division is applied only to rows that contain a comma. Then split the value1 column into one row per item and sum count by group (reframe).
library(dplyr) # version >= 1.1.0
library(tidyr)
library(stringr)
# df1 is the data frame from the question
df1 %>%
  mutate(count = case_when(str_detect(value1, ",") ~
      count / (str_count(value1, ",") + 1), TRUE ~ count)) %>%
  separate_longer_delim(value1, delim = regex(",\\s*")) %>%
  reframe(count = sum(count), .by = value1)
Output:
  value1 count
1  apple     3
2 orange     5
3 banana     6
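The `",\\s*"` delimiter matters if the real data has a space after the comma, as in the "apple, orange" spelling used in the question text. A small base R illustration of the same pattern (not the dplyr pipeline itself; `x`, `plain` and `clean` are made-up names for this sketch):

```r
x <- c("apple,orange", "apple, orange")

# Splitting on a bare comma leaves a leading space on the second item
plain <- strsplit(x, ",")[[2]]
plain

# Splitting on a comma plus optional whitespace gives clean item names
clean <- strsplit(x, ",\\s*")[[2]]
clean
```

Without the optional whitespace, " orange" and "orange" would be summed as two different groups.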
Answer 3
Score: 0
I actually figured out a dumb way...
library(dplyr)
library(tidyr)
library(stringr)
# reframing the problem
df0 <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange", "orange,banana"), count = c(2, 4, 6, 2, 4))
# the ideal result for the reframed problem (stored under its own name so df0 is not overwritten)
ideal <- data.frame(value1 = c("apple", "orange", "banana"), count = c(3, 7, 8))
### my solution: ###
df1 <-
  # select the rows to be mutated
  df0[str_detect(df0$value1, ","), ] %>%
  # separate the values into two columns
  separate(col = value1, into = paste0("fruit", 1:2), sep = ",") %>%
  # divide the count by two
  mutate(count = count / 2) %>%
  # turn columns into rows
  pivot_longer(fruit1:fruit2, names_to = "longername", values_to = "value1") %>%
  # remove the unneeded column
  select(-"longername") %>%
  # add back the original rows that have a single value
  rbind(df0[!str_detect(df0$value1, ","), ]) %>%
  # recalculate the count
  group_by(value1) %>%
  summarise(newcount = sum(count))
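The pipeline above assumes every combined row holds exactly two items (it splits into `fruit1` and `fruit2` and halves the count). As a hedged sketch of how the same idea could handle any number of items, the hypothetical helper `split_counts` below divides by the actual item count; it is a base R illustration, not part of the original answer.

```r
# Hypothetical helper: split each row's count evenly among its items,
# then sum the shares per item
split_counts <- function(data, sep = ",\\s*") {
  items <- strsplit(as.character(data$value1), sep)
  n_items <- lengths(items)
  long <- data.frame(value1 = unlist(items),
                     count = rep(data$count / n_items, n_items))
  aggregate(count ~ value1, data = long, FUN = sum)
}

df0 <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange", "orange,banana"),
                  count = c(2, 4, 6, 2, 4))
res <- split_counts(df0)
res
```

A useful sanity check for any of these approaches is that the total is preserved: `sum(res$count)` should equal `sum(df0$count)`.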