R: condense row's category and split the count

Question

In R, I have a data frame:

df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"), count = c(2, 4, 6, 2))

I want the data frame to become:

data.frame(value1 = c("apple", "orange", "banana"), count = c(3, 5, 6))

by eliminating the row "apple,orange" and splitting its count evenly between "apple" and "orange", so each gains 1.

I tried

df$value1 <- unlist(strsplit(as.character(df$value1), ","))

but I think this is the wrong approach...

Thank you!
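A short check of why the attempt in the question cannot work as written: splitting the strings yields more pieces than the data frame has rows, so the result cannot be assigned back to the column. This is a minimal sketch in base R; the names `pieces` and `result` are only for illustration.

```r
df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"),
                 count = c(2, 4, 6, 2))

# Splitting every entry on "," and flattening gives one element per fruit name.
pieces <- unlist(strsplit(as.character(df$value1), ","))
length(pieces)  # 5
nrow(df)        # 4

# Assigning 5 values to a column of a 4-row data frame raises an error,
# which is caught here so the script keeps running.
result <- tryCatch({
  df$value1 <- pieces
  "assigned"
}, error = function(e) "error")
result  # "error"
```

The fix is to lengthen the whole data frame (one row per fruit) and carry a share of the count along with each piece, which is what the answers do.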

Answer 1

Score: 3

Similar idea to akrun's, but using slightly different functions:

library(dplyr)
library(tidyr)
library(stringr)

df %>%
   mutate(count = count / (1 + str_count(value1, ","))) %>%
   separate_rows(value1) %>%
   count(value1, wt = count)

# A tibble: 3 × 2
  value1     n
  <chr>  <dbl>
1 apple      3
2 banana     6
3 orange     5

In base R:

a <- strsplit(df$value1, ",")
b <- df$count / (nchar(gsub("[^,]", "", df$value1)) + 1)
stack(tapply(rep(b, lengths(a)), unlist(a), sum))
  values    ind
1      3  apple
2      6 banana
3      5 orange
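The base R steps above can be wrapped in a small reusable function. This is only a sketch: `split_counts()` is a hypothetical helper name, not part of any package, and it assumes every entry is a non-empty, comma-separated string. It also trims whitespace, so "apple, orange" and "apple,orange" are treated the same.

```r
# Hypothetical helper: split each entry on `sep`, give every piece an equal
# share of that row's count, then sum the shares per distinct piece.
split_counts <- function(values, counts, sep = ",") {
  parts <- strsplit(as.character(values), sep, fixed = TRUE)
  parts <- lapply(parts, trimws)      # tolerate "apple, orange"
  share <- counts / lengths(parts)    # equal share for each piece of a row
  totals <- tapply(rep(share, lengths(parts)), unlist(parts), sum)
  data.frame(value1 = names(totals), count = as.vector(totals))
}

df <- data.frame(value1 = c("apple", "orange", "banana", "apple, orange"),
                 count = c(2, 4, 6, 2))
res <- split_counts(df$value1, df$count)
res
```

Because the share is computed from `lengths(parts)`, the same function handles rows with three or more comma-separated names without changes.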

Answer 2

Score: 2

We can rescale the count values by dividing each by the number of words in value1. Assuming the entries are separated by commas, that number is the comma count plus 1, applied only to entries that contain a comma. Then split the value1 column into one row per word and sum count by group (reframe). Here df1 is the data frame from the question.

library(dplyr) # version >= 1.1.0
library(tidyr) # version >= 1.3.0 for separate_longer_delim()
library(stringr)
df1 %>%
   mutate(count = case_when(str_detect(value1, ",") ~
      count / (str_count(value1, ",") + 1), TRUE ~ count)) %>%
   separate_longer_delim(value1, delim = regex(",\\s*")) %>%
   reframe(count = sum(count), .by = value1)

Output

  value1 count
1  apple     3
2 orange     5
3 banana     6
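Two details of this answer can be checked in base R without any packages. The sketch below (variable names are illustrative only) shows that the guard for entries containing a comma gives the same numbers as dividing unconditionally, since an entry without commas is divided by 1, and that the delimiter pattern ",\\s*" also absorbs spaces after the comma.

```r
value1 <- c("apple", "orange", "banana", "apple,orange")
count  <- c(2, 4, 6, 2)

# Number of commas per entry; entries without a comma give 0.
n_commas <- lengths(regmatches(value1, gregexpr(",", value1, fixed = TRUE)))

guarded   <- ifelse(n_commas > 0, count / (n_commas + 1), count)
unguarded <- count / (n_commas + 1)
same <- isTRUE(all.equal(guarded, unguarded))
same  # TRUE

# The regular expression ",\\s*" removes optional whitespace after the comma.
spaced <- strsplit("apple, orange", ",\\s*")[[1]]
spaced  # "apple" "orange"
```

So the guard is a matter of readability rather than correctness here, while the whitespace-tolerant delimiter matters as soon as the data contains "apple, orange" instead of "apple,orange".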

Answer 3

Score: 0

I actually figured out a dumb way...

library(dplyr)
library(tidyr)
library(stringr)

# reframing the problem
df0 <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange", "orange,banana"), count = c(2, 4, 6, 2, 4))
# the ideal result for this data (kept under its own name so it does not overwrite df0)
ideal <- data.frame(value1 = c("apple", "orange", "banana"), count = c(3, 7, 8))

### my solution: ###

df1 <-
  # select the rows to be mutated
  df0[str_detect(df0$value1, ","), ] %>%
  # separate the values into two columns
  separate(col = value1, into = paste0("fruit", 1:2), sep = ",") %>%
  # divide the count by two
  mutate(count = count / 2) %>%
  # turn columns into rows
  pivot_longer(fruit1:fruit2, names_to = "longername", values_to = "value1") %>%
  # remove the unneeded column
  select(-"longername") %>%
  # add back the original rows that hold a single value
  rbind(df0[!str_detect(df0$value1, ","), ]) %>%
  # recalculate the count
  group_by(value1) %>%
  summarise(newcount = sum(count))
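One way to sanity-check any of these solutions is that splitting must conserve the grand total: the counts after splitting should sum to the same value as before. The sketch below recomputes the totals for this answer's five-row data in base R with `aggregate()`; it is not the author's pipeline, and the names `parts`, `long`, and `totals` are only for illustration.

```r
df0 <- data.frame(
  value1 = c("apple", "orange", "banana", "apple,orange", "orange,banana"),
  count = c(2, 4, 6, 2, 4)
)

# One row per fruit name, each carrying an equal share of its row's count.
parts <- strsplit(df0$value1, ",", fixed = TRUE)
long <- data.frame(
  value1 = unlist(parts),
  count = rep(df0$count / lengths(parts), lengths(parts))
)

# Sum the shares per fruit.
totals <- aggregate(count ~ value1, data = long, FUN = sum)
totals

# Splitting must not change the overall total (18 in this data).
conserved <- isTRUE(all.equal(sum(totals$count), sum(df0$count)))
conserved  # TRUE
```

Unlike the two-column `separate()` step, this version does not assume exactly two names per row.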

huangapple
  • Posted on 2023-02-27 07:13:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75575575.html