February 27, 2023, 07:13:13

R: condense row's category and split the count

Question

In R, I have a data frame:

df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"), count = c(2, 4, 6, 2))

I want the data frame to become:

data.frame(value1 = c("apple", "orange", "banana"), count = c(3, 5, 6))

by eliminating the row "apple,orange" and splitting its count equally between "apple" and "orange".

I've tried to use

df$value1 <- unlist(strsplit(as.character(df$value1), ","))

but I think this is the wrong approach...

Thank you!
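
A minimal sketch of why the attempted assignment cannot work as written, assuming the data frame from the question is stored as `df`: splitting on commas yields more pieces than the data frame has rows, so the result does not fit back into the `value1` column.

```r
# Data frame from the question (the name df is assumed)
df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"),
                 count = c(2, 4, 6, 2),
                 stringsAsFactors = FALSE)

# Splitting every value on "," and flattening gives one element per fruit
pieces <- unlist(strsplit(df$value1, ","))

length(pieces)  # number of pieces after splitting
nrow(df)        # number of rows available to hold them
```

Because the two lengths differ, the rows have to be expanded together with their counts rather than overwriting a single column.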

Answer 1

Score: 3

Similar idea to akrun's, but using slightly different functions:

library(dplyr)
library(tidyr)
library(stringr)

df %>%
  # split each count equally among the comma-separated items in its row
  mutate(count = count / (1 + str_count(value1, ","))) %>%
  # expand to one row per item
  separate_rows(value1) %>%
  # sum the weighted counts per item
  count(value1, wt = count)

# A tibble: 3 × 2
  value1     n
  <chr>  <dbl>
1 apple      3
2 banana     6
3 orange     5

In base R:

a <- strsplit(df$value1, ",")
# items per row = number of commas + 1
b <- df$count / (nchar(gsub("[^,]", "", df$value1)) + 1)
stack(tapply(rep(b, lengths(a)), unlist(a), sum))
  values    ind
1      3  apple
2      6 banana
3      5 orange
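
The base R version can be easier to follow with the intermediate values spelled out. The sketch below assumes the question's data frame is stored as `df`; the names `n_items`, `long`, and `totals` are introduced here for illustration only, and `lengths(a)` is used in place of the comma-counting expression, which gives the same numbers when every comma separates two items.

```r
df <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange"),
                 count = c(2, 4, 6, 2),
                 stringsAsFactors = FALSE)

a <- strsplit(df$value1, ",")   # list with one character vector per row
n_items <- lengths(a)           # items per row: 1 1 1 2
b <- df$count / n_items         # per-item share of each count: 2 4 6 1

# One row per item, each carrying its share of the original count
long <- data.frame(value1 = unlist(a), count = rep(b, n_items))

# Sum the shares per item; groups come out in alphabetical order
totals <- tapply(long$count, long$value1, sum)
totals
```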

Answer 2

Score: 2

We can rescale the count values by dividing each one by the number of items in its row. Assuming the items are separated by commas, that number is the comma count plus 1; the division is applied only to rows that contain a comma. Then split the value1 column into one row per item and sum the counts by group (reframe).

library(dplyr) # version >= 1.1.0
library(tidyr)
library(stringr)
# df1 is the data frame from the question
df1 %>%
   mutate(count = case_when(str_detect(value1, ",") ~
      count / (str_count(value1, ",") + 1), TRUE ~ count)) %>%
   separate_longer_delim(value1, delim = regex(",\\s*")) %>%
   reframe(count = sum(count), .by = value1)

Output:

  value1 count
1  apple     3
2 orange     5
3 banana     6
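
The `",\\s*"` pattern in this answer matters when the combined values contain a space after the comma, as in the "apple, orange" spelling used in the question's prose. A small base R sketch of the difference, using a made-up input string `x`:

```r
x <- "apple, orange"  # illustrative value with a space after the comma

# Splitting on the bare comma keeps the leading space on the second item
plain <- strsplit(x, ",")[[1]]

# Splitting on a comma plus any following whitespace removes it
trimmed <- strsplit(x, ",\\s*")[[1]]

plain
trimmed
```

Without the whitespace handling, " orange" and "orange" would be treated as two different groups when the counts are summed.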

Answer 3

Score: 0

I actually figured out a dumb way...

library(dplyr)
library(tidyr)
library(stringr)

# reframing the problem
df0 <- data.frame(value1 = c("apple", "orange", "banana", "apple,orange", "orange,banana"), count = c(2, 4, 6, 2, 4))

# reframing the ideal solution (stored under its own name so that df0 is not
# overwritten; the values follow from splitting each combined count equally)
expected <- data.frame(value1 = c("apple", "orange", "banana"), count = c(3, 7, 8))

### my solution: ###

df1 <-
  # select the rows to be mutated
  df0[str_detect(df0$value1, ","), ] %>%
  # separate the values into two columns (assumes exactly two items per row)
  separate(col = value1, into = paste0("fruit", 1:2), sep = ",") %>%
  # divide the count by two
  mutate(count = count / 2) %>%
  # turn columns into rows
  pivot_longer(fruit1:fruit2, names_to = "longername", values_to = "value1") %>%
  # remove the unneeded column
  select(-"longername") %>%
  # add back the original rows that hold a single value
  rbind(df0[!str_detect(df0$value1, ","), ]) %>%
  # recalculate the count
  group_by(value1) %>%
  summarise(newcount = sum(count))
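
The pipeline above is tied to rows holding exactly two items. As a rough sketch of how the same idea extends to any number of items per row, the base R helper below (the name `split_counts` and its arguments are hypothetical, not part of any package) divides each count by the number of items in its row and then sums per item. It assumes the items are separated by a comma with optional whitespace after it.

```r
# Hypothetical helper: split each row's count equally among its items,
# then sum per item.
split_counts <- function(data, value_col = "value1", count_col = "count") {
  parts <- strsplit(as.character(data[[value_col]]), ",\\s*")
  n_items <- lengths(parts)
  long <- data.frame(value1 = unlist(parts),
                     count = rep(data[[count_col]] / n_items, n_items),
                     stringsAsFactors = FALSE)
  aggregate(count ~ value1, data = long, FUN = sum)
}

# Five-row example from this answer
df0 <- data.frame(value1 = c("apple", "orange", "banana",
                             "apple,orange", "orange,banana"),
                  count = c(2, 4, 6, 2, 4),
                  stringsAsFactors = FALSE)

result <- split_counts(df0)
result
```

`aggregate()` returns the groups in alphabetical order, so the rows come back as apple, banana, orange rather than in order of first appearance.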
