January 6, 2020, 19:49:25

Create a column that groups strings from one column based on another column in R

Question

This is my dataset:

id   text
 1    "red"
 1    "blue"
 2    "light blue"
 2    "red"
 2    "yellow"
 3    "dark green"

This is the result I want to obtain:

 id  text2
 1   "red, blue"
 2   "light blue, red, yellow"
 3   "dark green"

Basically, I need to combine the text from column 'text' within each id, separating the elements with commas.

Answer 1

Score: 2

Using aggregate and toString.

aggregate(. ~ id, d, toString)
#   id                    text
# 1  1               red, blue
# 2  2 light blue, red, yellow
# 3  3              dark green
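
To see what toString contributes on its own, here is a minimal sketch applied to a single group's values. The vector name colours is only illustrative and is not part of the answer.

```r
# One group's values, as they would be handed to the summary function
colours <- c("light blue", "red", "yellow")

# toString() joins the elements with ", "
joined <- toString(colours)

# paste() with collapse = ", " builds the same string
same <- identical(joined, paste(colours, collapse = ", "))

print(joined)
print(same)
```

aggregate simply calls the function once per id and collects the resulting strings into the output column.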

Note: This won't work with factor columns, i.e. if is.factor(d$text) yields TRUE you need a slightly different approach. Demonstration:

d$text <- as.factor(d$text)  # make text a factor
is.factor(d$text)
#  [1] TRUE

In that case, convert the column to character first:

aggregate(. ~ id, transform(d, text=as.character(text)), toString)

Data:

d <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), text = c("red", 
"blue", "light blue", "red", "yellow", "dark green")), row.names = c(NA, 
-6L), class = "data.frame")
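
Putting the pieces together, the following is a rough end-to-end sketch rather than the answer's exact code. It assumes the text column may arrive as a factor, names the column explicitly in the formula (text ~ id) instead of using the dot, and renames the result to text2 to match the output requested in the question. Those three choices are assumptions added here for illustration.

```r
# Sample data from the question, with text deliberately stored as a factor
d <- data.frame(
  id = c(1L, 1L, 2L, 2L, 2L, 3L),
  text = factor(c("red", "blue", "light blue", "red", "yellow", "dark green"))
)

# Convert up front so the result does not depend on how factors are handled
d$text <- as.character(d$text)

# Collapse the text values within each id into one comma-separated string
res <- aggregate(text ~ id, data = d, FUN = toString)

# Rename the aggregated column to the name requested in the question
names(res)[names(res) == "text"] <- "text2"

print(res)
```

The values within each id keep the order in which they appear in the input rows.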

Answer 2

Score: 1

We can use dplyr:

library(dplyr)
df1 %>%
    group_by(id) %>%
    summarise(text2 = toString(text))

Data:

df1 <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), text = c("red", 
"blue", "light blue", "red", "yellow", "dark green")), row.names = c(NA, 
-6L), class = "data.frame")
