Create a column grouping text strings from one column based on another column in R

Question

This is my dataset:

id   text
 1    "red"
 1    "blue"
 2    "light blue"
 2    "red"
 2    "yellow"
 3    "dark green"

This is the result I want to obtain:

 id  text2
 1   "red, blue"
 2   "light blue, red, yellow"
 3   "dark green"

Basically, I need to combine the values of column 'text' for each id into a single string, with commas separating the elements.

Answer 1

Score: 2

Use aggregate with toString:

aggregate(. ~ id, d, toString)
#   id                    text
# 1  1               red, blue
# 2  2 light blue, red, yellow
# 3  3              dark green

Note: this won't work when text is a factor, i.e. when is.factor(d$text) returns TRUE; in that case a slightly different approach is needed. Demonstration:

d$text <- as.factor(d$text)  # make text a factor
is.factor(d$text)
#  [1] TRUE

In that case, convert the column back to character first:

aggregate(. ~ id, transform(d, text=as.character(text)), toString)

Data:

d <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), text = c("red", 
"blue", "light blue", "red", "yellow", "dark green")), row.names = c(NA, 
-6L), class = "data.frame")
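
The aggregate call above keeps the original column name text, whereas the question asks for text2. A minimal sketch of one way to rename the result, assuming text is stored as character; the use of setNames and the variable name res are illustrative choices, not part of the original answer:

```r
# Sample data from the question (text stored as character, not factor)
d <- data.frame(
  id = c(1L, 1L, 2L, 2L, 2L, 3L),
  text = c("red", "blue", "light blue", "red", "yellow", "dark green"),
  stringsAsFactors = FALSE
)

# Collapse text within each id with toString, then rename the
# columns of the aggregated data frame to id / text2
res <- setNames(aggregate(text ~ id, d, toString), c("id", "text2"))
print(res)
```

Within each id the strings are joined in the order they appear in d.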

Answer 2

Score: 1

We can use dplyr:

library(dplyr)
df1 %>%
    group_by(id) %>%
    summarise(text2 = toString(text))

Data:

df1 <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), text = c("red", 
"blue", "light blue", "red", "yellow", "dark green")), row.names = c(NA, 
-6L), class = "data.frame")
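
Both answers rely on toString, which joins a vector with ", ". For a different separator, paste with a collapse argument can be used instead. The following is a rough base R sketch rather than part of either answer: the "; " separator, the tapply approach, and the names collapsed and res2 are assumptions made for illustration.

```r
# Sample data from the question
df1 <- data.frame(
  id = c(1L, 1L, 2L, 2L, 2L, 3L),
  text = c("red", "blue", "light blue", "red", "yellow", "dark green"),
  stringsAsFactors = FALSE
)

# toString(x) gives the same string as paste(x, collapse = ", ")
x <- df1$text[df1$id == 2L]
same <- identical(toString(x), paste(x, collapse = ", "))

# Join text within each id using a custom separator ("; " here);
# tapply passes collapse = "; " through to paste
collapsed <- tapply(df1$text, df1$id, paste, collapse = "; ")

# Rebuild a data frame with one row per id
res2 <- data.frame(
  id = as.integer(names(collapsed)),
  text2 = as.vector(collapsed),
  stringsAsFactors = FALSE
)
print(res2)
```

The same paste(text, collapse = "; ") expression could take the place of toString(text) inside summarise in the dplyr version.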

huangapple
  • Published on January 6, 2020, 19:49:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59611600.html