Create a column that groups text strings from one column based on another column in R

Question

This is my dataset:

    id text
    1  "red"
    1  "blue"
    2  "light blue"
    2  "red"
    2  "yellow"
    3  "dark green"

This is the result I want to obtain:

    id text2
    1  "red, blue"
    2  "light blue, red, yellow"
    3  "dark green"

Basically, I need to combine the values of the 'text' column within each id into a single string, with commas separating the elements.

Answer 1

Score: 2

Using aggregate and toString:

    aggregate(. ~ id, d, toString)
    #   id                    text
    # 1  1               red, blue
    # 2  2 light blue, red, yellow
    # 3  3              dark green

Note: This won't work with factor columns, i.e. if is.factor(d$text) yields TRUE, you need a slightly different approach. Demonstration:

    d$text <- as.factor(d$text)  # convert the text column to a factor
    is.factor(d$text)
    # [1] TRUE

In that case, convert the column back to character before aggregating:

    aggregate(. ~ id, transform(d, text=as.character(text)), toString)

Data:

    d <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), text = c("red",
        "blue", "light blue", "red", "yellow", "dark green")), row.names = c(NA,
        -6L), class = "data.frame")
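To make the factor caveat easier to check, here is a minimal base R sketch that wraps the conversion and the aggregation in one function and names the result column text2, as the question asks. The helper name collapse_text is hypothetical (it is not part of the answer above), and it assumes the input has columns named id and text.

```r
# Hypothetical helper, not from the original answer: collapse `text` per `id`.
# Assumes the data frame has columns named `id` and `text`.
collapse_text <- function(d) {
  # as.character() leaves character columns unchanged and turns factors
  # into their labels, so the same code path covers both cases
  d$text <- as.character(d$text)
  res <- aggregate(text ~ id, data = d, FUN = toString)
  # Rename the aggregated column to match the requested output
  names(res)[names(res) == "text"] <- "text2"
  res
}

d_chr <- data.frame(
  id = c(1L, 1L, 2L, 2L, 2L, 3L),
  text = c("red", "blue", "light blue", "red", "yellow", "dark green"),
  stringsAsFactors = FALSE
)
# Same data, but with `text` stored as a factor
d_fct <- transform(d_chr, text = as.factor(text))

res_chr <- collapse_text(d_chr)
res_fct <- collapse_text(d_fct)
print(res_fct)
```

Both calls should produce the same three rows, whether text starts out as character or as a factor.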

Answer 2

Score: 1

We can use dplyr:

    library(dplyr)
    df1 %>%
        group_by(id) %>%
        summarise(text2 = toString(text))

Data:

    df1 <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), text = c("red",
        "blue", "light blue", "red", "yellow", "dark green")), row.names = c(NA,
        -6L), class = "data.frame")
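toString() always joins with ", ". If a different separator is wanted, paste() with collapse can stand in for it inside the same pipeline. The following is a small self-contained sketch of that variation; the "; " separator is only an illustration and does not come from the question or the answer.

```r
library(dplyr)

df1 <- data.frame(
  id = c(1L, 1L, 2L, 2L, 2L, 3L),
  text = c("red", "blue", "light blue", "red", "yellow", "dark green"),
  stringsAsFactors = FALSE
)

# Same grouping as the toString() version, but paste(collapse = ...) lets
# the separator be chosen freely ("; " is an arbitrary example)
res <- df1 %>%
  group_by(id) %>%
  summarise(text2 = paste(text, collapse = "; "))

print(res)
```

With collapse = ", " this gives the same strings as toString(text).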

huangapple
  • Posted on January 6, 2020, 19:49:25
  • When reposting, please keep this link: https://go.coder-hub.com/59611600.html