January 3, 2020, 19:58:13

R: Merge information from 2 columns together

Question

I have created an example data frame with 3 different groups and 2 columns for each group.

Group_1 shows the total number of participants, Group_1_Pos shows how many of those participants are positive, and so on:

df1 <- structure(list(Date = c("2016", "2017", "2018", "2019"), 
                       Group_1 = c("100", "200", "300", "400"), 
                       Group_1_Pos = c("10", "20", "30", "40"),
                       Group_2 = c("500", "600", "700", "800"),
                       Group_2_Pos = c("50", "60", "70", "80"), 
                       Group_3 = c("900", "1000", "1100", "1200"),
                       Group_3_Pos = c("90", "100", "110", "120")), 
                  class = "data.frame", row.names=c("1", "2", "3", "4"))
> df1
  Date Group_1 Group_1_Pos Group_2 Group_2_Pos Group_3 Group_3_Pos
1 2016     100          10     500          50     900          90
2 2017     200          20     600          60    1000         100
3 2018     300          30     700          70    1100         110
4 2019     400          40     800          80    1200         120

I would like to combine each total-participant column with its positive-participant column so that both values are kept, with the positive count separated in brackets. For example:

  Date      Group_1    Group_2     Group_3 
1 2016     100 (10)    500 (50)    900 (90)          
2 2017     200 (20)    600 (60)  1000 (100)        
3 2018     300 (30)    700 (70)  1100 (110)       
4 2019     400 (40)    800 (80)  1200 (120)

So in this example the positive participants are added in () brackets next to the total participants, and only 3 columns are kept for the 3 groups.

Any help would be appreciated.

Answer 1

Score: 1

Using dplyr, you could go for something like:

library(dplyr)
df1 %>%
  # overwrite each total column with "total (positive)"
  mutate(Group_1 = paste0(Group_1, " (", Group_1_Pos, ")"),
         Group_2 = paste0(Group_2, " (", Group_2_Pos, ")"),
         Group_3 = paste0(Group_3, " (", Group_3_Pos, ")")) %>%
  # drop the now-redundant *_Pos columns
  select(-contains("Pos"))
#   Date  Group_1  Group_2    Group_3
# 1 2016 100 (10) 500 (50)   900 (90)
# 2 2017 200 (20) 600 (60) 1000 (100)
# 3 2018 300 (30) 700 (70) 1100 (110)
# 4 2019 400 (40) 800 (80) 1200 (120)
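
The answer above names every group by hand. As a hedged sketch of how the same idea might scale to any number of groups, the version below uses across() with cur_column() and looks up each companion column with get(). It assumes dplyr 1.0.0 or later and that every total column is named Group_<number> with a matching Group_<number>_Pos column; the small df1 here is a shortened stand-in for the question's data.

```r
library(dplyr)

# Shortened stand-in for the question's data (assumption: same naming scheme).
df1 <- data.frame(
  Date        = c("2016", "2017"),
  Group_1     = c("100", "200"),
  Group_1_Pos = c("10", "20"),
  Group_2     = c("500", "600"),
  Group_2_Pos = c("50", "60"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # for each total column, paste its own value with the "<name>_Pos" column
  mutate(across(matches("^Group_\\d+$"),
                ~ paste0(.x, " (", get(paste0(cur_column(), "_Pos")), ")"))) %>%
  # drop the companion columns once they have been merged
  select(-ends_with("_Pos"))
```

Note that get() would also pick up an object of the same name from the calling environment if a companion column were missing, so this is only suitable when the naming scheme is known to hold.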

Answer 2

Score: 1

A purrr-dplyr-stringr approach:

library(dplyr)
# every other column starting at the first: Date plus the *_Pos columns
other_values <- df1[, seq(1, ncol(df1), 2)]
df1 %>%
  select(-contains("Pos")) %>%
  # paste each remaining column with its partner from other_values
  purrr::map2_df(., other_values, function(x, y) paste0(x, " (", y, ")")) %>%
  # Date was pasted with itself ("2016 (2016)"), so strip the added part
  mutate(Date = stringr::str_remove_all(Date, "\\s.*"))
# A tibble: 4 x 4
#   Date  Group_1  Group_2  Group_3   
#   <chr> <chr>    <chr>    <chr>     
# 1 2016  100 (10) 500 (50) 900 (90)  
# 2 2017  200 (20) 600 (60) 1000 (100)
# 3 2018  300 (30) 700 (70) 1100 (110)
# 4 2019  400 (40) 800 (80) 1200 (120)

Answer 3

Score: 1

Here is a base R way of doing what the question asks for.
Get the columns to be pasted with a regex and grep, then loop through the vector of indices and paste each column together with the one that follows it. Finally, cbind the first column and this result.

# indices of the columns whose names end in a digit (the totals)
inx <- grep("\\d$", names(df1))
# paste each total with the column right after it (its *_Pos partner)
tmp <- sapply(inx, function(i) paste(df1[[i]], paste0("(", df1[[i + 1]], ")")))
res <- cbind(df1[1], tmp)
names(res)[-1] <- names(df1)[inx]
res
#  Date  Group_1  Group_2    Group_3
#1 2016 100 (10) 500 (50)   900 (90)
#2 2017 200 (20) 600 (60) 1000 (100)
#3 2018 300 (30) 700 (70) 1100 (110)
#4 2019 400 (40) 800 (80) 1200 (120)

Final clean up.

rm(inx, tmp)
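
The approach above relies on each _Pos column sitting immediately after its total column. As a hedged sketch of a variant that pairs columns by name instead of by position, the helper below (merge_pos is a hypothetical name, not part of the original answer) finds every column ending in a given suffix and merges it into the column of the same name without the suffix.

```r
# Hypothetical helper: merge each "<name>" column with its "<name>_Pos" partner,
# matching the pairs by name rather than by column position.
merge_pos <- function(df, suffix = "_Pos") {
  pos_cols  <- grep(paste0(suffix, "$"), names(df), value = TRUE)
  base_cols <- sub(paste0(suffix, "$"), "", pos_cols)
  # every companion column must have a matching total column
  stopifnot(all(base_cols %in% names(df)))
  out <- df[setdiff(names(df), pos_cols)]
  out[base_cols] <- Map(function(total, pos) paste0(total, " (", pos, ")"),
                        df[base_cols], df[pos_cols])
  out
}

# Same data as in the question.
df1 <- data.frame(
  Date        = c("2016", "2017", "2018", "2019"),
  Group_1     = c("100", "200", "300", "400"),
  Group_1_Pos = c("10", "20", "30", "40"),
  Group_2     = c("500", "600", "700", "800"),
  Group_2_Pos = c("50", "60", "70", "80"),
  Group_3     = c("900", "1000", "1100", "1200"),
  Group_3_Pos = c("90", "100", "110", "120"),
  stringsAsFactors = FALSE
)

res <- merge_pos(df1)
# res has the columns Date, Group_1, Group_2, Group_3
```

Because the pairing is done by name, this should keep working if the columns are reordered, and it stops with an error if a _Pos column has no matching total column.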

Answer 4

Score: 1

Given 3 groups, here is a base R solution that can give you the desired output (note that the resulting columns are named Group1, Group2, Group3):

n <- 3
dfout <- cbind(df1[1],
               `colnames<-`(sapply(seq(n), function(k) {
                 x <- paste0("Group_", k)  # name of the total column
                 paste0(df1[[x]], " (", df1[[paste0(x, "_Pos")]], ")")
               }),
               paste0("Group", seq(n))))

such that

> dfout
  Date   Group1   Group2     Group3
1 2016 100 (10) 500 (50)   900 (90)
2 2017 200 (20) 600 (60) 1000 (100)
3 2018 300 (30) 700 (70) 1100 (110)
4 2019 400 (40) 800 (80) 1200 (120)

Answer 5

Score: 0

Here is a more general tidyverse solution:

library(tidyverse)
df1 %>%
  # rename Group_1_Pos -> Pos_1 etc., so every name is "<kind>_<set>"
  rename_at(
    vars(contains("Pos")),
    ~ str_remove(., "_Pos") %>% str_remove("Group_") %>% str_c("Pos", ., sep = "_")
  ) %>%
  # long format: one row per Date and set, with columns Group and Pos
  pivot_longer(Group_1:Pos_3,
               names_to = c(".value", "set"),
               names_sep = "_") %>%
  mutate(Pos = Pos %>% str_c("(", ., ")")) %>%
  # sep = " " gives "100 (10)" as in the question (the original used sep = "")
  unite("result", Group:Pos, sep = " ") %>%
  # names_prefix restores the Group_1, Group_2, ... column names
  pivot_wider(names_from = set, values_from = result, names_prefix = "Group_")
