2023-03-12 08:47:07

separate_wider where first half becomes column name and second half becomes cell value

Question

```r
df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"))
```

I have a data frame with multiple columns of key-value pairs, like the one above.

I want to split each cell so that the key becomes the name of a new column and the value becomes the cell value.

Expected output:

```r
df2 <- data.frame(name = c("null", "c", "d", "a", "k", "A"),
                  cat = c("null", "Y", "Z", "K", "L", "K"))
df2
```

Note that my real data frame has several hundred columns, so I am looking for a solution that does not require typing column names by hand but generates them automatically from the first half of each key:value pair.

Currently, I am splitting the key-value pairs with:

```r
df3 <- df %>%
  separate_wider_delim(cols = everything(),
                       delim = ",",
                       too_few = "align_start",
                       names_sep = "")
```

but I do not know how to transform the result so that the first half of each separated value becomes a column name.
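One observation on the attempt above: the sample cells are separated by `:` rather than `,`, so a `,` delimiter leaves every cell in one piece. The following is a minimal sketch, assuming tidyr 1.3.0 or later (where `separate_wider_delim()` is available) and a two-row stand-in for the data. The name `parts` is introduced here purely for illustration. It shows what splitting on `:` produces before any renaming.

```r
library(tidyr)

# Two-row stand-in for the question's data (an assumption for brevity)
df <- data.frame(V_1 = c("null", "name:c"),
                 V_2 = c("null", "cat:Y"),
                 stringsAsFactors = FALSE)

# Split every column on ":"; cells without a colon are padded with NA on the right
parts <- separate_wider_delim(df,
                              cols = everything(),
                              delim = ":",
                              too_few = "align_start",
                              names_sep = "_")

# Each original column becomes a key column (suffix _1) and a value column (suffix _2)
names(parts)
```

The keys still sit in the `_1` columns at this point, so a renaming step like the ones in the answers is needed afterwards.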

Answer 1

Score: 2

You can use the first row to get the column names by dropping everything after the colon. To clean up the column values, remove everything up to and including the colon.

```r
names(df) <- sub(':.*', '', unlist(df[1,]))
df[] <- lapply(df, function(x) sub('.*:', '', x))
df

#  name cat
#1    B   X
#2    c   Y
#3    d   Z
#4    a   K
#5    k   L
#6    A   K
```

This assumes the first row holds `key:value` pairs, as in the output above (first row `name:B`, `cat:X`). If the first row is `null`, as in the question's sample, both column names come out as `null`.

The second step can also be done using `dplyr`:

```r
library(dplyr)
df <- df %>% mutate(across(everything(), ~sub('.*:', '', .)))
```
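For data whose first row is `null`, as in the question's sample, the same idea can be adapted by taking the key from the first cell in each column that actually contains a colon. This is a minimal sketch and not part of the original answer; `first_key`, `keys`, and `out` are hypothetical names introduced here.

```r
df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: return the key of the first "key:value" cell in a column,
# or NA if the column has no such cell
first_key <- function(col) {
  hit <- grep(":", col, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  sub(":.*", "", hit[1])
}

# One key per column, without carrying over the old column names
keys <- vapply(df, first_key, character(1), USE.NAMES = FALSE)

# Strip "key:" from every cell, then apply the derived names
out <- df
out[] <- lapply(df, function(x) sub(".*:", "", x))
names(out) <- keys
out
```

Cells without a colon, such as `null`, pass through `sub()` unchanged, so they stay as they are in the result.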
Answer 2

Score: 2

In base R you could use `read.dcf` after pasting everything together:

```R
# Paste the columns row by row into one string, then parse it as DCF records
a <- do.call(paste, c(sep="\n", collapse = "\n\n", df))
read.dcf(textConnection(a), all = TRUE)
#   name cat
# 1    B   X
# 2    c   Y
# 3    d   Z
# 4    a   K
# 5    k   L
# 6    A   K
```

EDIT

```R
# Strip everything up to the colon from each cell, and set the column names
# to the key captured by the regular expression
setNames(data.frame(sub(".*:", "", as.matrix(df))), gsub("(\\w+):.*|.", "\\1", df))
#   name  cat
# 1 null null
# 2    c    Y
# 3    d    Z
# 4    a    K
# 5    k    L
# 6    A    K
```
Answer 3

Score: 1

You don't always have to try to squeeze everything into a single step; two distinct steps can work just fine using more traditional tools:

```r
library(dplyr)
library(purrr)  # provides set_names()

# Take the key (the part before the colon) from the first cell of a column
get_col_names <- function(col){
  col_split <- stringr::str_split(string = col[1], pattern = ":")
  col_split[[1]][1]
}
new_cn <- sapply(df, get_col_names)

# Strip everything up to the colon from each cell, then apply the new names
df %>%
  mutate(
    across(.cols = everything(),
           .fns = ~gsub("^.*:", "", .x))
    ) %>%
  set_names(nm = new_cn)
#   name cat
# 1    B   X
# 2    c   Y
# 3    d   Z
# 4    a   K
# 5    k   L
# 6    A   K
```

Answer 4

Score: 0

Here is a tidyverse solution. The second part is the one provided by @Ronak Shah:

```r
library(dplyr)
library(tidyr)

# Collect the keys from rows that are not entirely "null"
my_names <- df %>%
  filter(if_any(everything(), ~.!="null")) %>%
  pivot_longer(everything()) %>%
  separate(value, into = c("a", "b")) %>%
  pull(a)

# Rename the columns with the unique keys, then strip "key:" from each cell
df %>%
  rename_with(~unique(my_names)) %>%
  mutate(across(everything(), ~sub('.*:', '', .)))
#   name  cat
# 1 null null
# 2    c    Y
# 3    d    Z
# 4    a    K
# 5    k    L
# 6    A    K
```
