separate_wider where first half becomes column name and second half becomes cell value

Question

I have a data frame with multiple columns of key:value pairs, like this:

```{r}
df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"))
```

I want to split each cell so that the key becomes the name of a new column and the value becomes the cell value.

Expected output:

```{r}
df2 <- data.frame(name = c("null", "c", "d", "a", "k", "A"),
                  cat = c("null", "Y", "Z", "K", "L", "K"))
df2
```

Note that my real data frame has several hundred columns, so I am looking for a solution that does not require typing the column names by hand but generates them automatically from the first half of each key:value pair.

Currently, I am splitting the key:value pairs with:

```{r}
library(dplyr)
library(tidyr)

df3 <- df %>%
  separate_wider_delim(cols = everything(),
                       delim = ",",
                       too_few = "align_start",
                       names_sep = "")
```

but I do not know how to transform the result so that the first half of each separated value becomes a column name.

Answer 1

Score: 2

You can take the column names from the first row by dropping everything after the colon.

To clean up the column values, remove everything up to and including the colon.

    names(df) <- sub(':.*', '', unlist(df[1, ]))
    df[] <- lapply(df, function(x) sub('.*:', '', x))
    df

    #  name cat
    #1    B   X
    #2    c   Y
    #3    d   Z
    #4    a   K
    #5    k   L
    #6    A   K

This output is for data whose first row holds key:value pairs (name:B, cat:X). In the df shown in the question the first row is "null", so the names have to come from a row that contains a colon, for example df[2, ].

The second step can also be done with `dplyr`:

    library(dplyr)
    df <- df %>% mutate(across(everything(), ~sub('.*:', '', .)))
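When the first row is not a key:value pair, as with the "null" row in the question's df, the names can instead be taken from the first entry in each column that contains a colon. The following is a minimal base R sketch under that assumption; `first_key` is a hypothetical helper introduced here for illustration, not part of the answer.

```r
# Example data from the question
df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"))

# Hypothetical helper: key of the first entry in a column that contains a colon.
# Returns NA if no entry in the column has a colon.
first_key <- function(x) {
  has_key <- grepl(":", x, fixed = TRUE)
  if (!any(has_key)) return(NA_character_)
  sub(":.*", "", x[has_key][1])
}

out <- df
# Step 1: one column name per column, taken from the keys
names(out) <- vapply(df, first_key, character(1), USE.NAMES = FALSE)
# Step 2: strip the "key:" prefix; cells without a colon are left unchanged
out[] <- lapply(out, function(x) sub(".*:", "", x))
out
```

The sketch assumes every column uses a single key throughout. A column with no colon at all would get `NA` as its name, which may need separate handling.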



# Answer 2
**Score**: 2

In base R you could use `read.dcf` after pasting everything together:

```R
a <- do.call(paste, c(sep = "\n", collapse = "\n\n", df))
read.dcf(textConnection(a), all = TRUE)
```

      name cat
    1    B   X
    2    c   Y
    3    d   Z
    4    a   K
    5    k   L
    6    A   K

This output is from data in which every cell is a key:value pair. The EDIT below handles the "null" cells of the df in the question.

EDIT

```R
setNames(data.frame(sub(".*:", "", as.matrix(df))), gsub("(\\w+):.*|.", "\\1", df))
```

      name  cat
    1 null null
    2    c    Y
    3    d    Z
    4    a    K
    5    k    L
    6    A    K

The first snippet pastes the contents of df into one string, with one key:value pair per line and a blank line between rows, and lets `read.dcf` parse it.

The second snippet uses regular expressions: it strips everything up to the colon from each cell, puts the result in a new data frame, and sets the column names to the keys.
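To make the `read.dcf` step less opaque, the sketch below prints the intermediate text before parsing it. `read.dcf` reads Debian-control-file style records, that is, blocks of `field: value` lines separated by blank lines. As an assumption for this illustration only, the first ("null") row is dropped so that every remaining cell is a key:value pair; `kv`, `txt`, and `res` are names introduced here.

```r
# Example data from the question
df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"))

# Assumption: drop the first row, which holds no key:value pairs
kv <- df[-1, ]

# One "key:value" line per cell, and a blank line between data frame rows
txt <- do.call(paste, c(sep = "\n", collapse = "\n\n", kv))
cat(txt, "\n")

# Parse the records; all = TRUE returns a data frame with one column per field
con <- textConnection(txt)
res <- read.dcf(con, all = TRUE)
close(con)
res
```

Because the field names come from the text itself, no column names need to be typed by hand, which matches the requirement for several hundred columns.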

Answer 3

Score: 1

You don't always have to squeeze everything into a single step; two distinct steps work just fine with more traditional tools:

    library(dplyr)
    library(purrr)  # provides set_names()

    # Key of the first entry in a column: the text before the colon
    get_col_names <- function(col) {
      col_split <- stringr::str_split(string = col[1], pattern = ":")
      col_split[[1]][1]
    }

    new_cn <- sapply(df, get_col_names)

    df %>%
      mutate(across(.cols = everything(),
                    .fns = ~gsub("^.*:", "", .x))) %>%
      set_names(nm = new_cn)

      name cat
    1    B   X
    2    c   Y
    3    d   Z
    4    a   K
    5    k   L
    6    A   K

As the output shows, this assumes the first row of each column holds a key:value pair, because the name is read from `col[1]`.
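One detail of the pattern `"^.*:"` is worth illustrating: `.*` is greedy, so it removes everything up to the last colon in a cell. The data in the question has only one colon per cell, so this makes no difference there. The sketch below uses a made-up vector `x` (an assumption, not data from the question) to show the behavior, along with an alternative pattern that stops at the first colon.

```r
# Hypothetical values; the second one contains more than one colon
x <- c("name:c", "url:http://example.org", "null")

# Greedy: strips everything up to the LAST colon
greedy <- sub("^.*:", "", x)

# Alternative: split at the FIRST colon only
keys   <- sub(":.*$", "", x)       # text before the first colon
values <- sub("^[^:]*:", "", x)    # text after the first colon

greedy
keys
values
```

Entries without a colon, such as `"null"`, are returned unchanged by all three calls, so a `"null"` cell yields `"null"` both as key and as value.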

Answer 4

Score: 0

Here is a tidyverse solution; the second part is the one provided by @Ronak Shah:

    library(dplyr)
    library(tidyr)

    # Keep rows with at least one non-"null" cell, then collect the key
    # (the text before the colon) of every remaining cell
    my_names <- df %>%
      filter(if_any(everything(), ~ . != "null")) %>%
      pivot_longer(everything()) %>%
      separate(value, into = c("a", "b")) %>%
      pull(a)

    # Rename the columns with the unique keys and strip the "key:" prefix
    df %>%
      rename_with(~ unique(my_names)) %>%
      mutate(across(everything(), ~ sub('.*:', '', .)))

      name  cat
    1 null null
    2    c    Y
    3    d    Z
    4    a    K
    5    k    L
    6    A    K
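Since the question asks about `separate_wider_delim()`, here is one possible sketch that uses it together with a long/wide round trip. It is not taken from any of the answers and rests on several assumptions: tidyr 1.3.0 or later is installed, each original column uses a single key, and keys are unique across columns. The helper column names `row`, `col`, `pair`, `key`, and `value` are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"))

long <- df %>%
  mutate(row = row_number()) %>%                       # remember the original row
  pivot_longer(-row, names_to = "col", values_to = "pair") %>%
  # Split at the colon; cells without a colon ("null") keep their text
  # in `value` and get NA in `key`
  separate_wider_delim(pair, delim = ":", names = c("key", "value"),
                       too_few = "align_end") %>%
  group_by(col) %>%
  fill(key, .direction = "downup") %>%                 # borrow the key within a column
  ungroup()

out <- long %>%
  pivot_wider(id_cols = row, names_from = key, values_from = value) %>%
  select(-row)

out
```

The result is a tibble rather than a base data frame. If two columns shared a key, `pivot_wider()` would no longer find one value per cell, so that case would need different handling.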

huangapple
  • Posted on 2023-03-12 08:47:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75710456.html