separate_wider where first half becomes column name and second half becomes cell value

Question

```{r}
df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"))
```

I have a data frame with multiple columns of key-value pairs, as above.

I wish to separate the values of the cells, such that the 'key' becomes the column name of a new column and the 'value' becomes the value of the cell.

Expected output:

```{r}
df2 <- data.frame(name = c("null", "c", "d", "a", "k", "A"),
                  cat = c("null", "Y", "Z", "K", "L", "K"))
df2
```

Note that my real data frame has several hundred columns, so I am looking for a solution that does not require typing column names manually but generates them automatically from the first half of each key:value pair.

Currently, I am splitting the key-value pairs using

```{r}
df3 <- df %>%
  separate_wider_delim(cols = everything(),
                       delim = ",",
                       too_few = "align_start",
                       names_sep = "")
```

but I do not know how to transform it properly so that the first half of the separated values become column names.

Answer 1

Score: 2

You may use the first row to get the column names by dropping everything after the colon. To clean up the column values, remove everything up to and including the colon.

```R
# Column names: keep only the text before the colon in the first row
names(df) <- sub(':.*', '', unlist(df[1,]))
# Cell values: keep only the text after the colon
df[] <- lapply(df, function(x) sub('.*:', '', x))
df

#  name cat
#1    B   X
#2    c   Y
#3    d   Z
#4    a   K
#5    k   L
#6    A   K
```

The second step can also be done using `dplyr`:

```R
library(dplyr)
df <- df %>% mutate(across(everything(), ~sub('.*:', '', .)))
```
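
With the example data exactly as posted in the question, the first row is `"null"` in every column, so `sub(':.*', '', unlist(df[1,]))` would return `"null"` for each name. Below is a minimal sketch of one way around this, assuming every column holds at least one `key:value` cell: take the key from the first cell that contains a colon. The helper name `first_key` and the object `out` are hypothetical and not part of the answer above.

```r
df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"))

# Hypothetical helper: return the key of the first cell that contains a colon,
# or NA if the column has no key:value cell at all
first_key <- function(x) {
  hit <- grep(":", x, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  sub(":.*", "", hit[1])
}

keys <- unname(vapply(df, first_key, character(1)))

out <- df
# Fall back to the original column name when no key was found
names(out) <- ifelse(is.na(keys), names(df), keys)
# Strip everything up to and including the colon; "null" cells are left as they are
out[] <- lapply(out, function(x) sub(".*:", "", x))
out
```

Cells without a colon, such as `"null"`, pass through `sub()` unchanged, so no special handling is needed for the values.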

Answer 2

Score: 2

In base R you could use `read.dcf` after pasting everything together:

```R
a <- do.call(paste, c(sep = "\n", collapse = "\n\n", df))
read.dcf(textConnection(a), all = TRUE)
#  name cat
#1    B   X
#2    c   Y
#3    d   Z
#4    a   K
#5    k   L
#6    A   K
```

This pastes the contents of `df` together and then parses the result with `read.dcf`.

EDIT

```R
setNames(data.frame(sub(".*:", "", as.matrix(df))), gsub("(\\w+):.*|.", "\\1", df))
#  name  cat
#1 null null
#2    c    Y
#3    d    Z
#4    a    K
#5    k    L
#6    A    K
```

This uses regular expressions to strip the text before the colon in each cell, puts the result in a new data frame, and sets the column names.
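
The name-extraction half of the edit is terse, so here is a sketch that spells it out. It rests on one reading of the code: `gsub()` coerces the data frame with `as.character()`, which gives one string per column, and the pattern then keeps only the first key in that string. The variable names `col_strings`, `keys`, and `out` are illustrative.

```r
df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"))

# as.character() on a data frame yields one string per column,
# each holding the deparsed contents of that column
col_strings <- as.character(df)

# Pattern alternatives:
#   (\\w+):.*  a word followed by a colon, plus the rest of the string;
#              replaced by the captured word (the key)
#   .          any other single character; replaced by nothing
keys <- gsub("(\\w+):.*|.", "\\1", col_strings)

# Values: drop everything up to and including the colon in each cell
out <- setNames(data.frame(sub(".*:", "", as.matrix(df))), keys)
out
```

Because the characters before the first `key:` are consumed one at a time by the `.` alternative, leading `"null"` cells do not affect the extracted name.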

Answer 3

Score: 1

You don't always have to squeeze everything into a single step; two distinct steps can work just fine using more traditional tools:

```R
library(dplyr)
library(purrr)  # provides set_names()

# Take the key (the text before the colon) from the first cell of a column
get_col_names <- function(col) {
  col_split <- stringr::str_split(string = col[1], pattern = ":")
  col_split[[1]][1]
}

new_cn <- sapply(df, get_col_names)

df %>%
  mutate(
    across(.cols = everything(),
           .fns = ~gsub("^.*:", "", .x))
  ) %>%
  set_names(nm = new_cn)
#  name cat
#1    B   X
#2    c   Y
#3    d   Z
#4    a   K
#5    k   L
#6    A   K
```

Answer 4

Score: 0

Here is a tidyverse solution. The second part is also provided by @Ronak Shah:

```R
library(dplyr)
library(tidyr)

# Collect the keys from the rows that are not entirely "null"
my_names <- df %>%
  filter(if_any(everything(), ~ . != "null")) %>%
  pivot_longer(everything()) %>%
  separate(value, into = c("a", "b")) %>%
  pull(a)

# Rename the columns with the unique keys, then strip the "key:" prefix
df %>%
  rename_with(~unique(my_names)) %>%
  mutate(across(everything(), ~sub('.*:', '', .)))
#  name  cat
#1 null null
#2    c    Y
#3    d    Z
#4    a    K
#5    k    L
#6    A    K
```
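
Since the question asks about `separate_wider_delim()` specifically, here is a hedged sketch of a reshape-based variant. It assumes tidyr 1.3.0 or later and that every column contains at least one `key:value` cell. The column names `row`, `col`, `cell`, `key`, and `val` are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"))

result <- df %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row, names_to = "col", values_to = "cell") %>%
  # Split "key:value"; a cell without a colon ("null") becomes key = NA, val = "null"
  separate_wider_delim(cell, delim = ":", names = c("key", "val"),
                       too_few = "align_end") %>%
  # Fill the missing keys from the other cells of the same original column
  group_by(col) %>%
  fill(key, .direction = "downup") %>%
  ungroup() %>%
  # One output column per key, one row per original row
  pivot_wider(id_cols = row, names_from = key, values_from = val) %>%
  select(-row)

result
```

The keys are never typed by hand, so the same pipeline should scale to several hundred columns. A column that contains only `"null"` would keep an `NA` key and would need separate handling.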

huangapple
  • Published on March 12, 2023, 08:47:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75710456.html