separate_wider where first half becomes column name and second half becomes cell value
Question
df <- data.frame(V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
                 V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"))
I have a data frame with multiple columns of key-value pairs like the one above.
I wish to separate the values of the cells so that the 'key' becomes the name of a new column and the 'value' becomes the value of the cell.
Expected output:
df2 <- data.frame(name = c("null", "c", "d", "a", "k", "A"),
                  cat = c("null", "Y", "Z", "K", "L", "K"))
df2
Note that my real data frame has several hundred columns, so I am looking for a solution that does not require typing column names manually but generates them automatically from the first half of each key:value pair.
Currently, I am splitting the key-value pairs using:
library(dplyr)
library(tidyr)
df3 <- df %>%
  separate_wider_delim(cols = everything(),
                       delim = ",",
                       too_few = "align_start",
                       names_sep = "")
However, I do not know how to transform the result so that the first half of each separated value becomes the column name.
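Answer 1
Score: 2
You can use the first row to get the column names by dropping everything after the colon.
To clean up the column values, remove everything up to and including the colon.
# column names: keep only the key, i.e. the text before the colon
names(df) <- sub(':.*', '', unlist(df[1,]))
# cell values: keep only the text after the colon
df[] <- lapply(df, function(x) sub('.*:', '', x))
df
#  name cat
#1    B   X
#2    c   Y
#3    d   Z
#4    a   K
#5    k   L
#6    A   K
The output above has `B` and `X` in row 1, so it evidently comes from a version of `df` whose first row held key:value pairs; if the first row is "null", as in the question's `df`, take the names from a row that contains keys instead.
The second step can also be done using `dplyr`:
library(dplyr)
df <- df %>% mutate(across(everything(), ~sub('.*:', '', .)))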
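For data whose first row is "null", as in the question, the same two steps can be adapted to look for the first entry in each column that actually contains a colon. The following is only a sketch under that assumption: `first_key` is a hypothetical helper name, and it presumes every column holds at least one key:value pair.

```r
df <- data.frame(
  V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
  V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the key of the first entry that contains a colon
first_key <- function(x) {
  has_key <- grepl(":", x, fixed = TRUE)
  if (!any(has_key)) return(NA_character_)  # column without any key
  sub(":.*", "", x[has_key][1])
}

new_names <- vapply(df, first_key, character(1))

out <- df
# Strip everything up to the colon; entries without a colon are left as-is
out[] <- lapply(df, function(x) sub(".*:", "", x))
names(out) <- unname(new_names)
out
```

Because the names are looked up per column, nothing has to be typed by hand, which matters when there are several hundred columns.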
Answer 2
Score: 2
In base R you could use `read.dcf` after pasting everything together:
```R
a <- do.call(paste, c(sep = "\n", collapse = "\n\n", df))
read.dcf(textConnection(a), all = TRUE)
```
  name cat
1    B   X
2    c   Y
3    d   Z
4    a   K
5    k   L
6    A   K
EDIT
setNames(data.frame(sub(".*:", "", as.matrix(df))), gsub("(\\w+):.*|.", "\\1", df))
  name  cat
1 null null
2    c    Y
3    d    Z
4    a    K
5    k    L
6    A    K
The first snippet pastes the contents of `df` into a single block of text and lets `read.dcf` parse it. The second uses regular expressions: `sub` removes everything up to the colon in each cell, and `gsub` extracts the key of each column to use as its name.
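To see why `read.dcf` fits here, it can help to look at the intermediate text that `paste` builds. The sketch below uses a small illustrative data frame, `df_small`, which is an assumption for demonstration only; it leaves out the "null" row, since such entries carry no `key:` tag.

```r
# Small illustrative data frame (an assumption; it omits the "null" row)
df_small <- data.frame(
  V_1 = c("name:c", "name:d"),
  V_2 = c("cat:Y", "cat:Z"),
  stringsAsFactors = FALSE
)

# One "key:value" line per cell, with a blank line between rows (records)
a <- do.call(paste, c(sep = "\n", collapse = "\n\n", df_small))
cat(a, "\n")

# read.dcf() turns each record into a row and each key into a column
res <- read.dcf(textConnection(a), all = TRUE)
names(res)
```

Each row of the data frame becomes one record of `key:value` lines, which is the layout `read.dcf` parses.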
Answer 3
Score: 1
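You don't always have to squeeze everything into a single step; two distinct steps using more traditional tools work just fine:
library(dplyr)
library(purrr)  # provides set_names(); assumed, the original does not show its library calls
# Take the key (the text before the colon) from the first entry of a column
get_col_names <- function(col){
  col_split <- stringr::str_split(string = col[1], pattern = ":")
  col_split[[1]][1]
}
new_cn <- sapply(df, get_col_names)
# Strip everything up to the colon from each cell, then apply the new names
df %>%
  mutate(
    across(.cols = everything(),
           .fns = ~gsub("^.*:", "", .x))
  ) %>%
  set_names(nm = new_cn)
  name cat
1    B   X
2    c   Y
3    d   Z
4    a   K
5    k   L
6    A   K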
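With several hundred columns, it may be worth checking that each column really maps to exactly one key before renaming in a separate step. This is a minimal base R sketch, not part of the original answer; `keys_per_col` and `ambiguous` are illustrative names.

```r
df <- data.frame(
  V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
  V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"),
  stringsAsFactors = FALSE
)

# Keys found in each column, ignoring entries without a colon such as "null"
keys_per_col <- lapply(df, function(x) {
  unique(sub(":.*", "", x[grepl(":", x, fixed = TRUE)]))
})

# Columns that do not map to exactly one key
ambiguous <- names(keys_per_col)[lengths(keys_per_col) != 1]
ambiguous
```

If `ambiguous` is empty, taking one key per column as the new name is safe for that data.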
Answer 4
Score: 0
Here is a tidyverse solution; the second part comes from @Ronak Shah's answer:
library(dplyr)
library(tidyr)
# collect the keys from the rows that contain key:value pairs
my_names <- df %>%
  filter(if_any(everything(), ~ . != "null")) %>%
  pivot_longer(everything()) %>%
  separate(value, into = c("a", "b")) %>%
  pull(a)
# rename the columns with the unique keys, then strip the keys from the cells
df %>%
  rename_with(~unique(my_names)) %>%
  mutate(across(everything(), ~sub('.*:', '', .)))
  name  cat
1 null null
2    c    Y
3    d    Z
4    a    K
5    k    L
6    A    K
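Since the question starts from `separate_wider_delim`, a variant built around that function might look like the sketch below. It assumes tidyr 1.3.0 or later and that each source column holds a single key; the helper column names (`row`, `col`, `pair`, `key`, `val`) are illustrative only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  V_1 = c("null", "name:c", "name:d", "name:a", "name:k", "name:A"),
  V_2 = c("null", "cat:Y", "cat:Z", "cat:K", "cat:L", "cat:K"),
  stringsAsFactors = FALSE
)

out <- df %>%
  mutate(row = row_number()) %>%                       # remember the original row
  pivot_longer(-row, names_to = "col", values_to = "pair") %>%
  # split "key:value"; entries without a colon ("null") end up in `val`
  separate_wider_delim(pair, delim = ":", names = c("key", "val"),
                       too_few = "align_end") %>%
  group_by(col) %>%
  fill(key, .direction = "downup") %>%                 # borrow the key within each source column
  ungroup() %>%
  pivot_wider(id_cols = row, names_from = key, values_from = val) %>%
  select(-row)

out
```

The result is a tibble rather than a base data frame, and the "null" entries are kept as values under the key borrowed from the rest of their column.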