Use a string to assign multiple column types to tibble

Question

The following data has six columns. I want to change their column types to factor, factor, factor, integer, integer, and factor, respectively.

d <- structure(
  list(
    a = c(9, 9, 9, 9, 9, 9, 9),
    b = structure(c(2018, 2018, 2018, 2018, 2018, 2018, 2018), class = "yearmon"),
    c = c("605417CA", "605417CB", "606822AS", "606822AT", "606822AU", "606822AV", "60683MAB"),
    d = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
    e = c(0, 0, 0, 0, 0, 0, 0),
    f = c(2772, 2772, 46367, 46367, 46367, 46367, 47601)
  ),
  row.names = c(NA, -7L),
  class = c("tbl_df", "tbl", "data.frame")
)

If I were reading this data from an external file, I would use vroom(path, col_types = "fffiif"), which converts each column according to the matching character in the string. But here the data is the result of a previous computation, so I need to do the conversion myself. Is there a way to change all column types with a simple string, as vroom does?
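As a rough illustration of how such a compact string lines up with the columns, the sketch below splits it into one code per column and pairs each code with a column name. Only the two codes used here (f and i) are mapped; `columns` and `type_names` are names made up for this example.

```r
# Compact type string from the question and the six column names
spec <- "fffiif"
columns <- c("a", "b", "c", "d", "e", "f")

# One single-character code per column, in column order
codes <- strsplit(spec, "")[[1]]

# Illustrative lookup covering only the codes used in this question
type_names <- c(f = "factor", i = "integer")

# Pair each column name with the type its code stands for
target_types <- setNames(unname(type_names[codes]), columns)
target_types
```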

Things I tried:

  • Using mutate for each of the six variables is quite long.
  • The conversions are not one-to-one. For example, "a" and "e" are both double, but I want to convert them to factor and int respectively, so mutate_if would not work.
  • The magrittr package has set_colnames, which changes column names by passing a vector of strings. There may be something similar for changing column types, but I haven't found anything.
  • readr::type_convert seems to apply only to columns of type character.
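To make the second point concrete, here is a small sketch on toy data (not the data above). Two columns that are both stored as double look identical to a type predicate, so a predicate alone cannot send one to factor and the other to integer.

```r
# Toy data: both columns are stored as double
toy <- data.frame(a = c(9, 9, 9), e = c(0, 0, 0))

# The storage type is the same for both columns
vapply(toy, typeof, character(1))

# A predicate such as is.double therefore selects both of them
is_dbl <- vapply(toy, is.double, logical(1))
is_dbl
```

The target type has to be chosen by column position or name instead, which is what a per-column type string provides.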

I saved the data locally and re-imported it with vroom(path, col_types = "fffiif"), which works perfectly. So the question is: to which function can I pass the string fffiif to do the conversion once I already have the data?

Answer 1

Score: 2

Either use the *for loop* from the [linked post](https://stackoverflow.com/a/72369872/680068), or use *mutate* with *across* to avoid repetition:

```R
library(dplyr)

d %>%
  mutate(across(c(a:c, f), ~ as.factor(.x)),
         across(d:e, ~ as.integer(.x)))

# # A tibble: 7 × 6
#   a     b     c            d     e f    
#   <fct> <fct> <fct>    <int> <int> <fct>
# 1 9     2018  605417CA    NA     0 2772 
# 2 9     2018  605417CB    NA     0 2772 
# 3 9     2018  606822AS    NA     0 46367
# 4 9     2018  606822AT    NA     0 46367
# 5 9     2018  606822AU    NA     0 46367
# 6 9     2018  606822AV    NA     0 46367
# 7 9     2018  60683MAB    NA     0 47601
```

Similar to the linked post, using lapply with a list of converters keyed by type code:

```R
# Map each type code to its conversion function
ff <- list(f = as.factor, i = as.integer)
# Split the type string into one code per column
cc <- unlist(strsplit("fffiif", ""))

# Apply the matching converter to each column (the \(i) lambda needs R >= 4.1);
# assigning into d[] keeps the column names and the tibble class
d[] <- lapply(seq_along(d), \(i) ff[[cc[i]]](d[[i]]))

sapply(d, class)
#       a         b         c         d         e         f 
# "factor"  "factor"  "factor" "integer" "integer"  "factor"
```
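
The lapply idea can be wrapped in a small reusable function. The following is only a sketch in base R: the name `convert_cols` and the set of supported codes are assumptions made for illustration, not part of vroom, readr, or any other package. It also checks that the string has one code per column, which the one-liner does not.

```r
# Hypothetical helper: convert columns according to a compact type string.
# The function name and the supported codes are assumptions for this sketch.
convert_cols <- function(df, spec) {
  converters <- list(
    f = as.factor,
    i = as.integer,
    d = as.double,
    c = as.character,
    l = as.logical
  )
  codes <- strsplit(spec, "")[[1]]

  # Fail early if the string does not line up with the columns
  if (length(codes) != ncol(df)) {
    stop("`spec` has ", length(codes), " codes but `df` has ", ncol(df), " columns")
  }
  unknown <- setdiff(codes, names(converters))
  if (length(unknown) > 0) {
    stop("unsupported type code(s): ", paste(unknown, collapse = ", "))
  }

  # Assigning into df[] keeps the column names and the data frame class
  df[] <- Map(function(column, code) converters[[code]](column), df, codes)
  df
}

# Toy data (not the data from the question)
toy <- data.frame(a = c(9, 9, 9), c = c("x", "y", "z"), e = c(0, 1, 2))
converted <- convert_cols(toy, "ffi")
sapply(converted, class)
```

One caveat applies to any of these approaches: calling as.integer on a column that is already a factor returns the internal level codes rather than the labels, so such a column would need as.integer(as.character(x)) instead.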

huangapple
  • Posted on May 24, 2023, 17:14:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76321897.html