如何在R中将具有多个分类变量的宽格式数据转换为长格式?

huangapple go评论51阅读模式
英文:

How to convert a wide format data with multiple categorical variables to long format in R?

问题

以下是翻译好的代码部分:

这是我的初始数据。
```R
df <- data.frame(v1 = c("a", "a", "a", "a", "a", "a"),
                 v5 = c("c1", "c2", NA, "c1", "c2", NA),
                 v6 = c(20, 30, NA, 14, 26, NA),
                 v7 = c("d1", "d2", "d3", "d1", "d2", "d3"),
                 v8 = c(2, 3, 4, 5, 6, 7))

我想将其转换为以下的长格式。

    v1 Variable Var_category var_value
1   a       v5           c1        20
2   a       v5           c2        30
3   a       v7           d1         2
4   a       v7           d2         3
5   a       v7           d3         4
6   a       v5           c1        14
7   a       v5           c2        26
8   a       v6           d1         5
9   a       v6           d2         6
10  a       v6           d3         7

你可以在R中使用pivot_longer()函数来实现这个转换。以下是示例代码:

library(tidyr)

result <- df %>%
  pivot_longer(cols = -v1, names_to = "Variable", values_to = "var_value") %>%
  separate(Variable, into = c("Var_category", "Variable"), sep = "_") %>%
  select(v1, Variable, Var_category, var_value)

这将把你的数据从宽格式转换为长格式。希望这有帮助!

英文:

Here is my initial data.

df &lt;- data.frame(v1 = c(&quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;),
                 v5 = c(&quot;c1&quot;, &quot;c2&quot;, NA, &quot;c1&quot;, &quot;c2&quot;, NA),
                 v6 = c(20, 30, NA, 14, 26, NA),
                 v7 = c(&quot;d1&quot;, &quot;d2&quot;, &quot;d3&quot;, &quot;d1&quot;, &quot;d2&quot;, &quot;d3&quot;),
                 v8 = c(2, 3, 4, 5, 6, 7))

  v1   v5   v6 v7 v8
1  a   c1   20 d1  2
2  a   c2   30 d2  3
3  a &lt;NA&gt; &lt;NA&gt; d3  4
4  a   c1   14 d1  5
5  a   c2   26 d2  6
6  a &lt;NA&gt; &lt;NA&gt; d3  7

I'd like to convert it to one with long format like below.

    v1 Variable Var_category var_value
1   a       v5           c1        20
2   a       v5           c2        30
3   a       v7           d1         2
4   a       v7           d2         3
5   a       v7           d3         4
6   a       v5           c1        14
7   a       v5           c2        26
8   a       v6           d1         5
9   a       v6           d2         6
10  a       v6           d3         7

How can I do this in R?

I tried using pivot_longer() and gather() but could not find the correct way to do this since it's not either traditional long format or traditional wide format I guess.

答案1

得分: 2

这是使用 pivot_longer() 的示例,但最容易的方法是重命名变量,使它们的词干与类别和值对应:

library(dplyr)
library(tidyr)
df <- data.frame(v1 = c("a", "a", "a", "a", "a", "a"),
                 v5 = c("c1", "c2", NA, "c1", "c2", NA),
                 v6 = c(20, 30, NA, 14, 26, NA),
                 v7 = c("d1", "d2", "d3", "d1", "d2", "d3"),
                 v8 = c(2, 3, 4, 5, 6, 7))

df %>% 
  rename(cat_v5 = v5, cat_v7 = v7, vals_v5 = v6, vals_v7 = v8) %>%
  pivot_longer(-v1, 
               names_pattern=c("(.*)_(.*)"), 
               names_to = c(".value", "category")) %>% 
  na.omit()

<sup>创建于2023年3月30日,使用 reprex v2.0.2 </sup>

英文:

Here's an example using pivot_longer(), but you it's easiest if you rename the variables so the stems correspond with categories and values:

library(dplyr)
library(tidyr)
df &lt;- data.frame(v1 = c(&quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;),
                 v5 = c(&quot;c1&quot;, &quot;c2&quot;, NA, &quot;c1&quot;, &quot;c2&quot;, NA),
                 v6 = c(20, 30, NA, 14, 26, NA),
                 v7 = c(&quot;d1&quot;, &quot;d2&quot;, &quot;d3&quot;, &quot;d1&quot;, &quot;d2&quot;, &quot;d3&quot;),
                 v8 = c(2, 3, 4, 5, 6, 7))


df %&gt;% 
  rename(cat_v5 = v5, cat_v7 = v7, vals_v5 = v6, vals_v7 = v8) %&gt;%
  pivot_longer(-v1, 
               names_pattern=c(&quot;(.*)_(.*)&quot;), 
               names_to = c(&quot;.value&quot;, &quot;category&quot;)) %&gt;% 
  na.omit()
#&gt; # A tibble: 10 &#215; 4
#&gt;    v1    category cat    vals
#&gt;    &lt;chr&gt; &lt;chr&gt;    &lt;chr&gt; &lt;dbl&gt;
#&gt;  1 a     v5       c1       20
#&gt;  2 a     v7       d1        2
#&gt;  3 a     v5       c2       30
#&gt;  4 a     v7       d2        3
#&gt;  5 a     v7       d3        4
#&gt;  6 a     v5       c1       14
#&gt;  7 a     v7       d1        5
#&gt;  8 a     v5       c2       26
#&gt;  9 a     v7       d2        6
#&gt; 10 a     v7       d3        7

<sup>Created on 2023-03-30 with reprex v2.0.2</sup>

答案2

得分: 2

Base R的reshape函数可以灵活处理这种稍微独特的情况,只要你创建一个列表,显示哪些变量对应哪些值:

vars <- list(Var_category=c("v5","v7"), Var_value=c("v6","v8"))
out <- na.omit(reshape(df, varying=vars, v.names=names(vars), direction="long",
                       times=vars[[1]], timevar="Variable"))
out

如果你想要还原回包括所有命名的宽格式,只需使用下面的魔法命令:

reshape(out)
英文:

Base R reshape deals with this slightly unique circumstance pretty flexibly, if you make yourself a list showing what variables go to what values:

vars &lt;- list(Var_category=c(&quot;v5&quot;,&quot;v7&quot;), Var_value=c(&quot;v6&quot;,&quot;v8&quot;))
out &lt;- na.omit(reshape(df, varying=vars, v.names=names(vars), direction=&quot;long&quot;,
                       times=vars[[1]], timevar=&quot;Variable&quot;))
out

##     v1 Variable Var_category Var_value id
##1.v5  a       v5           c1        20  1
##2.v5  a       v5           c2        30  2
##4.v5  a       v5           c1        14  4
##5.v5  a       v5           c2        26  5
##1.v7  a       v7           d1         2  1
##2.v7  a       v7           d2         3  2
##3.v7  a       v7           d3         4  3
##4.v7  a       v7           d1         5  4
##5.v7  a       v7           d2         6  5
##6.v7  a       v7           d3         7  6

Bonus is if you want to revert back to wide including all the naming, just use the magic:

reshape(out)
##     v1 id   v5 v6 v7 v8
##1.v5  a  1   c1 20 d1  2
##2.v5  a  2   c2 30 d2  3
##4.v5  a  4   c1 14 d1  5
##5.v5  a  5   c2 26 d2  6
##3.v7  a  3 &lt;NA&gt; NA d3  4
##6.v7  a  6 &lt;NA&gt; NA d3  7

答案3

得分: 1

以下是代码中的翻译部分:

df &lt;- data.frame(v1 = c(&quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;),
                 v5 = c(&quot;c1&quot;, &quot;c2&quot;, NA, &quot;c1&quot;, &quot;c2&quot;, NA),
                 v6 = c(20, 30, NA, 14, 26, NA),
                 v7 = c(&quot;d1&quot;, &quot;d2&quot;, &quot;d3&quot;, &quot;d1&quot;, &quot;d2&quot;, &quot;d3&quot;),
                 v8 = c(2, 3, 4, 5, 6, 7))
na.omit(
  rbind(
    with(df, data.frame(v1 = v1, Variable = &quot;v5&quot;, Var_category=v5, var_value=v6)), 
    with(df, data.frame(v1 = v1, Variable = &quot;v7&quot;, Var_category=v7, var_value=v8))) )
#&gt;    v1 Variable Var_category var_value
#&gt; 1   a       v5           c1        20
#&gt; 2   a       v5           c2        30
#&gt; 4   a       v5           c1        14
#&gt; 5   a       v5           c2        26
#&gt; 7   a       v7           d1         2
#&gt; 8   a       v7           d2         3
#&gt; 9   a       v7           d3         4
#&gt; 10  a       v7           d1         5
#&gt; 11  a       v7           d2         6
#&gt; 12  a       v7           d3         7

希望这对你有所帮助。如果你有任何其他问题,请随时提问。

英文:

In this case you can simply rbind columns of the data frame

df &lt;- data.frame(v1 = c(&quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;a&quot;),
                 v5 = c(&quot;c1&quot;, &quot;c2&quot;, NA, &quot;c1&quot;, &quot;c2&quot;, NA),
                 v6 = c(20, 30, NA, 14, 26, NA),
                 v7 = c(&quot;d1&quot;, &quot;d2&quot;, &quot;d3&quot;, &quot;d1&quot;, &quot;d2&quot;, &quot;d3&quot;),
                 v8 = c(2, 3, 4, 5, 6, 7))
na.omit(
  rbind(
    with(df, data.frame(v1 = v1, Variable = &quot;v5&quot;, Var_category=v5, var_value=v6)), 
    with(df, data.frame(v1 = v1, Variable = &quot;v7&quot;, Var_category=v7, var_value=v8))) )
#&gt;    v1 Variable Var_category var_value
#&gt; 1   a       v5           c1        20
#&gt; 2   a       v5           c2        30
#&gt; 4   a       v5           c1        14
#&gt; 5   a       v5           c2        26
#&gt; 7   a       v7           d1         2
#&gt; 8   a       v7           d2         3
#&gt; 9   a       v7           d3         4
#&gt; 10  a       v7           d1         5
#&gt; 11  a       v7           d2         6
#&gt; 12  a       v7           d3         7

huangapple
  • 本文由 发表于 2023年3月31日 03:50:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/75892411.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定