Transpose dataframe from wide-to-long with multiple values

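# Question

I have wide data with multiple values that I need to transpose into a long data table. I've looked at several sites and StackOverflow and cannot find an answer, even though the problem seems simple.

Some example data:

```R
# example data
wide <- data.frame(
  VariableName = c("Var1", "Var2", "Var3", "Var4", "Var5"),
  Year1 = c(411, 723, 325, 456, 579),
  Year2 = c(123, 300, 400, 500, 600),
  Year3 = c(1457, 1000, 569, 896, 956)
)
```

It looks like this:

```
  VariableName Year1 Year2 Year3
1         Var1   411   123  1457
2         Var2   723   300  1000
3         Var3   325   400   569
4         Var4   456   500   896
5         Var5   579   600   956
```

I need it to look like this:

```
VariableName  Var1  Var2  Var3  Var4  Var5
Year1         411   723   325   456   579
Year2         123   300   400   500   600
Year3         1457  1000  569   896   956
```

I've tried several functions, including `gather()`, `pivot_longer()`, and `melt()`, but I can't get any of them to work or to give me the result I need.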




# Answer 1
**Score**: 2

**Option 1:** With `tidyr`, use `pivot_longer` and then `pivot_wider`:

```r
library(tidyr)

wide %>% 
  pivot_longer(-VariableName, names_to = 'Year') %>% 
  pivot_wider(id_cols = Year, names_from = VariableName)
```

**Option 2:** You can also use the row-name helpers from `tibble` to move `VariableName` into the row names before transposing, and to move the new row names back into a column afterwards.

```r
library(tibble)

wide %>% 
  column_to_rownames("VariableName") %>%
  t() %>% as.data.frame() %>%
  rownames_to_column("Year")
```

Output (as printed for Option 1; Option 2 returns a plain data frame with the same values):

```r
# # A tibble: 3 × 6
#   Year   Var1  Var2  Var3  Var4  Var5
#   <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Year1   411   723   325   456   579
# 2 Year2   123   300   400   500   600
# 3 Year3  1457  1000   569   896   956
```
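To see why the two-step pivot in Option 1 works, it can help to look at the intermediate long table. The sketch below splits the pipeline into two named steps; the variable names `long` and `transposed` are only illustrative, and `values_from = value` spells out what `pivot_wider` would otherwise pick up by default, since `pivot_longer` names its value column `value` unless told otherwise.

```r
library(tidyr)

wide <- data.frame(
  VariableName = c("Var1", "Var2", "Var3", "Var4", "Var5"),
  Year1 = c(411, 723, 325, 456, 579),
  Year2 = c(123, 300, 400, 500, 600),
  Year3 = c(1457, 1000, 569, 896, 956)
)

# Step 1: stack the Year columns.
# The result has 15 rows, one per (VariableName, Year) pair,
# and three columns: VariableName, Year, value.
long <- pivot_longer(wide, -VariableName, names_to = "Year")

# Step 2: spread VariableName back out into columns, keeping one row per Year.
transposed <- pivot_wider(
  long,
  id_cols = Year,
  names_from = VariableName,
  values_from = value
)
```

Because every value passes through a single numeric `value` column, the numbers stay numeric throughout, which is what a direct `t(wide)` does not give you.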

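# Answer 2
**Score**: 1

Since you only need a transposed version of your data frame, you can simply transpose it with `t()`. Applying `t()` directly to your data frame does not work, though, because the row labels are stored in a column. `t()` first converts the data frame to a matrix, and a matrix can hold only one type, so every value is coerced to a character string:

```
> t(wide)
             [,1]   [,2]   [,3]   [,4]   [,5]  
VariableName "Var1" "Var2" "Var3" "Var4" "Var5"
Year1        "411"  "723"  "325"  "456"  "579" 
Year2        "123"  "300"  "400"  "500"  "600" 
Year3        "1457" "1000" " 569" " 896" " 956"
```

Solution: store the variable names in the row names, not in a column.

```
> row.names(wide) <- wide$VariableName
> wide <- subset(wide, select=-c(VariableName))
> t(wide)
       Var1 Var2 Var3 Var4 Var5
Year1  411  723  325  456  579
Year2  123  300  400  500  600
Year3 1457 1000  569  896  956
```

As @darren-tsai pointed out, the result of `t()` is a matrix, so if you need a data frame, you must convert it with `as.data.frame()`.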
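The steps in this answer can be combined into one small base R function that leaves the original data frame untouched and returns a data frame rather than a matrix. This is only a sketch: `transpose_with_names` and its arguments `name_col` and `new_col` are hypothetical names introduced here, not part of any package, and the function assumes that all columns other than `name_col` are numeric.

```r
# Hypothetical helper (not from the answers above): transpose a data frame
# whose row labels are stored in a column, and return a data frame.
transpose_with_names <- function(df, name_col, new_col) {
  value_cols <- setdiff(names(df), name_col)
  # Transpose only the value columns, so the matrix stays numeric
  m <- t(as.matrix(df[, value_cols, drop = FALSE]))
  # The old label column becomes the new column names
  colnames(m) <- as.character(df[[name_col]])
  # The old column names (now row names of m) become a regular column
  out <- data.frame(rownames(m), m, check.names = FALSE, stringsAsFactors = FALSE)
  names(out)[1] <- new_col
  rownames(out) <- NULL
  out
}

wide <- data.frame(
  VariableName = c("Var1", "Var2", "Var3", "Var4", "Var5"),
  Year1 = c(411, 723, 325, 456, 579),
  Year2 = c(123, 300, 400, 500, 600),
  Year3 = c(1457, 1000, 569, 896, 956)
)

result <- transpose_with_names(wide, "VariableName", "Year")
```

Here `result` has a character `Year` column followed by numeric columns `Var1` to `Var5`, one row per year.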


huangapple
  • Published on July 20, 2023 at 16:25:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76727979.html