Transpose dataframe from wide-to-long with multiple values

# Question

I have wide data with multiple values that I need to transpose into a long data table. I've looked at several sites and StackOverflow and cannot seem to find the answer, even though the problem seems simple.

Some example data:

```R
# example data
wide <- data.frame(
  VariableName = c("Var1", "Var2", "Var3", "Var4", "Var5"),
  Year1 = c(411, 723, 325, 456, 579),
  Year2 = c(123, 300, 400, 500, 600),
  Year3 = c(1457, 1000, 569, 896, 956)
)
```

which looks like this:

```
  VariableName Year1 Year2 Year3
1         Var1   411   123  1457
2         Var2   723   300  1000
3         Var3   325   400   569
4         Var4   456   500   896
5         Var5   579   600   956
```

And I need it to look like this:

```
VariableName Var1 Var2 Var3 Var4 Var5
Year1         411  723  325  456  579
Year2         123  300  400  500  600
Year3        1457 1000  569  896  956
```

I've tried several functions, including gather(), pivot_longer(), and melt(), but I can't get any of them to work or to give me the result I need.
# Answer 1

**Score**: 2

**Option 1:** With `tidyr`, use `pivot_longer` and then `pivot_wider`:

```r
library(tidyr)

wide %>%
  pivot_longer(-VariableName, names_to = 'Year') %>%
  pivot_wider(id_cols = Year, names_from = VariableName)
```

**Option 2:** You can also use the row-name helpers provided by `tibble` to move the labels into row names before transposing, and back into a column afterwards.

```r
library(tibble)

wide %>%
  column_to_rownames("VariableName") %>%
  t() %>% as.data.frame() %>%
  rownames_to_column("Year")
```

Output (as printed by Option 1; Option 2 returns the same values as a plain data frame):

```r
# # A tibble: 3 × 6
#   Year   Var1  Var2  Var3  Var4  Var5
#   <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Year1   411   723   325   456   579
# 2 Year2   123   300   400   500   600
# 3 Year3  1457  1000   569   896   956
```
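
To see why Option 1 needs two steps, it can help to look at the intermediate long table. The following is a minimal sketch that rebuilds the `wide` data frame from the question and splits the pipeline into named steps. The names `long` and `transposed` are illustrative and not part of the original answer, and `values_from = value` only spells out the default column name that `pivot_longer` produces.

```r
library(tidyr)

# Example data from the question
wide <- data.frame(
  VariableName = c("Var1", "Var2", "Var3", "Var4", "Var5"),
  Year1 = c(411, 723, 325, 456, 579),
  Year2 = c(123, 300, 400, 500, 600),
  Year3 = c(1457, 1000, 569, 896, 956)
)

# Step 1: stack the Year columns, giving one row per (VariableName, Year) pair.
# The result has the columns VariableName, Year, and value.
long <- pivot_longer(wide, -VariableName, names_to = "Year")

# Step 2: spread VariableName back out into columns, one row per Year.
transposed <- pivot_wider(long, id_cols = Year,
                          names_from = VariableName, values_from = value)

nrow(long)         # 5 variables x 3 years = 15 rows
names(transposed)  # "Year" followed by Var1 ... Var5
```

`pivot_longer()` alone stops at the 15-row table, which is probably why it did not give the layout asked for in the question; the second pivot is what turns the variable names into columns.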

# Answer 2

**Score**: 1

As you are only looking for a transposed version of your data frame, you can simply transpose it with t(). Applying t() directly to your data frame does not work, though, because the variable names are stored in a column. Converting the data frame to a matrix coerces every value to the least common denominator, i.e. to strings:

```r
> t(wide)
             [,1]   [,2]   [,3]   [,4]   [,5]  
VariableName "Var1" "Var2" "Var3" "Var4" "Var5"
Year1        "411"  "723"  "325"  "456"  "579" 
Year2        "123"  "300"  "400"  "500"  "600" 
Year3        "1457" "1000" " 569" " 896" " 956"
```

Solution: store the variable names in the row names, not in a column.

```r
> row.names(wide) <- wide$VariableName
> wide <- subset(wide, select=-c(VariableName))
> t(wide)
      Var1 Var2 Var3 Var4 Var5
Year1  411  723  325  456  579
Year2  123  300  400  500  600
Year3 1457 1000  569  896  956
```

As @darren-tsai pointed out, the result of t() is a matrix, so if you need a data frame, you must convert it with as.data.frame.
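
The conversion back to a data frame can be sketched in base R as follows. This is one possible variant under stated assumptions, not the answerer's exact code: it leaves `wide` unmodified, and the names `m` and `transposed` are illustrative.

```r
# Example data from the question
wide <- data.frame(
  VariableName = c("Var1", "Var2", "Var3", "Var4", "Var5"),
  Year1 = c(411, 723, 325, 456, 579),
  Year2 = c(123, 300, 400, 500, 600),
  Year3 = c(1457, 1000, 569, 896, 956)
)

# Transpose only the numeric columns, so nothing is coerced to character.
m <- t(wide[-1])                  # numeric matrix with rows Year1..Year3
colnames(m) <- wide$VariableName  # reuse the labels as column names

# Convert the matrix to a data frame, keeping the years as a regular column.
transposed <- data.frame(Year = rownames(m), m, row.names = NULL)

is.data.frame(transposed)  # TRUE
is.numeric(transposed$Var1)  # TRUE: the values stay numeric
```

Dropping the label column before calling t() avoids the string coercion shown above, and `row.names = NULL` replaces the year labels in the row names with the default integer sequence.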

huangapple
  • Posted on July 20, 2023, 16:25:31
  • When reposting, please keep this link: https://go.coder-hub.com/76727979.html