Error when using pivot_wider function in R?

Question

I am using the pivot_wider function on the following sample data, but it looks like I am missing something: instead of placing each value in its respective column, the output contains list-columns with NULL entries. Any suggestions, please?

```r
library(tidyverse)
set.seed(123)
DF <- data.frame(Date = seq(as.Date("1991-01-01"),
                            to = as.Date("2000-12-31"),
                            by = "year"),
                 Parameter = rep(c("A", "B"), times = 10),
                 Value = runif(10, 1, 50)) %>%
  pivot_wider(names_from = Parameter, values_from = Value)
```

Output

```r
> print(DF)
# A tibble: 10 × 3
   Date       A         B
   <date>     <list>    <list>
 1 1991-01-01 <dbl [2]> <NULL>
 2 1992-01-01 <NULL>    <dbl [2]>
 3 1993-01-01 <dbl [2]> <NULL>
 4 1994-01-01 <NULL>    <dbl [2]>
 5 1995-01-01 <dbl [2]> <NULL>
 6 1996-01-01 <NULL>    <dbl [2]>
 7 1997-01-01 <dbl [2]> <NULL>
 8 1998-01-01 <NULL>    <dbl [2]>
 9 1999-01-01 <dbl [2]> <NULL>
10 2000-01-01 <NULL>    <dbl [2]>
```
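Before pivoting, it can help to check whether every Date/Parameter pair is unique, because pivot_wider() falls back to list-columns when several rows map to the same cell. A minimal sketch, assuming dplyr is available: it rebuilds the question's data as it looks before pivot_wider() (the names `long` and `dupes` are illustrative) and counts the pairs with dplyr::count():

```r
library(dplyr)

set.seed(123)
# The question's data *before* pivot_wider(): 10 dates, 20 labels, 10 values
long <- data.frame(
  Date = seq(as.Date("1991-01-01"), to = as.Date("2000-12-31"), by = "year"),
  Parameter = rep(c("A", "B"), times = 10),
  Value = runif(10, 1, 50)
)

# Any row with n > 1 is a Date/Parameter cell that would receive several values
dupes <- long %>%
  count(Date, Parameter) %>%
  filter(n > 1)
dupes
```

Here all 10 Date/Parameter pairs that occur have n = 2, which is why every non-empty cell in the output above holds a `<dbl [2]>`.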

Answer 1

Score: 2

I think you've got an error in how you're generating your sample data. Apologies if this isn't what you're looking for, but here's how I think you can get your desired output:

EDIT: @s_pike provided the same solution below, about a minute before me, with an explanation of what we both changed. Apologies for the omission, and thanks to them; the acknowledgement is now included for clarity.

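The problem is in your call to rep(). rep(c("A", "B"), times = 10) repeats the pair A, B ten times, for a total of 20 values in the Parameter column. Because you're providing a vector of only 10 dates, data.frame() recycles it to match the length of Parameter. Because A and B alternate, each date is paired with the same Parameter value both times Date is recycled (1991 always gets A, 1992 always gets B, and so on). That is the cause of the duplication: pivot_wider() finds two values for each filled Date/Parameter cell and none for the others, so it returns list-columns.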
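To see the recycling directly, here is a minimal base R sketch that rebuilds the data frame without pivoting and cross-tabulates year against Parameter (the names `long` and `counts` are only for illustration):

```r
set.seed(123)
long <- data.frame(
  Date = seq(as.Date("1991-01-01"), to = as.Date("2000-12-31"), by = "year"),  # 10 dates
  Parameter = rep(c("A", "B"), times = 10),  # 20 labels: A, B, A, B, ...
  Value = runif(10, 1, 50)                   # 10 values, recycled like Date
)
nrow(long)  # 20: Date and Value were recycled to the length of Parameter

# Each year is paired with a single label, twice; the other label never appears
counts <- table(format(long$Date, "%Y"), long$Parameter)
counts
```

With 1991 always landing on A and 1992 always on B, half of the cells receive two values and the rest receive none, matching the `<dbl [2]>` / `<NULL>` pattern in the question.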

If you instead change the call to Parameter = c(rep("A", 10), rep("B", 10)), each value in Date gets a Parameter of both A and B, so there are no duplicates. See below:

```r
> set.seed(123)
> DF <- data.frame(Date = seq(as.Date("1991-01-01"),
                              to = as.Date("2000-12-31"),
                              by = "year"),
                   Parameter = c(rep("A", 10), rep("B", 10)),
                   Value = runif(10, 1, 50))
> DF
         Date Parameter     Value
1  1991-01-01         A 15.091298
2  1992-01-01         A 39.626952
3  1993-01-01         A 21.039869
4  1994-01-01         A 44.267853
5  1995-01-01         A 47.082897
6  1996-01-01         A  3.232268
7  1997-01-01         A 26.877169
8  1998-01-01         A 44.728533
9  1999-01-01         A 28.020316
10 2000-01-01         A 23.374122
11 1991-01-01         B 15.091298
12 1992-01-01         B 39.626952
13 1993-01-01         B 21.039869
14 1994-01-01         B 44.267853
15 1995-01-01         B 47.082897
16 1996-01-01         B  3.232268
17 1997-01-01         B 26.877169
18 1998-01-01         B 44.728533
19 1999-01-01         B 28.020316
20 2000-01-01         B 23.374122
```

This should do what you want, and your pivot_wider() call should now work:

```r
> DF %>%
+   pivot_wider(names_from = Parameter, values_from = Value)
# A tibble: 10 × 3
   Date           A     B
   <date>     <dbl> <dbl>
 1 1991-01-01 15.1  15.1
 2 1992-01-01 39.6  39.6
 3 1993-01-01 21.0  21.0
 4 1994-01-01 44.3  44.3
 5 1995-01-01 47.1  47.1
 6 1996-01-01  3.23  3.23
 7 1997-01-01 26.9  26.9
 8 1998-01-01 44.7  44.7
 9 1999-01-01 28.0  28.0
10 2000-01-01 23.4  23.4
```
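One side effect worth noticing: Value = runif(10, 1, 50) is also recycled to 20 rows, so A and B hold the same number for every year. If independent values are wanted, drawing 20 of them avoids that. A sketch assuming tidyr is loaded (the names `long` and `wide` are illustrative):

```r
library(tidyr)

set.seed(123)
long <- data.frame(
  Date = seq(as.Date("1991-01-01"), to = as.Date("2000-12-31"), by = "year"),
  Parameter = c(rep("A", 10), rep("B", 10)),
  Value = runif(20, 1, 50)  # one draw per row, so Value is not recycled
)

wide <- pivot_wider(long, names_from = Parameter, values_from = Value)
wide
```

With the same seed, the first ten draws (the A column) match the values shown above; only B changes.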

Answer 2

Score: 1

Try changing the rep() call for Parameter to c(rep("A", times = 10), rep("B", times = 10)), assuming your intention is to have one "A" and one "B" per year.
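The same Parameter vector can be written more compactly with the each argument of base R's rep(), which repeats every element in turn instead of repeating the whole vector. A quick check (the variable names are illustrative):

```r
# times = 10 repeats the whole vector: A, B, A, B, ... (the question's version)
alternating <- rep(c("A", "B"), times = 10)
# each = 10 repeats each element in turn: ten A's, then ten B's
blocked <- rep(c("A", "B"), each = 10)

identical(blocked, c(rep("A", times = 10), rep("B", times = 10)))  # TRUE
identical(blocked, alternating)                                   # FALSE
```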

Compare your original:

```r
library(tidyverse)
set.seed(123)
DF <- data.frame(Date = seq(as.Date("1991-01-01"),
                            to = as.Date("2000-12-31"),
                            by = "year"),
                 Parameter = rep(c("A", "B"), times = 10),
                 Value = runif(10, 1, 50)) %>%
  pivot_wider(names_from = Parameter, values_from = Value)
DF
#> # A tibble: 10 × 3
#>    Date       A         B
#>    <date>     <list>    <list>
#>  1 1991-01-01 <dbl [2]> <NULL>
#>  2 1992-01-01 <NULL>    <dbl [2]>
#>  3 1993-01-01 <dbl [2]> <NULL>
#>  4 1994-01-01 <NULL>    <dbl [2]>
#>  5 1995-01-01 <dbl [2]> <NULL>
#>  6 1996-01-01 <NULL>    <dbl [2]>
#>  7 1997-01-01 <dbl [2]> <NULL>
#>  8 1998-01-01 <NULL>    <dbl [2]>
#>  9 1999-01-01 <dbl [2]> <NULL>
#> 10 2000-01-01 <NULL>    <dbl [2]>
```

With:

```r
DF <- data.frame(Date = seq(as.Date("1991-01-01"),
                            to = as.Date("2000-12-31"),
                            by = "year"),
                 Parameter = c(rep("A", times = 10), rep("B", times = 10)),
                 Value = runif(10, 1, 50)) %>%
  pivot_wider(names_from = Parameter, values_from = Value)
DF
#> # A tibble: 10 × 3
#>    Date           A     B
#>    <date>     <dbl> <dbl>
#>  1 1991-01-01 47.9  47.9
#>  2 1992-01-01 23.2  23.2
#>  3 1993-01-01 34.2  34.2
#>  4 1994-01-01 29.1  29.1
#>  5 1995-01-01  6.04  6.04
#>  6 1996-01-01 45.1  45.1
#>  7 1997-01-01 13.1  13.1
#>  8 1998-01-01  3.06  3.06
#>  9 1999-01-01 17.1  17.1
#> 10 2000-01-01 47.8  47.8
```

Created on 2023-08-10 with reprex v2.0.2
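If the goal is one row per year and Parameter, another option is to generate every combination explicitly instead of relying on recycling. A sketch using tidyr::expand_grid() (available in tidyr 1.0 and later; the object names are illustrative, and Value is drawn once per row so A and B differ):

```r
library(tidyr)

set.seed(123)
dates <- seq(as.Date("1991-01-01"), to = as.Date("2000-12-31"), by = "year")

# expand_grid() varies the first column slowest: 1991 A, 1991 B, 1992 A, ...
long <- expand_grid(Date = dates, Parameter = c("A", "B"))
long$Value <- runif(nrow(long), 1, 50)  # one independent draw per row

wide <- pivot_wider(long, names_from = Parameter, values_from = Value)
wide
```

Because each Date/Parameter pair appears exactly once by construction, pivot_wider() returns plain numeric columns regardless of how the rows are ordered.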

  • Posted by huangapple on 2023-08-10 23:40:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76877305.html