Error when using pivot_wider function in R?

Question

I am using the pivot_wider function on the following sample data, but I seem to be missing something: the output contains list-columns with NULL entries instead of placing each value in its respective column. Any suggestions, please?

library(tidyverse)
set.seed(123)
DF <- data.frame(Date = seq(as.Date("1991-01-01"),
                            to = as.Date("2000-12-31"),
                            by = "year"),
                 Parameter = rep(c("A", "B"), times = 10),
                 Value = runif(10, 1, 50)) %>%
  pivot_wider(names_from = Parameter, values_from = Value)

Output

> print(DF)
# A tibble: 10 × 3
   Date       A         B        
   <date>     <list>    <list>   
 1 1991-01-01 <dbl [2]> <NULL>   
 2 1992-01-01 <NULL>    <dbl [2]>
 3 1993-01-01 <dbl [2]> <NULL>   
 4 1994-01-01 <NULL>    <dbl [2]>
 5 1995-01-01 <dbl [2]> <NULL>   
 6 1996-01-01 <NULL>    <dbl [2]>
 7 1997-01-01 <dbl [2]> <NULL>   
 8 1998-01-01 <NULL>    <dbl [2]>
 9 1999-01-01 <dbl [2]> <NULL>   
10 2000-01-01 <NULL>    <dbl [2]>

Answer 1

Score: 2

I think you've got an error in how you're generating your sample data. Apologies if this isn't what you're looking for, but here's how I think you can get your desired output:

EDIT: @s_pike posted the same solution below about a minute before me, along with an explanation of what we both changed. Sorry for the omission, and thanks to them; I've now included the explanation here for clarity.

The problem is in your call to rep(). rep(c("A", "B"), times = 10) repeats the pair A, B ten times, giving 20 alternating values in the Parameter column. Because you supply only 10 dates, Date is recycled to match the length of Parameter. Since A and B alternate, each date gets the same Parameter value both times it appears when Date is recycled, and that is the cause of the duplication.
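To make the recycling concrete, here is a minimal base-R sketch. The variable names (recycled_dates, alternating, blocked) are mine and not from the question, and it assumes data.frame() recycles the 10 dates to length 20, which is consistent with the output shown in the question. rep(c("A", "B"), each = 10) is used as an equivalent way to get ten A's followed by ten B's.

```r
# The same 10 yearly dates as in the question.
dates <- seq(as.Date("1991-01-01"), to = as.Date("2000-12-31"), by = "year")

# What recycling to length 20 amounts to: the 10 dates, twice over.
recycled_dates <- rep(dates, times = 2)

alternating <- rep(c("A", "B"), times = 10)  # A, B, A, B, ... (original call)
blocked     <- rep(c("A", "B"), each = 10)   # ten A's, then ten B's

# Count repeated Date/Parameter pairs under each layout.
dup_alternating <- sum(duplicated(data.frame(recycled_dates, alternating)))
dup_blocked     <- sum(duplicated(data.frame(recycled_dates, blocked)))

dup_alternating  # 10: every Date/Parameter pair appears twice
dup_blocked      # 0: every Date/Parameter pair is unique
```

With the alternating layout, row 1 and row 11 are both 1991-01-01 with Parameter A, so pivot_wider has two values for one cell and falls back to a list-column.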

If you instead change the call to Parameter = c(rep("A", 10), rep("B", 10)), each value in Date gets both an A and a B, so there are no duplicates. See below:

> set.seed(123)
> DF <- data.frame(Date = seq(as.Date("1991-01-01"),
                            to = as.Date("2000-12-31"),
                            by = "year"),
                 Parameter = c(rep("A", 10), rep("B", 10)),
                 Value = runif(10, 1, 50))

> DF
         Date Parameter     Value
1  1991-01-01         A 15.091298
2  1992-01-01         A 39.626952
3  1993-01-01         A 21.039869
4  1994-01-01         A 44.267853
5  1995-01-01         A 47.082897
6  1996-01-01         A  3.232268
7  1997-01-01         A 26.877169
8  1998-01-01         A 44.728533
9  1999-01-01         A 28.020316
10 2000-01-01         A 23.374122
11 1991-01-01         B 15.091298
12 1992-01-01         B 39.626952
13 1993-01-01         B 21.039869
14 1994-01-01         B 44.267853
15 1995-01-01         B 47.082897
16 1996-01-01         B  3.232268
17 1997-01-01         B 26.877169
18 1998-01-01         B 44.728533
19 1999-01-01         B 28.020316
20 2000-01-01         B 23.374122

This should do what you want and your pivot_wider should work now:

> DF %>%
+     pivot_wider(names_from = Parameter, values_from = Value)
# A tibble: 10 × 3
   Date           A     B
   <date>     <dbl> <dbl>
 1 1991-01-01 15.1  15.1 
 2 1992-01-01 39.6  39.6 
 3 1993-01-01 21.0  21.0 
 4 1994-01-01 44.3  44.3 
 5 1995-01-01 47.1  47.1 
 6 1996-01-01  3.23  3.23
 7 1997-01-01 26.9  26.9 
 8 1998-01-01 44.7  44.7 
 9 1999-01-01 28.0  28.0 
10 2000-01-01 23.4  23.4 
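If you hit list-columns like this on real data, it can help to look for repeated Date/Parameter pairs before pivoting. The sketch below is one way to do that, not the only one: the names long, dupes, and wide are hypothetical, Value uses 20 draws (my assumption, so that Value itself is not recycled), and values_fn = mean is just one possible way to collapse duplicates when they are genuinely expected.

```r
library(dplyr)
library(tidyr)

set.seed(123)
long <- data.frame(
  Date = seq(as.Date("1991-01-01"), to = as.Date("2000-12-31"), by = "year"),
  Parameter = rep(c("A", "B"), times = 10),  # the original, alternating layout
  Value = runif(20, 1, 50)                   # assumption: 20 draws, no recycling
)

# Any Date/Parameter pair with n > 1 would end up in a list-column.
dupes <- long %>%
  count(Date, Parameter) %>%
  filter(n > 1)

# If the duplicates are intended, collapse them explicitly (here with mean()).
wide <- long %>%
  pivot_wider(names_from = Parameter, values_from = Value, values_fn = mean)
```

Here dupes lists the ten pairs that each occur twice. wide then has ordinary numeric columns, but each year still has a value for only one of A or B, so the other column is NA in that row; fixing the rep() call as shown above is what actually gives every year both parameters.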

Answer 2

Score: 1

Try changing the rep() call for Parameter to c(rep("A", times = 10), rep("B", times = 10)), assuming your intention is to have one "A" and one "B" per year.

Compare your original:

library(tidyverse)

set.seed(123)
DF <- data.frame(Date = seq(as.Date("1991-01-01"),
                            to = as.Date("2000-12-31"),
                            by = "year"),
                 Parameter = rep(c("A", "B"), times = 10),
                 Value = runif(10, 1, 50)) %>%
  pivot_wider(names_from = Parameter, values_from = Value)

DF
#> # A tibble: 10 × 3
#>    Date       A         B        
#>    <date>     <list>    <list>   
#>  1 1991-01-01 <dbl [2]> <NULL>   
#>  2 1992-01-01 <NULL>    <dbl [2]>
#>  3 1993-01-01 <dbl [2]> <NULL>   
#>  4 1994-01-01 <NULL>    <dbl [2]>
#>  5 1995-01-01 <dbl [2]> <NULL>   
#>  6 1996-01-01 <NULL>    <dbl [2]>
#>  7 1997-01-01 <dbl [2]> <NULL>   
#>  8 1998-01-01 <NULL>    <dbl [2]>
#>  9 1999-01-01 <dbl [2]> <NULL>   
#> 10 2000-01-01 <NULL>    <dbl [2]>

With:

DF <- data.frame(Date = seq(as.Date("1991-01-01"),
                            to = as.Date("2000-12-31"),
                            by = "year"),
                 Parameter = c(rep("A", times = 10), rep("B", times = 10)),
                 Value = runif(10, 1, 50)) %>%
  pivot_wider(names_from = Parameter, values_from = Value)

DF
#> # A tibble: 10 × 3
#>    Date           A     B
#>    <date>     <dbl> <dbl>
#&gt;  1 1991-01-01 47.9  47.9 
#&gt;  2 1992-01-01 23.2  23.2 
#&gt;  3 1993-01-01 34.2  34.2 
#&gt;  4 1994-01-01 29.1  29.1 
#&gt;  5 1995-01-01  6.04  6.04
#&gt;  6 1996-01-01 45.1  45.1 
#&gt;  7 1997-01-01 13.1  13.1 
#&gt;  8 1998-01-01  3.06  3.06
#&gt;  9 1999-01-01 17.1  17.1 
#&gt; 10 2000-01-01 47.8  47.8

Created on 2023-08-10 with reprex v2.0.2

huangapple
  • Published on August 10, 2023, 23:40:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76877305.html