将响应计数的数据框扩展为响应的数据框

huangapple go评论67阅读模式
英文:

Expanding Data Frame of Response Counts into Data Frame of Responses

问题

我试图将一个包含特定响应计数的数据框扩展为一个包含实际响应的数据框(而不是按计数聚合的数据框)

示例数据:

set.seed(44)
data <- data.frame(ID = LETTERS[1:5]
, Count_1 = sample(c(3:8), 5)
, Count_2 = sample(c(3:8), 5)
, Count_3 = sample(c(3:8), 5))


它看起来像这样:

ID Count_1 Count_2 Count_3
1 A 3 8 3
2 B 8 7 4
3 C 7 4 8
4 D 4 3 5
5 E 5 5 6


我想将其扩展为如下所示:

desired <- rbind(data.frame(ID = c(rep("A", 3), rep("B", 8), rep("C", 7), rep("D", 4), rep("E", 5))
, Response = c(rep("1", 3), rep("1", 8) , rep("1", 7), rep("1", 4), rep("1", 5))
)
, data.frame(ID = c(rep("A", 8), rep("B", 7), rep("C", 4), rep("D", 3), rep("E", 5))
, Response = c(rep("2", 8), rep("2", 7) , rep("2", 4), rep("2", 3), rep("2", 5))
)
, data.frame(ID = c(rep("A", 3), rep("B", 4), rep("C", 8), rep("D", 5), rep("E", 6))
, Response = c(rep("2", 3), rep("2", 4) , rep("2", 8), rep("2", 5), rep("2", 6))
)
)


它看起来像:

ID Response
1 A 1
2 A 1
3 A 1
4 B 1
5 B 1
6 B 1
7 B 1
8 B 1
9 B 1
10 B 1
11 B 1
12 C 1
13 C 1
14 C 1
15 C 1
16 C 1
17 C 1
18 C 1
19 D 1
20 D 1
21 D 1
22 D 1
23 E 1
24 E 1
25 E 1
26 E 1
27 E 1
28 A 2
29 A 2
30 A 2
31 A 2
32 A 2
33 A 2
34 A 2
35 A 2
36 B 2
37 B 2
38 B 2
39 B 2
40 B 2
41 B 2
42 B 2
43 C 2
44 C 2
45 C 2
46 C 2
47 D 2
48 D 2
49 D 2
50 E 2
51 E 2
52 E 2
53 E 2
54 E 2
55 A 2
56 A 2
57 A 2
58 B 2
59 B 2
60 B 2
61 B 2
62 C 2
63 C 2
64 C 2
65 C 2
66 C 2
67 C 2
68 C 2
69 C 2
70 D 2
71 D 2
72 D 2
73 D 2
74 D 2
75 E 2
76 E 2
77 E 2
78 E 2
79 E 2
80 E 2


<details>
<summary>英文:</summary>

I am trying to expand a data frame of counts of specific responses into a data frame of actual responses (rather than aggregated by counts)

Example data:

set.seed(44)
data <- data.frame(ID = LETTERS[1:5]
, Count_1 = sample(c(3:8), 5)
, Count_2 = sample(c(3:8), 5)
, Count_3 = sample(c(3:8), 5))


Which looks like this:

ID Count_1 Count_2 Count_3
1 A 3 8 3
2 B 8 7 4
3 C 7 4 8
4 D 4 3 5
5 E 5 5 6


I want to expand this out to look like this:

desired <- rbind(data.frame(ID = c(rep("A", 3), rep("B", 8), rep("C", 7), rep("D", 4), rep("E", 5)
)
, Response = c(rep("1", 3), rep("1", 8) , rep("1", 7), rep("1", 4), rep("1", 5)
)
)
, data.frame(ID = c(rep("A", 8), rep("B", 7), rep("C", 4), rep("D", 3), rep("E", 5)
)

             , Response = c(rep(&quot;2&quot;, 8), rep(&quot;2&quot;, 7) , rep(&quot;2&quot;, 4), rep(&quot;2&quot;, 3), rep(&quot;2&quot;, 5)
             )
             )
             
             , data.frame(ID = c(rep(&quot;A&quot;, 3), rep(&quot;B&quot;, 4), rep(&quot;C&quot;, 8), rep(&quot;D&quot;, 5), rep(&quot;E&quot;, 6)
             )
             , Response = c(rep(&quot;2&quot;, 3), rep(&quot;2&quot;, 4) , rep(&quot;2&quot;, 8), rep(&quot;2&quot;, 5), rep(&quot;2&quot;, 6)
             )
             )
             
             )

Which looks like:

ID Response
1 A 1
2 A 1
3 A 1
4 B 1
5 B 1
6 B 1
7 B 1
8 B 1
9 B 1
10 B 1
11 B 1
12 C 1
13 C 1
14 C 1
15 C 1
16 C 1
17 C 1
18 C 1
19 D 1
20 D 1
21 D 1
22 D 1
23 E 1
24 E 1
25 E 1
26 E 1
27 E 1
28 A 2
29 A 2
30 A 2
31 A 2
32 A 2
33 A 2
34 A 2
35 A 2
36 B 2
37 B 2
38 B 2
39 B 2
40 B 2
41 B 2
42 B 2
43 C 2
44 C 2
45 C 2
46 C 2
47 D 2
48 D 2
49 D 2
50 E 2
51 E 2
52 E 2
53 E 2
54 E 2
55 A 2
56 A 2
57 A 2
58 B 2
59 B 2
60 B 2
61 B 2
62 C 2
63 C 2
64 C 2
65 C 2
66 C 2
67 C 2
68 C 2
69 C 2
70 D 2
71 D 2
72 D 2
73 D 2
74 D 2
75 E 2
76 E 2
77 E 2
78 E 2
79 E 2
80 E 2


I was thinking maybe a for loop to create vectors for each cell in each column, but I&#39;m wondering if that&#39;s necessary or if there might be a more efficient way?

</details>


# 答案1
**得分**: 1

We could reshape to 'long' format with `pivot_longer` and then use `uncount`

```R
library(dplyr)
library(tidyr)
data %>%
   pivot_longer(cols = starts_with("Count"), names_prefix = 'Count_', 
     names_to = 'Response') %>%
   arrange(Response) %>%
   uncount(value)

-output

# A tibble: 80 x 2
   ID    Response
   <chr> <chr>   
 1 A     1       
 2 A     1       
 3 A     1       
 4 B     1       
 5 B     1       
 6 B     1       
 7 B     1       
 8 B     1       
 9 B     1       
10 B     1       
# … with 70 more rows

In base R, this can be done with rep/unlist/col/row

vals <- unlist(data[-1])
out <-  data.frame(ID = rep(data$ID[row(data[-1])], vals), 
      Response = rep(col(data[-1]), vals))

data

data <- structure(list(ID = c("A", "B", "C", "D", "E"), Count_1 = c(3L, 
8L, 7L, 4L, 5L), Count_2 = c(8L, 7L, 4L, 3L, 5L), Count_3 = c(3L, 
4L, 8L, 5L, 6L)), class = "data.frame", row names = c("1", "2", 
"3", "4", "5"))
英文:

We could reshape to 'long' format with pivot_longer and then use uncount

library(dplyr)
library(tidyr)
data %&gt;%
   pivot_longer(cols = starts_with(&quot;Count&quot;), names_prefix = &#39;Count_&#39;, 
     names_to = &#39;Response&#39;) %&gt;% 
   arrange(Response) %&gt;%
   uncount(value)

-output

# A tibble: 80 &#215; 2
   ID    Response
   &lt;chr&gt; &lt;chr&gt;   
 1 A     1       
 2 A     1       
 3 A     1       
 4 B     1       
 5 B     1       
 6 B     1       
 7 B     1       
 8 B     1       
 9 B     1       
10 B     1       
# … with 70 more rows

In base R, this can be done with rep/unlist/col/row

vals &lt;- unlist(data[-1])
out &lt;-  data.frame(ID = rep(data$ID[row(data[-1])], vals), 
      Response = rep(col(data[-1]), vals))

data

data &lt;- structure(list(ID = c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, &quot;D&quot;, &quot;E&quot;), Count_1 = c(3L, 
8L, 7L, 4L, 5L), Count_2 = c(8L, 7L, 4L, 3L, 5L), Count_3 = c(3L, 
4L, 8L, 5L, 6L)), class = &quot;data.frame&quot;, row.names = c(&quot;1&quot;, &quot;2&quot;, 
&quot;3&quot;, &quot;4&quot;, &quot;5&quot;))

huangapple
  • 本文由 发表于 2023年2月10日 05:36:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/75404657.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定