Transform column data to row data in R
Question
I have data in the following format:
# Reproducible example
order <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
           4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 4, 5, 6, 7, 8, 9)
values <- c(100, 74, 70, 88, 104, 177, 88, 189, 75, 58, 105, 171, 29, 60, 71, 37, 93,
            99, 206, 74, 82, 69, 67, 102, 161, 60, 92, 62, 104, 34, 108, 53, 50, 80,
            70, 77, 76, 105, 115, 78)
journey_id <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                3, 3, 3, 3, 3, 3)
df <- data.frame(order, values, journey_id)
Here order refers to a stop along a route, and values holds the observed value at that stop. I would like to transform this into journey-based data, where each row is a single journey, the columns are the order (stop) numbers, and the cells are taken from values. Not all journeys necessarily have an observed value for every stop.
The output should look like this:
## OUTPUT ##
#             4,   5,  6,   7,   8,   9,  10,  11, 12, 13, ..., 20
#journey1   100,  74, 70,  88, 104, 177,  88, 189, 75, 58, ..., 93
#journey2    99, 206, 74,  82,  69,  67, 102, 161, 60, 92, ..., 80
#journey3    70,  77, 76, 105, 115,  78,  NA,  NA, NA, NA, ..., NA
My data is quite large, so if possible I'd prefer not to loop over the rows of the data frame and would rather use a vectorized solution.
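One vectorized base R way to build the journeys × stops table sketched above, without looping over rows, is matrix indexing: start from an all-NA matrix and fill every observed cell in a single assignment using a two-column (row, column) index. This is a minimal sketch, not one of the answers below; it assumes each (journey_id, order) pair occurs at most once, and the names journeys, stops, idx and wide are illustrative.

```r
# Same data as the reproducible example, written more compactly
df <- data.frame(
  order = c(4:20, 4:20, 4:9),
  values = c(100, 74, 70, 88, 104, 177, 88, 189, 75, 58, 105, 171, 29, 60, 71, 37, 93,
             99, 206, 74, 82, 69, 67, 102, 161, 60, 92, 62, 104, 34, 108, 53, 50, 80,
             70, 77, 76, 105, 115, 78),
  journey_id = rep(1:3, times = c(17, 17, 6))
)

journeys <- sort(unique(df$journey_id))  # one row per journey
stops <- sort(unique(df$order))          # one column per stop

# Start from all NA so that unobserved stops stay NA
wide <- matrix(NA_real_, nrow = length(journeys), ncol = length(stops),
               dimnames = list(paste0("journey", journeys), as.character(stops)))

# Row/column position of every observation, then one vectorized assignment
# (assumes no (journey_id, order) pair repeats; a repeat would overwrite the earlier value)
idx <- cbind(match(df$journey_id, journeys), match(df$order, stops))
wide[idx] <- df$values

wide[, 1:6]
```

Because match() and the matrix assignment both work on whole vectors, the cost grows with the number of rows without any explicit R-level loop.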
Within a single journey, the values are not always unique: the same value can appear at more than one stop (for example, journey 1 has 88 at both stop 7 and stop 10).
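Repeated values do not get in the way of the reshaping itself, because the approaches in the answers below key each cell on the (journey_id, order) pair, not on the value. A quick base R check on the example data, as a sketch; the aggregate() step is only one possible way to collapse repeated pairs if they ever occur, and averaging is an arbitrary choice:

```r
df <- data.frame(
  order = c(4:20, 4:20, 4:9),
  values = c(100, 74, 70, 88, 104, 177, 88, 189, 75, 58, 105, 171, 29, 60, 71, 37, 93,
             99, 206, 74, 82, 69, 67, 102, 161, 60, 92, 62, 104, 34, 108, 53, 50, 80,
             70, 77, 76, 105, 115, 78),
  journey_id = rep(1:3, times = c(17, 17, 6))
)

# The same value can repeat within a journey (journey 1 has 88 at stops 7 and 10)
values_repeat <- any(duplicated(df[c("journey_id", "values")]))

# What matters for reshaping is that each journey/stop pair occurs only once
keys_repeat <- any(duplicated(df[c("journey_id", "order")]))

values_repeat  # TRUE
keys_repeat    # FALSE

# If pairs did repeat, they could be collapsed first, e.g. by averaging
df_unique <- aggregate(values ~ journey_id + order, data = df, FUN = mean)
nrow(df_unique)  # 40: nothing to collapse in this data
```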
Answer 1
Score: 2
With pivot_wider from tidyr, which reshapes the data from long to wide format:
library(tidyr)
library(dplyr)  # for the %>% pipe

df %>%
  pivot_wider(names_from = order, values_from = values)  # one row per journey, one column per stop
#   journey_id   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
# 1          1 100  74  70  88 104 177  88 189  75  58 105 171  29  60  71  37  93
# 2          2  99 206  74  82  69  67 102 161  60  92  62 104  34 108  53  50  80
# 3          3  70  77  76 105 115  78  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
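The example data has one observation per journey and stop, so the call above is enough. If a journey could record the same stop twice, pivot_wider would warn that the values are not uniquely identified and return list-columns instead. A sketch of one way to handle that, assuming tidyr 1.1.0 or later (where values_fn accepts a plain function); the extra row is hypothetical and exists only to create a duplicate, averaging is an arbitrary choice, and names_prefix just makes the stop columns syntactic names:

```r
library(tidyr)

df <- data.frame(
  order = c(4:20, 4:20, 4:9),
  values = c(100, 74, 70, 88, 104, 177, 88, 189, 75, 58, 105, 171, 29, 60, 71, 37, 93,
             99, 206, 74, 82, 69, 67, 102, 161, 60, 92, 62, 104, 34, 108, 53, 50, 80,
             70, 77, 76, 105, 115, 78),
  journey_id = rep(1:3, times = c(17, 17, 6))
)

# Hypothetical duplicate: journey 1 observed at stop 4 a second time
df_dup <- rbind(df, data.frame(order = 4L, values = 110, journey_id = 1L))

# values_fn collapses duplicate journey/stop cells (here by averaging);
# names_prefix turns the stop numbers into column names such as stop_4
wide <- pivot_wider(df_dup, names_from = order, values_from = values,
                    values_fn = mean, names_prefix = "stop_")

wide$stop_4  # 105 99 70 (journeys 1, 2, 3)
```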
Answer 2
Score: 0
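A base R option using xtabs. Note that xtabs sums values within each journey/stop cell, so stops without an observation show up as 0 rather than NA: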
xtabs(values ~ journey_id + order, df)
# order
#journey_id 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# 1 100 74 70 88 104 177 88 189 75 58 105 171 29 60 71 37 93
# 2 99 206 74 82 69 67 102 161 60 92 62 104 34 108 53 50 80
# 3 70 77 76 105 115 78 0 0 0 0 0 0 0 0 0 0 0
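Because xtabs fills empty cells with 0, a genuine observation of 0 and a missing stop look the same. A small base R sketch that restores NA for the missing stops by counting observations with the same formula (tab, cnt and wide are illustrative names):

```r
df <- data.frame(
  order = c(4:20, 4:20, 4:9),
  values = c(100, 74, 70, 88, 104, 177, 88, 189, 75, 58, 105, 171, 29, 60, 71, 37, 93,
             99, 206, 74, 82, 69, 67, 102, 161, 60, 92, 62, 104, 34, 108, 53, 50, 80,
             70, 77, 76, 105, 115, 78),
  journey_id = rep(1:3, times = c(17, 17, 6))
)

tab <- xtabs(values ~ journey_id + order, df)  # sums values per cell; empty cells are 0
cnt <- xtabs(~ journey_id + order, df)         # number of observations per cell

# Plain numeric matrix with the same layout, then mark unobserved cells as NA
# (tab and cnt share the same cell order, so their flattened vectors line up)
wide <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))
wide[as.vector(cnt) == 0] <- NA

wide["3", ]  # stops 10 to 20 are now NA for journey 3
```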
Another base R option uses reshape, which leaves missing stops as NA and names the new columns values.<stop>:
reshape(df, direction = "wide", idvar = "journey_id", timevar = "order")
# journey_id values.4 values.5 values.6 values.7 values.8 values.9 values.10
#1 1 100 74 70 88 104 177 88
#18 2 99 206 74 82 69 67 102
#35 3 70 77 76 105 115 78 NA
# values.11 values.12 values.13 values.14 values.15 values.16 values.17
#1 189 75 58 105 171 29 60
#18 161 60 92 62 104 34 108
#35 NA NA NA NA NA NA NA
# values.18 values.19 values.20
#1 71 37 93
#18 53 50 80
#35 NA NA NA
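To get closer to the layout in the question, the values. prefix and the inherited row names (1, 18, 35) can be tidied afterwards. A base R sketch; the journeyN row labels are an assumption that mirrors the question's mock-up:

```r
df <- data.frame(
  order = c(4:20, 4:20, 4:9),
  values = c(100, 74, 70, 88, 104, 177, 88, 189, 75, 58, 105, 171, 29, 60, 71, 37, 93,
             99, 206, 74, 82, 69, 67, 102, 161, 60, 92, 62, 104, 34, 108, 53, 50, 80,
             70, 77, 76, 105, 115, 78),
  journey_id = rep(1:3, times = c(17, 17, 6))
)

wide <- reshape(df, direction = "wide", idvar = "journey_id", timevar = "order")

# Drop the "values." prefix that reshape() adds to each new column
names(wide) <- sub("^values\\.", "", names(wide))

# Replace the inherited row names with journey labels
rownames(wide) <- paste0("journey", wide$journey_id)

wide[, 1:7]
```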