Transform column data to row data R

Question

I have data in the following format:

# Reproducible example
order <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
           4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
           4, 5, 6, 7, 8, 9)

values <- c(100, 74, 70, 88, 104, 177, 88, 189, 75, 58, 105, 171, 29, 60, 71, 37, 93,
            99, 206, 74, 82, 69, 67, 102, 161, 60, 92, 62, 104, 34, 108, 53, 50, 80,
            70, 77, 76, 105, 115, 78)

journey_id <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                3, 3, 3, 3, 3, 3)

df <- data.frame(order, values, journey_id)

Here order refers to a stop along a route, and values holds the value observed at that stop. I would like to transform this into journey-based data, where each row is a single journey, the columns are the values of order, and the cells are taken from values. Not every journey necessarily has an observed value for every stop.

The output should look like this:

## OUTPUT## 

#           4,   5,   6,   7,   8,   9,  10,  11,  12,  13, ...,  20
#journey1 100,  74,  70,  88, 104, 177,  88, 189,  75,  58, ...,  93
#journey2  99, 206,  74,  82,  69,  67, 102, 161,  60,  92, ...,  80
#journey3  70,  77,  76, 105, 115,  78,  NA,  NA,  NA,  NA, ...,  NA

My data is quite large, so if possible I'd prefer not to loop over the rows of the data frame, but rather use a vectorized solution.

The value associated with a stop is not always unique within a single journey.

Answer 1

Score: 2

With pivot_wider, which reshapes the data from long to wide format:

library(tidyr)
library(dplyr)
df %>% 
  pivot_wider(names_from = "order", values_from = "values")

  journey_id   4   5  6   7   8   9  10  11 12 13  14  15 16  17 18 19 20
1          1 100  74 70  88 104 177  88 189 75 58 105 171 29  60 71 37 93
2          2  99 206 74  82  69  67 102 161 60 92  62 104 34 108 53 50 80
3          3  70  77 76 105 115  78  NA  NA NA NA  NA  NA NA  NA NA NA NA
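
If the last remark in the question means that a journey can contain more than one row for the same stop (that reading is an assumption), pivot_wider would warn and return list-columns. Below is a minimal sketch of one way to handle that case. The data frame toy and its duplicated row are made up for illustration, and sum is an arbitrary choice of aggregation.

```r
library(tidyr)

# Hypothetical toy data: journey 1 has two rows for stop 4,
# and journey 2 has no observation for stop 6.
toy <- data.frame(
  journey_id = c(1, 1, 1, 1, 2, 2),
  order      = c(5, 4, 4, 6, 4, 5),
  values     = c(74, 100, 10, 70, 99, 206)
)

wide <- pivot_wider(
  toy,
  names_from  = order,
  values_from = values,
  names_sort  = TRUE,  # sort the stop columns instead of keeping first-appearance order
  values_fn   = sum    # collapse duplicate journey/stop pairs (assumed aggregation)
)

wide
```

Swap sum for mean, max, or whichever summary suits the data. Stops with no observation for a journey are still filled with NA.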

Answer 2

Score: 0

A base R possibility using xtabs (note that stops without an observation appear as 0 rather than NA):

xtabs(values ~ journey_id + order, df)
#          order
#journey_id   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
#         1 100  74  70  88 104 177  88 189  75  58 105 171  29  60  71  37  93
#         2  99 206  74  82  69  67 102 161  60  92  62 104  34 108  53  50  80
#         3  70  77  76 105 115  78   0   0   0   0   0   0   0   0   0   0   0
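
Because xtabs sums the values in each cell, an unobserved stop is indistinguishable from a stop whose values really sum to 0. Below is a minimal sketch of one way to recover NA for unobserved stops, by masking with a second xtabs call that counts rows per cell. The data frame toy is a small made-up subset used only for illustration.

```r
# Hypothetical toy data: journey 2 has no observation for stop 6.
toy <- data.frame(
  order      = c(4, 5, 6, 4, 5),
  values     = c(100, 74, 70, 99, 206),
  journey_id = c(1, 1, 1, 2, 2)
)

tab <- xtabs(values ~ journey_id + order, toy)  # sums of values per journey/stop
n   <- xtabs(~ journey_id + order, toy)         # number of rows per journey/stop

# Cells with no underlying rows become NA instead of 0.
tab[n == 0] <- NA

tab
```

Cells with at least one observation keep their summed value, so a genuine 0 stays 0.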

Another, using reshape:

reshape(df, direction = "wide", idvar = "journey_id", timevar = "order")
#   journey_id values.4 values.5 values.6 values.7 values.8 values.9 values.10
#1           1      100       74       70       88      104      177        88
#18          2       99      206       74       82       69       67       102
#35          3       70       77       76      105      115       78        NA
#   values.11 values.12 values.13 values.14 values.15 values.16 values.17
#1        189        75        58       105       171        29        60
#18       161        60        92        62       104        34       108
#35        NA        NA        NA        NA        NA        NA        NA
#   values.18 values.19 values.20
#1         71        37        93
#18        53        50        80
#35        NA        NA        NA
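
The reshape result keeps a "values." prefix on every stop column and the original row numbers (1, 18, 35) as row names. Below is a minimal sketch of tidying both so the columns are named after the stops alone. The data frame toy is a small made-up subset used only for illustration.

```r
# Hypothetical toy data: journey 2 has no observation for stop 6.
toy <- data.frame(
  order      = c(4, 5, 6, 4, 5),
  values     = c(100, 74, 70, 99, 206),
  journey_id = c(1, 1, 1, 2, 2)
)

wide <- reshape(toy, direction = "wide", idvar = "journey_id", timevar = "order")

# Strip the "values." prefix so the columns are named by stop.
names(wide) <- sub("^values\\.", "", names(wide))

# Replace the leftover original row numbers with 1, 2, ...
rownames(wide) <- NULL

wide
```

Columns with purely numeric names have to be accessed with backticks or strings, for example wide[["4"]].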

huangapple
  • Posted on March 7, 2023 at 18:43:12
  • When reposting, please be sure to keep the link to this article: https://go.coder-hub.com/75660957.html