How can I order a data frame so it increases numerically?

Question

I have the table below. I want to order it so that it runs from Wave 2 to Wave 31 in ascending numeric order.

       Wave                     mean_B
       <chr>                     <dbl>
     1 Wave 10 - December 2020    5.49
     2 Wave 11 - January 2021     5.52
     3 Wave 12 - February 2021    5.52
     4 Wave 13 - March 2021       5.45
     5 Wave 14 - April 2021       5.53
     6 Wave 15 - May 2021         5.46
     7 Wave 16 - June 2021        5.51
     8 Wave 17 - July 2021        5.63
     9 Wave 18 - August 2021      5.54
    10 Wave 19 - September 2021   5.49
    11 Wave 2 - April 2020        5.67
    12 Wave 20 - October 2021     5.55
    13 Wave 21 - November 2021    5.43
    14 Wave 22 - December 2021    5.35
    15 Wave 23 - January 2022     5.46
    16 Wave 24 - February 2022    5.41
    17 Wave 25 - March 2022       5.30
    18 Wave 26 - April 2022       5.38
    19 Wave 27 - May 2022         5.39
    20 Wave 28 - June 2022        5.55
    21 Wave 29 - July 2022        5.51
    22 Wave 3 - May 2020          5.72
    23 Wave 30 - August 2022      5.52
    24 Wave 31 - September 2022   5.54
    25 Wave 4 - June 2020         5.62
    26 Wave 5 - July 2020         5.54
    27 Wave 6 - August 2020       5.61
    28 Wave 7 - September 2020    5.60
    29 Wave 8 - October 2020      5.54
    30 Wave 9 - November 2020     5.60

I created the table with the following code, expecting it to be ordered numerically automatically:

    mean_values <- Trial %>%
      group_by(Wave) %>%
      summarise(mean_B = mean(Weight_Answer, na.rm = TRUE))

I then tried this code:

    mean_values <- Trial %>%
      group_by(Wave) %>%
      summarise(mean_B = mean(Weight_Answer, na.rm = TRUE)) %>%
      arrange(as.numeric(gsub("Wave ", "", Wave)))

However, it gave the warning:

> Warning message:
> There was 1 warning in arrange().
> ℹ In argument: ..1 = as.numeric(gsub("Wave ", "", Wave)).
> Caused by warning:
> ! NAs introduced by coercion

I am quite new to R, so I am not sure what this means or how to resolve it.

Answer 1

Score: 1

You can use mixedsort from the gtools package.

    library(gtools)
    df[match(mixedsort(df$Wave), df$Wave), ]

Alternatively, extract the wave number and arrange on that.

    library(dplyr)
    df |>
      arrange(as.integer(sub("Wave (\\d+).*", "\\1", Wave)))

Output

                           Wave mean_B
    11      Wave 2 - April 2020   5.67
    22        Wave 3 - May 2020   5.72
    25       Wave 4 - June 2020   5.62
    26       Wave 5 - July 2020   5.54
    27     Wave 6 - August 2020   5.61
    28  Wave 7 - September 2020   5.60
    29    Wave 8 - October 2020   5.54
    30   Wave 9 - November 2020   5.60
    1   Wave 10 - December 2020   5.49
    2    Wave 11 - January 2021   5.52
    3   Wave 12 - February 2021   5.52
    4      Wave 13 - March 2021   5.45
    5      Wave 14 - April 2021   5.53
    6        Wave 15 - May 2021   5.46
    7       Wave 16 - June 2021   5.51
    8       Wave 17 - July 2021   5.63
    9     Wave 18 - August 2021   5.54
    10 Wave 19 - September 2021   5.49
    12   Wave 20 - October 2021   5.55
    13  Wave 21 - November 2021   5.43
    14  Wave 22 - December 2021   5.35
    15   Wave 23 - January 2022   5.46
    16  Wave 24 - February 2022   5.41
    17     Wave 25 - March 2022   5.30
    18     Wave 26 - April 2022   5.38
    19       Wave 27 - May 2022   5.39
    20      Wave 28 - June 2022   5.55
    21      Wave 29 - July 2022   5.51
    23    Wave 30 - August 2022   5.52
    24 Wave 31 - September 2022   5.54

Data

    df <- structure(list(Wave = c("Wave 10 - December 2020", "Wave 11 - January 2021",
    "Wave 12 - February 2021", "Wave 13 - March 2021", "Wave 14 - April 2021",
    "Wave 15 - May 2021", "Wave 16 - June 2021", "Wave 17 - July 2021",
    "Wave 18 - August 2021", "Wave 19 - September 2021", "Wave 2 - April 2020",
    "Wave 20 - October 2021", "Wave 21 - November 2021", "Wave 22 - December 2021",
    "Wave 23 - January 2022", "Wave 24 - February 2022", "Wave 25 - March 2022",
    "Wave 26 - April 2022", "Wave 27 - May 2022", "Wave 28 - June 2022",
    "Wave 29 - July 2022", "Wave 3 - May 2020", "Wave 30 - August 2022",
    "Wave 31 - September 2022", "Wave 4 - June 2020", "Wave 5 - July 2020",
    "Wave 6 - August 2020", "Wave 7 - September 2020", "Wave 8 - October 2020",
    "Wave 9 - November 2020"), mean_B = c(5.49, 5.52, 5.52, 5.45,
    5.53, 5.46, 5.51, 5.63, 5.54, 5.49, 5.67, 5.55, 5.43, 5.35, 5.46,
    5.41, 5.3, 5.38, 5.39, 5.55, 5.51, 5.72, 5.52, 5.54, 5.62, 5.54,
    5.61, 5.6, 5.54, 5.6)), class = "data.frame", row.names = c(NA,
    -30L))
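
As a side note on the warning in the question, here is a minimal base R sketch of where the NAs most likely come from. The two-element vector `wave` is a made-up stand-in for the `Wave` column, not the asker's real data. Stripping only the "Wave " prefix leaves the month and year in the string, so `as.numeric()` cannot parse it; capturing the digits and replacing the whole string with `\\1` leaves just the number.

```r
# Hypothetical stand-in for the Wave column
wave <- c("Wave 10 - December 2020", "Wave 2 - April 2020")

# The attempt from the question removes only the "Wave " prefix,
# so the month and year are still part of each string.
stripped <- gsub("Wave ", "", wave)
stripped
# "10 - December 2020" "2 - April 2020"

# Neither string is a valid number, so both become NA
# (this is what triggers "NAs introduced by coercion").
coerced <- suppressWarnings(as.numeric(stripped))
coerced
# NA NA

# Capture the digits after "Wave " and replace the whole string
# with that first capture group.
wave_no <- as.integer(sub("Wave (\\d+).*", "\\1", wave))
wave_no
# 10 2
```

The `.*` at the end of the pattern matters: without it, only the matched prefix would be replaced and the trailing text would survive.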

Answer 2

Score: 1

@benson23 gave a nice answer. Here is a "base R" answer.

    # extract the wave numbers (dat is the data frame)
    waveNumbers <- sub("Wave (\\d+) - \\w+ \\d+", "\\1", dat$Wave) |> as.numeric()
    # reorder the rows accordingly
    dat2 <- dat[order(waveNumbers), ]

Note that the row "names" (1, 2, ...) are permuted too. You can run rownames(dat2) <- NULL to reset them.
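
To make the point about row names concrete, here is a small sketch on a three-row toy data frame (the values in `dat` are picked for illustration). It also uses `regmatches()` with `regexpr()` as an alternative way to pull out the first run of digits, which assumes every `Wave` value contains at least one digit.

```r
# Toy data frame standing in for dat
dat <- data.frame(
  Wave = c("Wave 10 - December 2020", "Wave 2 - April 2020",
           "Wave 9 - November 2020"),
  mean_B = c(5.49, 5.67, 5.60)
)

# regexpr() locates the first run of digits in each string and
# regmatches() extracts it; here that is the wave number.
waveNumbers <- as.numeric(regmatches(dat$Wave, regexpr("\\d+", dat$Wave)))

# Reorder the rows by wave number
dat2 <- dat[order(waveNumbers), ]

# The original row names travel with the rows
permuted_names <- rownames(dat2)
permuted_names
# "2" "3" "1"

# Reset them to 1, 2, 3
rownames(dat2) <- NULL
rownames(dat2)
# "1" "2" "3"
```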

Answer 3

Score: 0

Since you're already in the tidyverse, we can use readr::parse_number:

    readr::parse_number("Wave 9 - November 2020 5.60")
    # [1] 9

This is easy enough to add to your dplyr pipe:

    library(dplyr)
    df %>%
      arrange(readr::parse_number(Wave))
    #                        Wave mean_B
    # 1       Wave 2 - April 2020   5.67
    # 2         Wave 3 - May 2020   5.72
    # 3        Wave 4 - June 2020   5.62
    # 4        Wave 5 - July 2020   5.54
    # 5      Wave 6 - August 2020   5.61
    # 6   Wave 7 - September 2020   5.60
    # 7     Wave 8 - October 2020   5.54
    # 8    Wave 9 - November 2020   5.60
    # 9   Wave 10 - December 2020   5.49
    # 10   Wave 11 - January 2021   5.52
    # 11  Wave 12 - February 2021   5.52
    # 12     Wave 13 - March 2021   5.45
    # 13     Wave 14 - April 2021   5.53
    # 14       Wave 15 - May 2021   5.46
    # 15      Wave 16 - June 2021   5.51
    # 16      Wave 17 - July 2021   5.63
    # 17    Wave 18 - August 2021   5.54
    # 18 Wave 19 - September 2021   5.49
    # 19   Wave 20 - October 2021   5.55
    # 20  Wave 21 - November 2021   5.43
    # 21  Wave 22 - December 2021   5.35
    # 22   Wave 23 - January 2022   5.46
    # 23  Wave 24 - February 2022   5.41
    # 24     Wave 25 - March 2022   5.30
    # 25     Wave 26 - April 2022   5.38
    # 26       Wave 27 - May 2022   5.39
    # 27      Wave 28 - June 2022   5.55
    # 28      Wave 29 - July 2022   5.51
    # 29    Wave 30 - August 2022   5.52
    # 30 Wave 31 - September 2022   5.54
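
If the goal is for the summary to come out in wave order without a separate arrange() step, one option (a sketch, not something the answers above propose) is to turn Wave into a factor whose levels are already in numeric order before grouping. The `Trial` data frame below is a hypothetical stand-in with made-up values, since the real data is not shown in the question.

```r
library(dplyr)

# Hypothetical stand-in for the asker's Trial data
Trial <- data.frame(
  Wave = c("Wave 10 - December 2020", "Wave 2 - April 2020",
           "Wave 9 - November 2020", "Wave 2 - April 2020"),
  Weight_Answer = c(5, 6, 4, NA)
)

# Distinct wave labels and their wave numbers
wave_levels <- unique(Trial$Wave)
wave_no <- as.integer(sub("Wave (\\d+).*", "\\1", wave_levels))

# A factor's levels define the group order, so summarise()
# returns the rows in wave order.
mean_values <- Trial %>%
  mutate(Wave = factor(Wave, levels = wave_levels[order(wave_no)])) %>%
  group_by(Wave) %>%
  summarise(mean_B = mean(Weight_Answer, na.rm = TRUE))

as.character(mean_values$Wave)
# "Wave 2 - April 2020" "Wave 9 - November 2020" "Wave 10 - December 2020"
```

A side benefit of the factor approach is that plots and further group_by() calls on the same column keep the wave order as well.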

huangapple
  • Published on May 24, 2023, 17:50:39
  • If you repost this article, please keep the link: https://go.coder-hub.com/76322208.html