How can I order a data frame so it increases numerically?

Question

I have the table below. I want to order it so that it runs from Wave 2 to Wave 31 in ascending numeric order.

 Wave                     mean_B
   <chr>                     <dbl>
 1 Wave 10 - December 2020    5.49 
 2 Wave 11 - January 2021     5.52
 3 Wave 12 - February 2021    5.52
 4 Wave 13 - March 2021       5.45
 5 Wave 14 - April 2021       5.53
 6 Wave 15 - May 2021         5.46
 7 Wave 16 - June 2021        5.51
 8 Wave 17 - July 2021        5.63
 9 Wave 18 - August 2021      5.54
10 Wave 19 - September 2021   5.49
11 Wave 2 - April 2020        5.67
12 Wave 20 - October 2021     5.55
13 Wave 21 - November 2021    5.43
14 Wave 22 - December 2021    5.35
15 Wave 23 - January 2022     5.46
16 Wave 24 - February 2022    5.41
17 Wave 25 - March 2022       5.30
18 Wave 26 - April 2022       5.38
19 Wave 27 - May 2022         5.39
20 Wave 28 - June 2022        5.55
21 Wave 29 - July 2022        5.51
22 Wave 3 - May 2020          5.72
23 Wave 30 - August 2022      5.52
24 Wave 31 - September 2022   5.54
25 Wave 4 - June 2020         5.62
26 Wave 5 - July 2020         5.54
27 Wave 6 - August 2020       5.61
28 Wave 7 - September 2020    5.60
29 Wave 8 - October 2020      5.54
30 Wave 9 - November 2020     5.60

I created the table with the following code, expecting it to be ordered numerically automatically:

mean_values <- Trial %>% 
  group_by(Wave) %>%
  summarise(mean_B = mean(Weight_Answer, na.rm = TRUE))

I then tried the following code:

mean_values <- Trial %>%
  group_by(Wave) %>%
  summarise(mean_B = mean(Weight_Answer, na.rm = TRUE)) %>%
  arrange(as.numeric(gsub("Wave ", "", Wave)))

However, it gave the warning:

>Warning message:
There was 1 warning in arrange().
ℹ In argument: ..1 = as.numeric(gsub("Wave ", "", Wave)).
Caused by warning:
! NAs introduced by coercion

I am quite new to R, so I am not sure what this means or how to resolve it.

Answer 1

Score: 1

You can use mixedsort from the gtools package.

library(gtools)

df[match(mixedsort(df$Wave), df$Wave),]

Alternatively, extract the wave number and arrange by it.

library(dplyr)

df |>
  arrange(as.integer(sub("Wave (\\d+).*", "\\1", Wave)))

Output

                       Wave mean_B
11      Wave 2 - April 2020   5.67
22        Wave 3 - May 2020   5.72
25       Wave 4 - June 2020   5.62
26       Wave 5 - July 2020   5.54
27     Wave 6 - August 2020   5.61
28  Wave 7 - September 2020   5.60
29    Wave 8 - October 2020   5.54
30   Wave 9 - November 2020   5.60
1   Wave 10 - December 2020   5.49
2    Wave 11 - January 2021   5.52
3   Wave 12 - February 2021   5.52
4      Wave 13 - March 2021   5.45
5      Wave 14 - April 2021   5.53
6        Wave 15 - May 2021   5.46
7       Wave 16 - June 2021   5.51
8       Wave 17 - July 2021   5.63
9     Wave 18 - August 2021   5.54
10 Wave 19 - September 2021   5.49
12   Wave 20 - October 2021   5.55
13  Wave 21 - November 2021   5.43
14  Wave 22 - December 2021   5.35
15   Wave 23 - January 2022   5.46
16  Wave 24 - February 2022   5.41
17     Wave 25 - March 2022   5.30
18     Wave 26 - April 2022   5.38
19       Wave 27 - May 2022   5.39
20      Wave 28 - June 2022   5.55
21      Wave 29 - July 2022   5.51
23    Wave 30 - August 2022   5.52
24 Wave 31 - September 2022   5.54

Data

df <- structure(list(Wave = c("Wave 10 - December 2020", "Wave 11 - January 2021", 
"Wave 12 - February 2021", "Wave 13 - March 2021", "Wave 14 - April 2021", 
"Wave 15 - May 2021", "Wave 16 - June 2021", "Wave 17 - July 2021", 
"Wave 18 - August 2021", "Wave 19 - September 2021", "Wave 2 - April 2020", 
"Wave 20 - October 2021", "Wave 21 - November 2021", "Wave 22 - December 2021", 
"Wave 23 - January 2022", "Wave 24 - February 2022", "Wave 25 - March 2022", 
"Wave 26 - April 2022", "Wave 27 - May 2022", "Wave 28 - June 2022", 
"Wave 29 - July 2022", "Wave 3 - May 2020", "Wave 30 - August 2022", 
"Wave 31 - September 2022", "Wave 4 - June 2020", "Wave 5 - July 2020", 
"Wave 6 - August 2020", "Wave 7 - September 2020", "Wave 8 - October 2020", 
"Wave 9 - November 2020"), mean_B = c(5.49, 5.52, 5.52, 5.45, 
5.53, 5.46, 5.51, 5.63, 5.54, 5.49, 5.67, 5.55, 5.43, 5.35, 5.46, 
5.41, 5.3, 5.38, 5.39, 5.55, 5.51, 5.72, 5.52, 5.54, 5.62, 5.54, 
5.61, 5.6, 5.54, 5.6)), class = "data.frame", row.names = c(NA, 
-30L))
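
As for the warning in the question: gsub("Wave ", "", Wave) removes only the prefix, so each value keeps its date (for example "10 - December 2020"), and as.numeric() returns NA for a string like that. With every sort key NA, arrange() has nothing to sort on. Below is a minimal base R sketch of the difference. The vector waves is a small made-up sample in the same format, not the full data.

```r
# Hypothetical three-element sample in the same format as the Wave column.
waves <- c("Wave 10 - December 2020", "Wave 2 - April 2020", "Wave 9 - November 2020")

# The attempt from the question: only the "Wave " prefix is removed,
# so the date is still attached and the result is not a number.
stripped <- gsub("Wave ", "", waves)
attempt <- suppressWarnings(as.numeric(stripped))  # NA for every element

# Capturing the leading digits and replacing the whole string with them works.
wave_no <- as.integer(sub("Wave (\\d+).*", "\\1", waves))

# Reorder the sample by wave number.
sorted <- waves[order(wave_no)]
```

The "\\1" in the replacement refers to the first parenthesised group of the pattern, here the digits after "Wave ".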

Answer 2

Score: 1

@benson23 gave a nice answer. Here is a "base R" answer.

# extract the wave numbers ("\\1" is the first captured group)
waveNumbers <- sub("Wave (\\d+) - \\w+ \\d+", "\\1", dat$Wave) |> as.numeric()
# reorder the rows accordingly
dat2 <- dat[order(waveNumbers), ]

Note that the row "names" (1, 2, ...) are permuted too. You can run rownames(dat2) <- NULL afterwards to renumber them.
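
The row-name behaviour can be seen in a small sketch. The three-row dat below is a made-up stand-in for the real data frame, which is an assumption made only for illustration.

```r
# Hypothetical three-row sample standing in for `dat`.
dat <- data.frame(
  Wave = c("Wave 10 - December 2020", "Wave 2 - April 2020", "Wave 9 - November 2020"),
  mean_B = c(5.49, 5.67, 5.60),
  stringsAsFactors = FALSE
)

# Extract the wave numbers and reorder the rows by them.
waveNumbers <- as.numeric(sub("Wave (\\d+) - \\w+ \\d+", "\\1", dat$Wave))
dat2 <- dat[order(waveNumbers), ]

# The row names still record the original positions, so they are out of sequence.
permuted_names <- rownames(dat2)

# Renumber the rows from 1.
rownames(dat2) <- NULL
```

Resetting the row names changes only the labels. The rows stay in wave order.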

Answer 3

Score: 0

Since you're already in the tidyverse, we can use readr::parse_number:

readr::parse_number("Wave 9 - November 2020     5.60")
# [1] 9

This is easy enough to add to your dplyr pipe:

library(dplyr)
df %>%
  arrange(readr::parse_number(Wave))
#                        Wave mean_B
# 1       Wave 2 - April 2020   5.67
# 2         Wave 3 - May 2020   5.72
# 3        Wave 4 - June 2020   5.62
# 4        Wave 5 - July 2020   5.54
# 5      Wave 6 - August 2020   5.61
# 6   Wave 7 - September 2020   5.60
# 7     Wave 8 - October 2020   5.54
# 8    Wave 9 - November 2020   5.60
# 9   Wave 10 - December 2020   5.49
# 10   Wave 11 - January 2021   5.52
# 11  Wave 12 - February 2021   5.52
# 12     Wave 13 - March 2021   5.45
# 13     Wave 14 - April 2021   5.53
# 14       Wave 15 - May 2021   5.46
# 15      Wave 16 - June 2021   5.51
# 16      Wave 17 - July 2021   5.63
# 17    Wave 18 - August 2021   5.54
# 18 Wave 19 - September 2021   5.49
# 19   Wave 20 - October 2021   5.55
# 20  Wave 21 - November 2021   5.43
# 21  Wave 22 - December 2021   5.35
# 22   Wave 23 - January 2022   5.46
# 23  Wave 24 - February 2022   5.41
# 24     Wave 25 - March 2022   5.30
# 25     Wave 26 - April 2022   5.38
# 26       Wave 27 - May 2022   5.39
# 27      Wave 28 - June 2022   5.55
# 28      Wave 29 - July 2022   5.51
# 29    Wave 30 - August 2022   5.52
# 30 Wave 31 - September 2022   5.54

Posted by huangapple on May 24, 2023 at 17:50:39
Link: https://go.coder-hub.com/76322208.html