How can I order a data frame so it increases numerically?
Question
I have the table below. I want to sort it in ascending numeric order, from Wave 2 to Wave 31.
Wave mean_B
<chr> <dbl>
1 Wave 10 - December 2020 5.49
2 Wave 11 - January 2021 5.52
3 Wave 12 - February 2021 5.52
4 Wave 13 - March 2021 5.45
5 Wave 14 - April 2021 5.53
6 Wave 15 - May 2021 5.46
7 Wave 16 - June 2021 5.51
8 Wave 17 - July 2021 5.63
9 Wave 18 - August 2021 5.54
10 Wave 19 - September 2021 5.49
11 Wave 2 - April 2020 5.67
12 Wave 20 - October 2021 5.55
13 Wave 21 - November 2021 5.43
14 Wave 22 - December 2021 5.35
15 Wave 23 - January 2022 5.46
16 Wave 24 - February 2022 5.41
17 Wave 25 - March 2022 5.30
18 Wave 26 - April 2022 5.38
19 Wave 27 - May 2022 5.39
20 Wave 28 - June 2022 5.55
21 Wave 29 - July 2022 5.51
22 Wave 3 - May 2020 5.72
23 Wave 30 - August 2022 5.52
24 Wave 31 - September 2022 5.54
25 Wave 4 - June 2020 5.62
26 Wave 5 - July 2020 5.54
27 Wave 6 - August 2020 5.61
28 Wave 7 - September 2020 5.60
29 Wave 8 - October 2020 5.54
30 Wave 9 - November 2020 5.60
I created the table expecting it to be ordered numerically automatically using the following code:
mean_values <- Trial %>%
group_by(Wave) %>%
summarise(mean_B = mean(Weight_Answer, na.rm = TRUE))
I then tried the following code:
mean_values <- Trial %>%
group_by(Wave) %>%
summarise(mean_B = mean(Weight_Answer, na.rm = TRUE)) %>%
arrange(as.numeric(gsub("Wave ", "", Wave)))
However, it gave the warning:
Warning message:
There was 1 warning in arrange().
ℹ In argument: ..1 = as.numeric(gsub("Wave ", "", Wave)).
Caused by warning:
! NAs introduced by coercion
I am quite new to R, so I am not sure what this means or how to resolve it.
Answer 1
Score: 1
You can use mixedsort from the gtools package.
library(gtools)
df[match(mixedsort(df$Wave), df$Wave),]
Or extract the Wave number and arrange on that.
library(dplyr)
df |>
arrange(as.integer(sub("Wave (\\d+).*", "\\1", Wave)))
Output
Wave mean_B
11 Wave 2 - April 2020 5.67
22 Wave 3 - May 2020 5.72
25 Wave 4 - June 2020 5.62
26 Wave 5 - July 2020 5.54
27 Wave 6 - August 2020 5.61
28 Wave 7 - September 2020 5.60
29 Wave 8 - October 2020 5.54
30 Wave 9 - November 2020 5.60
1 Wave 10 - December 2020 5.49
2 Wave 11 - January 2021 5.52
3 Wave 12 - February 2021 5.52
4 Wave 13 - March 2021 5.45
5 Wave 14 - April 2021 5.53
6 Wave 15 - May 2021 5.46
7 Wave 16 - June 2021 5.51
8 Wave 17 - July 2021 5.63
9 Wave 18 - August 2021 5.54
10 Wave 19 - September 2021 5.49
12 Wave 20 - October 2021 5.55
13 Wave 21 - November 2021 5.43
14 Wave 22 - December 2021 5.35
15 Wave 23 - January 2022 5.46
16 Wave 24 - February 2022 5.41
17 Wave 25 - March 2022 5.30
18 Wave 26 - April 2022 5.38
19 Wave 27 - May 2022 5.39
20 Wave 28 - June 2022 5.55
21 Wave 29 - July 2022 5.51
23 Wave 30 - August 2022 5.52
24 Wave 31 - September 2022 5.54
Data
df <- structure(list(Wave = c("Wave 10 - December 2020", "Wave 11 - January 2021",
"Wave 12 - February 2021", "Wave 13 - March 2021", "Wave 14 - April 2021",
"Wave 15 - May 2021", "Wave 16 - June 2021", "Wave 17 - July 2021",
"Wave 18 - August 2021", "Wave 19 - September 2021", "Wave 2 - April 2020",
"Wave 20 - October 2021", "Wave 21 - November 2021", "Wave 22 - December 2021",
"Wave 23 - January 2022", "Wave 24 - February 2022", "Wave 25 - March 2022",
"Wave 26 - April 2022", "Wave 27 - May 2022", "Wave 28 - June 2022",
"Wave 29 - July 2022", "Wave 3 - May 2020", "Wave 30 - August 2022",
"Wave 31 - September 2022", "Wave 4 - June 2020", "Wave 5 - July 2020",
"Wave 6 - August 2020", "Wave 7 - September 2020", "Wave 8 - October 2020",
"Wave 9 - November 2020"), mean_B = c(5.49, 5.52, 5.52, 5.45,
5.53, 5.46, 5.51, 5.63, 5.54, 5.49, 5.67, 5.55, 5.43, 5.35, 5.46,
5.41, 5.3, 5.38, 5.39, 5.55, 5.51, 5.72, 5.52, 5.54, 5.62, 5.54,
5.61, 5.6, 5.54, 5.6)), class = "data.frame", row.names = c(NA,
-30L))
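As for the warning in the question, a likely explanation is that gsub("Wave ", "", Wave) removes only the prefix, so the month and year stay in the string and as.numeric() cannot parse it. The base R sketch below checks this on a single label taken from the data above; the variable names x, leftover, bad and good are illustrative only.

```r
# One label from the question's data
x <- "Wave 10 - December 2020"

# The original attempt strips only the "Wave " prefix, so the month and
# year remain and the result is not a valid number
leftover <- gsub("Wave ", "", x)
print(leftover)

# as.numeric() returns NA for such a string (the warning is silenced here)
bad <- suppressWarnings(as.numeric(leftover))

# Capturing the leading digits and replacing the whole string with the
# backreference "\\1" keeps only the wave number
good <- as.integer(sub("Wave (\\d+).*", "\\1", x))
print(good)
```

Because every row produced NA in the original attempt, arrange() had nothing meaningful to sort on, which would explain why the order did not change.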
Answer 2
Score: 1
@benson23 gave a nice answer. Here is a "base R" answer.
# extract wave numbers
waveNumbers <- sub("Wave (\\d+) \\- \\w+ \\d+", "\\1", dat$Wave) |> as.numeric()
# permute accordingly
dat2 <- dat[order(waveNumbers), ]
Note that the row "names" (1, 2, ...) are permuted too. You can do rownames(dat2) <- NULL to reset them.
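A minimal sketch of the whole base R route, run on a three-row stand-in for dat (rows taken from the question's table). The last step, storing Wave as a factor whose levels follow the numeric order, is an optional extra that is not part of the answer above; it may help if later steps such as plotting should keep the same order.

```r
# Small stand-in for `dat` (three rows from the question's table)
dat <- data.frame(
  Wave = c("Wave 10 - December 2020", "Wave 2 - April 2020", "Wave 3 - May 2020"),
  mean_B = c(5.49, 5.67, 5.72),
  stringsAsFactors = FALSE
)

# Extract the wave number that follows "Wave "
waveNumbers <- as.numeric(sub("Wave (\\d+).*", "\\1", dat$Wave))

# Sort the rows by wave number, then reset the permuted row names
dat2 <- dat[order(waveNumbers), ]
rownames(dat2) <- NULL

# Optional (an addition, not from the answer): make Wave a factor whose
# levels follow the sorted order
dat2$Wave <- factor(dat2$Wave, levels = dat2$Wave)
print(dat2)
```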
Answer 3
Score: 0
Since you're already in the tidyverse, we can use readr::parse_number:
readr::parse_number("Wave 9 - November 2020 5.60")
# [1] 9
This is easy enough to add to your dplyr pipe:
library(dplyr)
df %>%
arrange(readr::parse_number(Wave))
# Wave mean_B
# 1 Wave 2 - April 2020 5.67
# 2 Wave 3 - May 2020 5.72
# 3 Wave 4 - June 2020 5.62
# 4 Wave 5 - July 2020 5.54
# 5 Wave 6 - August 2020 5.61
# 6 Wave 7 - September 2020 5.60
# 7 Wave 8 - October 2020 5.54
# 8 Wave 9 - November 2020 5.60
# 9 Wave 10 - December 2020 5.49
# 10 Wave 11 - January 2021 5.52
# 11 Wave 12 - February 2021 5.52
# 12 Wave 13 - March 2021 5.45
# 13 Wave 14 - April 2021 5.53
# 14 Wave 15 - May 2021 5.46
# 15 Wave 16 - June 2021 5.51
# 16 Wave 17 - July 2021 5.63
# 17 Wave 18 - August 2021 5.54
# 18 Wave 19 - September 2021 5.49
# 19 Wave 20 - October 2021 5.55
# 20 Wave 21 - November 2021 5.43
# 21 Wave 22 - December 2021 5.35
# 22 Wave 23 - January 2022 5.46
# 23 Wave 24 - February 2022 5.41
# 24 Wave 25 - March 2022 5.30
# 25 Wave 26 - April 2022 5.38
# 26 Wave 27 - May 2022 5.39
# 27 Wave 28 - June 2022 5.55
# 28 Wave 29 - July 2022 5.51
# 29 Wave 30 - August 2022 5.52
# 30 Wave 31 - September 2022 5.54
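Applied to the pipeline from the question, this might look like the sketch below. The Trial data frame here is a made-up stand-in (the real one is not shown in the question), and the code assumes the dplyr and readr packages are installed.

```r
library(dplyr)

# Hypothetical stand-in for the asker's `Trial` data
Trial <- data.frame(
  Wave = c("Wave 10 - December 2020", "Wave 2 - April 2020",
           "Wave 10 - December 2020", "Wave 2 - April 2020"),
  Weight_Answer = c(5, 6, 6, NA),
  stringsAsFactors = FALSE
)

# Summarise per wave, then sort on the first number found in each label
mean_values <- Trial %>%
  group_by(Wave) %>%
  summarise(mean_B = mean(Weight_Answer, na.rm = TRUE)) %>%
  arrange(readr::parse_number(Wave))

print(mean_values)
```

parse_number() picks up the first number in each string, so the year at the end of the label does not affect the sort.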