Creating new factor column based on page range
Question
I am looking for a smarter way to create a new factor column in an R data frame df.
I have a data frame to which I would like to add a new column that tells me which section each record belongs to. The sections are:
section_in_text <- factor(c('Introduction', 'Characters', 'Footnotes', 'Bibliography'))
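One detail worth checking: factor() sorts its levels alphabetically unless levels is given explicitly, so section_in_text above does not keep the order in which the sections appear in the document. A minimal sketch of the difference (section_names and sections_ordered are hypothetical names used only for illustration):

```r
section_names <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')

# Default: the levels are sorted alphabetically
section_in_text <- factor(section_names)
levels(section_in_text)
# "Bibliography" "Characters" "Footnotes" "Introduction"

# Passing levels explicitly keeps the document order of the sections
sections_ordered <- factor(section_names, levels = section_names)
levels(sections_ordered)
# "Introduction" "Characters" "Footnotes" "Bibliography"
```

The explicit order matters once the column is used for sorting, tables, or plots, where levels are shown in level order.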
The section a record belongs to is determined by the column df$page.
So far, I have done this with a function that looks like this:
document_sections <- function(x) {
  if (x < 5) {
    return("Introduction")
  } else if (5 <= x && x < 23) {
    return("Characters")
  }
  # ...
}
Then I applied it with sapply():
df$section <- sapply(df$page, document_sections)
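Note that sapply() returns a character vector here, so df$section ends up as character rather than factor. A sketch of the question's approach that produces a factor, assuming (hypothetically) that Footnotes run until page 30 and Bibliography follows, as in the thresholds used in the answer; the remaining branches of the original helper are not shown in the question, and section_levels is an illustrative name:

```r
# Hypothetical completion of document_sections(): the branches after page 23
# are assumed (Footnotes before page 30, Bibliography from page 30 on)
document_sections <- function(x) {
  if (x < 5) {
    "Introduction"
  } else if (x < 23) {
    "Characters"
  } else if (x < 30) {
    "Footnotes"
  } else {
    "Bibliography"
  }
}

section_levels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
df <- data.frame(page = c(1, 5, 22, 23, 30))

# sapply() yields character, so wrap it in factor() with the levels in document order
df$section <- factor(sapply(df$page, document_sections), levels = section_levels)
```

This still calls the helper once per row; the vectorized alternatives below avoid that loop.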
Is there a smarter way to achieve the same result?
Thanks.
Answer 1
Score: 2
Using cut():
df <- data.frame(page = seq(1, 40, by = 2))
df$section <- cut(
  df$page,
  breaks = c(-Inf, 5, 23, 30, Inf),
  labels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography'),
  right = FALSE
)
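The right = FALSE argument is what makes the breaks behave like the question's x < 5 and x < 23 tests: each interval is closed on the left, so [5, 23) starts at page 5. A small sketch comparing it with the default right = TRUE on boundary pages (page_breaks, section_labels and pages are illustrative names):

```r
page_breaks    <- c(-Inf, 5, 23, 30, Inf)
section_labels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
pages          <- c(4, 5, 23, 30)

# right = FALSE: intervals are [5, 23), [23, 30), ..., so page 5 starts Characters
left_closed <- cut(pages, breaks = page_breaks, labels = section_labels, right = FALSE)

# Default right = TRUE: intervals are (5, 23], (23, 30], ..., so page 5 stays in Introduction
right_closed <- cut(pages, breaks = page_breaks, labels = section_labels)
```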
Or using dplyr::case_when():
library(dplyr)
df %>%
  mutate(section = factor(
    case_when(
      page < 5 ~ 'Introduction',
      page < 23 ~ 'Characters',
      page < 30 ~ 'Footnotes',
      !is.na(page) ~ 'Bibliography'
    ),
    levels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
  ))
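case_when() checks its conditions in order and uses the first one that is TRUE, which mirrors the if/else chain in the question. The !is.na(page) guard on the last branch matters for missing pages: a catch-all TRUE ~ 'Bibliography' would label an NA page as Bibliography, whereas with the guard no condition matches and case_when() returns NA. cut() already behaves that way on its own. A small base-R check of the cut() side, using a hypothetical pages vector with one missing value:

```r
section_labels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
pages <- c(3, NA, 40)

sections <- cut(pages, breaks = c(-Inf, 5, 23, 30, Inf),
                labels = section_labels, right = FALSE)
# The missing page stays NA; the other pages are classified as usual
```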
Result from either approach:
   page      section
1     1 Introduction
2     3 Introduction
3     5   Characters
4     7   Characters
5     9   Characters
6    11   Characters
7    13   Characters
8    15   Characters
9    17   Characters
10   19   Characters
11   21   Characters
12   23    Footnotes
13   25    Footnotes
14   27    Footnotes
15   29    Footnotes
16   31 Bibliography
17   33 Bibliography
18   35 Bibliography
19   37 Bibliography
20   39 Bibliography
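The same table can also be produced with base R's findInterval(), a different technique from the two above: it returns, for each page, how many section start pages are less than or equal to it, which can be used directly as an index into the labels. A sketch assuming the same start pages 5, 23 and 30 (section_labels and section_index are illustrative names):

```r
section_labels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
df <- data.frame(page = seq(1, 40, by = 2))

# findInterval() gives 0 below page 5, 1 for pages 5-22, 2 for 23-29, 3 from 30 on;
# adding 1 turns that into a position in section_labels
section_index <- findInterval(df$page, c(5, 23, 30)) + 1
df$section <- factor(section_labels[section_index], levels = section_labels)
```

Like cut() with right = FALSE, each start page belongs to the section it opens.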