Creating new factor column based on page range

Question

I am looking for a smarter way to create a new factor column in an R data frame, df.
I would like to add a new column that tells me which section each record belongs to. The sections are:

section_in_text <- factor(c('Introduction', 'Characters', 'Footnotes', 'Bibliography'))
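One detail worth checking: `factor()` sorts its levels alphabetically unless `levels` is given, so `section_in_text` as written does not keep the order in which the sections appear. A minimal check (the helper vector `section_names` is introduced here for illustration):

```r
# Hypothetical helper vector holding the section names in document order
section_names <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')

# Default: levels are sorted alphabetically
section_in_text <- factor(section_names)
print(levels(section_in_text))  # "Bibliography" "Characters" "Footnotes" "Introduction"

# Passing levels explicitly keeps the document order
section_in_text <- factor(section_names, levels = section_names)
print(levels(section_in_text))  # "Introduction" "Characters" "Footnotes" "Bibliography"
```

The level order matters for anything that sorts or tabulates by section, such as `table()` or plot legends.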

The section a given record belongs to is determined by the column df$page.

So far, I have done this with a function like the following:

document_sections <- function(x) {
  if (x < 5) {
    return("Introduction")
  }
  else if ((5 <= x) & (x < 23)) {
    return("Characters")
  }...
}

Then I applied it with sapply():

df$section <- sapply(df$page, document_sections)
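Note that `sapply()` simplifies the per-page results to a character vector, so `df$section` ends up as a character column rather than a factor. A minimal sketch of the same row-by-row approach that does yield a factor, assuming (hypothetically) the Footnotes/Bibliography boundary at page 30 that the answer uses, since the question elides those branches:

```r
# Hypothetical complete version of the question's function; the boundary
# at page 30 is an assumption borrowed from the answer. NA pages are not
# handled: if (NA < 5) raises an error.
document_sections <- function(x) {
  if (x < 5) {
    "Introduction"
  } else if (x < 23) {
    "Characters"
  } else if (x < 30) {
    "Footnotes"
  } else {
    "Bibliography"
  }
}

section_levels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')

df <- data.frame(page = c(1, 5, 23, 31))

# sapply() simplifies to a character vector, not a factor
print(class(sapply(df$page, document_sections)))  # "character"

# vapply() guarantees exactly one string per page; factor() with explicit
# levels then gives a factor column in document order
df$section <- factor(
  vapply(df$page, document_sections, character(1)),
  levels = section_levels
)
print(df)
```

This still calls the function once per row, which is what the vectorised answers avoid.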

Is there a smarter way to achieve the same result?

Thanks.

Answer 1

Score: 2

Using cut():

df <- data.frame(page = seq(1, 40, by = 2))

df$section <- cut(
  df$page, 
  breaks = c(-Inf, 5, 23, 30, Inf),
  labels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography'),
  right = FALSE
)
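The `right = FALSE` argument makes each interval closed on the left, e.g. [5, 23) rather than (5, 23], so page 5 is the first page of Characters, matching the question's `5 <= x` condition. A small comparison with the default `right = TRUE` (the variable names are illustrative):

```r
section_levels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
breaks <- c(-Inf, 5, 23, 30, Inf)
pages <- c(4, 5, 6)

# right = FALSE: intervals [-Inf, 5), [5, 23), [23, 30), [30, Inf)
left_closed <- cut(pages, breaks = breaks, labels = section_levels, right = FALSE)

# Default right = TRUE: intervals (-Inf, 5], (5, 23], (23, 30], (30, Inf]
right_closed <- cut(pages, breaks = breaks, labels = section_levels)

print(as.character(left_closed))   # "Introduction" "Characters" "Characters"
print(as.character(right_closed))  # "Introduction" "Introduction" "Characters"
```

Because `labels` is given in document order, the resulting factor's levels are already in that order.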

Or using dplyr::case_when():

library(dplyr)

df %>%
  mutate(section = factor(
    case_when(
      page < 5 ~ 'Introduction',
      page < 23 ~ 'Characters', 
      page < 30 ~ 'Footnotes', 
      !is.na(page) ~ 'Bibliography'
    ),
    levels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
  ))
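`case_when()` checks its conditions in order and uses the first match, and the final `!is.na(page)` acts as a catch-all for every remaining non-missing page, so a missing page stays `NA`, which is also what `cut()` returns for `NA`. A quick sketch comparing the two approaches on pages around each boundary plus a missing value (requires dplyr; the variable names are illustrative):

```r
library(dplyr)

section_levels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
df <- data.frame(page = c(1, 4, 5, 22, 23, 29, 30, 40, NA))

# Approach 1: cut() with left-closed intervals
via_cut <- cut(df$page, breaks = c(-Inf, 5, 23, 30, Inf),
               labels = section_levels, right = FALSE)

# Approach 2: case_when(); an NA condition counts as no match
via_case_when <- df %>%
  mutate(section = factor(
    case_when(
      page < 5 ~ 'Introduction',
      page < 23 ~ 'Characters',
      page < 30 ~ 'Footnotes',
      !is.na(page) ~ 'Bibliography'
    ),
    levels = section_levels
  )) %>%
  pull(section)

# Same labels (including NA for the missing page) and same level order
print(identical(as.character(via_cut), as.character(via_case_when)))  # TRUE
print(identical(levels(via_cut), levels(via_case_when)))              # TRUE
```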

Result from either approach:

   page      section
1     1 Introduction
2     3 Introduction
3     5   Characters
4     7   Characters
5     9   Characters
6    11   Characters
7    13   Characters
8    15   Characters
9    17   Characters
10   19   Characters
11   21   Characters
12   23    Footnotes
13   25    Footnotes
14   27    Footnotes
15   29    Footnotes
16   31 Bibliography
17   33 Bibliography
18   35 Bibliography
19   37 Bibliography
20   39 Bibliography
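For comparison, base R's `findInterval()` can do the same lookup; this is a different technique from the two above, offered only as a sketch. `findInterval(x, vec)` counts how many boundaries in the sorted vector `vec` are less than or equal to `x`, so adding 1 gives an index into the section names, and the intervals are closed on the left like `cut(..., right = FALSE)`. The vector `section_starts` is a hypothetical name holding the boundaries 5, 23 and 30 from the answer:

```r
section_levels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')

# Hypothetical: first page of each section after the Introduction
section_starts <- c(5, 23, 30)

df <- data.frame(page = seq(1, 40, by = 2))

# findInterval() returns 0 for page < 5, 1 for 5 <= page < 23, and so on;
# an NA page gives NA, which stays NA after indexing
idx <- findInterval(df$page, section_starts) + 1
df$section <- factor(section_levels[idx], levels = section_levels)

print(table(df$section))
```

Keeping the boundaries in one vector makes it easy to adjust section ranges without touching the lookup code.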

huangapple
  • Posted on 3 March 2023, 21:57:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75627995.html