Creating new factor column based on page range

Question

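I am looking for a smarter way to create a new factor column in an R data frame df.

I have a data frame to which I would like to add a new column that tells me which section each record belongs to. The sections are:

    section_in_text <- factor(c('Introduction', 'Characters', 'Footnotes', 'Bibliography'))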
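One detail worth knowing: factor() sorts its levels alphabetically unless levels is given, so section_in_text does not keep the order in which the sections appear in the text. A quick check (section_levels and section_in_order are illustrative names, not from the question):

```r
section_in_text <- factor(c('Introduction', 'Characters', 'Footnotes', 'Bibliography'))
# Without `levels`, factor() sorts the unique values alphabetically
print(levels(section_in_text))

# Passing the vector itself as `levels` keeps the document order
section_levels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
section_in_order <- factor(section_levels, levels = section_levels)
print(levels(section_in_order))
```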

Which section a record belongs to is determined by the column df$page.

So far I have done this with a function like the following:

    document_sections <- function(x) {
      if (x < 5) {
        return("Introduction")
      } else if ((5 <= x) & (x < 23)) {
        return("Characters")
      } ...
    }

and then applied it to every page with sapply():

    df$section <- sapply(df$page, document_sections)
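Note that this sapply() call yields a character vector (when every call returns one string), not a factor. A minimal sketch that completes the function and converts the result to a factor, assuming the Footnotes/Bibliography boundary at page 30 used in the answer below (the question elides those branches, so that boundary and the sample pages are hypothetical):

```r
# Hypothetical completion of document_sections(): the page-30 boundary
# between 'Footnotes' and 'Bibliography' is an assumption for illustration
document_sections <- function(x) {
  if (x < 5) {
    "Introduction"
  } else if (x < 23) {
    "Characters"   # x >= 5 is already implied by the failed first test
  } else if (x < 30) {
    "Footnotes"
  } else {
    "Bibliography"
  }
}

df <- data.frame(page = c(1, 5, 22, 23, 30, 40))
section_levels <- c("Introduction", "Characters", "Footnotes", "Bibliography")

# vapply() guarantees exactly one string per page; factor() then sets the
# levels in document order instead of alphabetical order
df$section <- factor(vapply(df$page, document_sections, character(1)),
                     levels = section_levels)
print(df)
```

This still calls the function once per page, which is what the vectorised answers below avoid.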

Is there a smarter way to achieve the same result?

Thanks.

Answer 1

Score: 2

Using cut():

    df <- data.frame(page = seq(1, 40, by = 2))
    # right = FALSE makes the intervals [a, b): page < 5, 5 <= page < 23, ...
    df$section <- cut(
      df$page,
      breaks = c(-Inf, 5, 23, 30, Inf),
      labels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography'),
      right = FALSE
    )
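With right = FALSE each interval is closed on the left, [a, b), so a page equal to a break value (5, 23 or 30) starts the next section, which matches the question's x < 5 and 5 <= x conditions; with the default right = TRUE it would end the previous section instead. A quick check on the boundary pages (boundary_pages, page_breaks and section_levels are illustrative names):

```r
boundary_pages <- c(4, 5, 22, 23, 29, 30)
section_levels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
page_breaks <- c(-Inf, 5, 23, 30, Inf)

# Left-closed intervals [a, b): a break value starts the next section
left_closed <- cut(boundary_pages, breaks = page_breaks,
                   labels = section_levels, right = FALSE)

# Default right-closed intervals (a, b]: a break value ends the previous section
right_closed <- cut(boundary_pages, breaks = page_breaks,
                    labels = section_levels)

print(data.frame(page = boundary_pages, left_closed, right_closed))
```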

Or using dplyr::case_when():

    library(dplyr)
    df %>%
      mutate(section = factor(
        case_when(
          page < 5 ~ 'Introduction',
          page < 23 ~ 'Characters',   # only reached when page >= 5
          page < 30 ~ 'Footnotes',
          !is.na(page) ~ 'Bibliography'
        ),
        levels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
      ))
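case_when() checks its conditions top to bottom and uses the first one that is TRUE, which is why each condition only needs an upper bound. The last condition is !is.na(page) rather than a catch-all TRUE so that a missing page stays NA instead of being labelled Bibliography. A small sketch on a plain vector, assuming dplyr is installed (pages and sections are illustrative names):

```r
library(dplyr)

pages <- c(2, 23, NA, 35)
section_levels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')

# NA < 5 is NA, which case_when() treats as FALSE; !is.na(NA) is FALSE too,
# so no condition matches the missing page and it falls through to NA
sections <- factor(
  case_when(
    pages < 5 ~ 'Introduction',
    pages < 23 ~ 'Characters',
    pages < 30 ~ 'Footnotes',
    !is.na(pages) ~ 'Bibliography'
  ),
  levels = section_levels
)
print(sections)
```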

Both approaches give the same result:

       page      section
    1     1 Introduction
    2     3 Introduction
    3     5   Characters
    4     7   Characters
    5     9   Characters
    6    11   Characters
    7    13   Characters
    8    15   Characters
    9    17   Characters
    10   19   Characters
    11   21   Characters
    12   23    Footnotes
    13   25    Footnotes
    14   27    Footnotes
    15   29    Footnotes
    16   31 Bibliography
    17   33 Bibliography
    18   35 Bibliography
    19   37 Bibliography
    20   39 Bibliography

huangapple
  • Published on 2023-03-03 21:57:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75627995.html