Redefine factor levels and order within groups

Question

This is a simple example for illustration.
I want a summary of data presented in a predetermined order. I want to order the col2 values depending on the col1 value, and also include rows for factor levels within each col1 group that are not in the data (e.g. using group_by(..., .drop = FALSE)). Some values in col2 appear in more than one col1 group. There is no logic that can be applied to determine the order of col2. You might call it a two-level factor?

For example, my input data could be:

    df <- read.table(
      header = TRUE,
      sep = ",",
      text = "
    col1,col2
    Tunnels,Dick
    Tunnels,Tom
    Tunnels,Tom
    Beatles,George
    Beatles,Paul
    Beatles,Ringo
    Beatles,Ringo
    UK Artists,Gilbert
    "
    )

and my required output would be:

    col1        col2     n
    Beatles     John     0
    Beatles     Paul     1
    Beatles     George   1
    Beatles     Ringo    2
    UK Artists  Gilbert  1
    UK Artists  George   0
    Tunnels     Tom      2
    Tunnels     Dick     1
    Tunnels     Harry    0

The following, of course, does not work: a factor cannot have duplicate levels, so a single set of col2 levels cannot encode a separate order for each col1 group.

    library(dplyr)

    col2_tunnels <- c("Tom", "Dick", "Harry")
    col2_beatles <- c("John", "Paul", "George", "Ringo")
    col2_artists <- c("Gilbert", "George")
    col2_order <- unique(c(col2_tunnels, col2_beatles, col2_artists))  # factor levels cannot have duplicates
    col1_order <- c("Beatles", "UK Artists", "Tunnels")

    df %>%
      mutate(
        col1 = factor(col1, levels = col1_order),
        col2 = factor(col2, levels = col2_order)
      ) %>%
      group_by(col1, col2, .drop = FALSE) %>%
      summarise(n = n())
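
To make the failure concrete, here is a minimal sketch (assuming dplyr is installed; the name `crossed` is only illustrative) that runs the attempt above and counts the rows it returns. Because both columns are factors, `.drop = FALSE` keeps every combination of the 3 col1 levels and the 8 distinct col2 levels, and the col2 order is the same in every group.

```r
library(dplyr)

df <- read.table(header = TRUE, sep = ",", text = "
col1,col2
Tunnels,Dick
Tunnels,Tom
Tunnels,Tom
Beatles,George
Beatles,Paul
Beatles,Ringo
Beatles,Ringo
UK Artists,Gilbert
")

col1_order <- c("Beatles", "UK Artists", "Tunnels")
# unique() keeps "George" only once, so it has one global position
col2_order <- unique(c("Tom", "Dick", "Harry",
                       "John", "Paul", "George", "Ringo",
                       "Gilbert", "George"))

crossed <- df %>%
  mutate(col1 = factor(col1, levels = col1_order),
         col2 = factor(col2, levels = col2_order)) %>%
  group_by(col1, col2, .drop = FALSE) %>%
  summarise(n = n(), .groups = "drop")

length(col2_order)  # 8 distinct names
nrow(crossed)       # 24 = 3 col1 levels x 8 col2 levels, not the 9 rows wanted
```

So the result contains rows such as Beatles/Tom with n = 0, which the required output does not include.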

The only way forward I can see is to split the data by col1 level and use a named list of vectors defining the factor order for each level of col1. While writing the question, I found that this works:

    col2_fctlist <- list(
      Tunnels = c("Tom", "Dick", "Harry"),
      Beatles = c("John", "Paul", "George", "Ringo"),
      'UK Artists' = c("Gilbert", "George")
    )

    # Summarise each col1 group separately, using that group's own col2 levels
    x <- lapply(col1_order, function(col1grp) {
      df %>%
        filter(col1 == col1grp) %>%
        mutate(col2 = factor(col2, levels = col2_fctlist[[col1grp]])) %>%
        group_by(col1, col2, .drop = FALSE) %>%
        summarise(n = n())
    })
    do.call(rbind, x)
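
For comparison, the same split-by-group idea can be sketched in base R with no packages. This is only an illustration under the assumptions of the example data; the helper name `count_group` is hypothetical and not part of the original post.

```r
df <- read.table(header = TRUE, sep = ",", text = "
col1,col2
Tunnels,Dick
Tunnels,Tom
Tunnels,Tom
Beatles,George
Beatles,Paul
Beatles,Ringo
Beatles,Ringo
UK Artists,Gilbert
")

col1_order <- c("Beatles", "UK Artists", "Tunnels")
col2_fctlist <- list(
  Tunnels      = c("Tom", "Dick", "Harry"),
  Beatles      = c("John", "Paul", "George", "Ringo"),
  "UK Artists" = c("Gilbert", "George")
)

# Hypothetical helper: count col2 values for one col1 group,
# in the order given by that group's entry in col2_fctlist.
count_group <- function(grp) {
  lv <- col2_fctlist[[grp]]
  # factor() maps the group's col2 values onto its own levels;
  # tabulate() then counts each level, giving 0 for absent ones.
  n <- tabulate(factor(df$col2[df$col1 == grp], levels = lv),
                nbins = length(lv))
  data.frame(col1 = grp, col2 = lv, n = n)
}

# Stack the per-group summaries in the order given by col1_order
result <- do.call(rbind, lapply(col1_order, count_group))
result
```

The row order comes entirely from `col1_order` and `col2_fctlist`, so no factor with duplicated levels is ever needed.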

Although I have found a solution that I think works for me, I'm still posting in case anybody can offer a better solution?

Answer 1

Score: 2

I don't know if this is better than yours! Using data.table, if I first set up the required order for col1 and col2 in a list like this:

    l1 <- list(
      Beatles = data.frame(col2 = c("John", "Paul", "George", "Ringo")),
      `UK Artists` = data.frame(col2 = c("Gilbert", "George")),
      Tunnels = data.frame(col2 = c("Tom", "Dick", "Harry"))
    )

Then I can turn this into a data.table using rbindlist and join it with df (which must itself be a data.table, e.g. after setDT(df)) to get the output that you want in the specified order:

    library(data.table)

    setDT(df)  # convert df to a data.table by reference so that := and on= work
    dt1 <- rbindlist(l1, idcol = "col1")
    df[, n := 1][dt1, on = c("col1", "col2")][, sum(n, na.rm = TRUE), .(col1, col2)]

             col1    col2 V1
    1:    Beatles    John  0
    2:    Beatles    Paul  1
    3:    Beatles  George  1
    4:    Beatles   Ringo  2
    5: UK Artists Gilbert  1
    6: UK Artists  George  0
    7:    Tunnels     Tom  2
    8:    Tunnels    Dick  1
    9:    Tunnels   Harry  0
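
Note that the join above adds a helper column n to df by reference. If that side effect is unwanted, one possible variant (a sketch, assuming data.table is installed and R 4.0 or later so that strings are not converted to factors; the names `counts` and `res` are illustrative) counts on a copy first and then joins the counts onto the lookup table:

```r
library(data.table)

df <- read.table(header = TRUE, sep = ",", text = "
col1,col2
Tunnels,Dick
Tunnels,Tom
Tunnels,Tom
Beatles,George
Beatles,Paul
Beatles,Ringo
Beatles,Ringo
UK Artists,Gilbert
")

l1 <- list(
  Beatles      = data.frame(col2 = c("John", "Paul", "George", "Ringo")),
  `UK Artists` = data.frame(col2 = c("Gilbert", "George")),
  Tunnels      = data.frame(col2 = c("Tom", "Dick", "Harry"))
)
dt1 <- rbindlist(l1, idcol = "col1")

# as.data.table() makes a copy, so the original df is left untouched
counts <- as.data.table(df)[, .N, by = .(col1, col2)]

# X[i] join: one row per row of dt1, in dt1's order; N is NA where nothing matched
res <- counts[dt1, on = c("col1", "col2")]
res[is.na(N), N := 0L]
res
```

The count column is then called N rather than V1, and df keeps its original two columns.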

Answer 2

Score: 1

With a join:

    library(tidyverse)

    enframe(col2_fctlist, name = "col1", value = "col2") %>%
      unnest(col2) %>%
      left_join(df %>% count(col1, col2), by = c("col1", "col2")) %>%
      replace_na(list(n = 0))

      col1       col2        n
    1 Tunnels    Tom         2
    2 Tunnels    Dick        1
    3 Tunnels    Harry       0
    4 Beatles    John        0
    5 Beatles    Paul        1
    6 Beatles    George      1
    7 Beatles    Ringo       2
    8 UK Artists Gilbert     1
    9 UK Artists George      0

Or with imap_dfr:

    imap_dfr(col2_fctlist,
             ~ df %>%
               filter(col1 == .y) %>%
               mutate(col2 = factor(col2, levels = .x)) %>%
               count(col2, .drop = FALSE),
             .id = "col1")
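
Both snippets return the groups in the order in which they appear in col2_fctlist (Tunnels, Beatles, UK Artists), not in the order asked for in the question. A possible adjustment, sketched here on the assumption that col1_order and col2_fctlist are defined as in the question, is to reorder the list by name before the join:

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- read.table(header = TRUE, sep = ",", text = "
col1,col2
Tunnels,Dick
Tunnels,Tom
Tunnels,Tom
Beatles,George
Beatles,Paul
Beatles,Ringo
Beatles,Ringo
UK Artists,Gilbert
")

col1_order <- c("Beatles", "UK Artists", "Tunnels")
col2_fctlist <- list(
  Tunnels      = c("Tom", "Dick", "Harry"),
  Beatles      = c("John", "Paul", "George", "Ringo"),
  "UK Artists" = c("Gilbert", "George")
)

# Indexing the list by name puts the groups in the requested order;
# the left join keeps that row order and adds the counts.
result <- enframe(col2_fctlist[col1_order], name = "col1", value = "col2") %>%
  unnest(col2) %>%
  left_join(count(df, col1, col2), by = c("col1", "col2")) %>%
  mutate(n = coalesce(n, 0L))  # combinations absent from df get a count of 0

result
```

Because the lookup table is the left side of the join, its row order (groups by col1_order, names by each group's vector) carries through to the result.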

huangapple
  • Published on 2023-03-03 19:34:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75626575.html