Redefine factor levels and order within groups
Question
This is a simple example for illustration.
I want a summary of data presented in a predetermined order. I want to order col2 values depending on col1 values, and also include rows for factor levels within the col1 group that are not in the data (e.g. using group_by(..., .drop = FALSE)). Some values in col2 appear in more than one col1 group. There is no logic that can be applied to determine the order of col2. You could perhaps call it a two-level factor?
For example, my input data could be:
df <- read.table(
header = TRUE,
sep=",",
text = "
col1,col2
Tunnels,Dick
Tunnels,Tom
Tunnels,Tom
Beatles,George
Beatles,Paul
Beatles,Ringo
Beatles,Ringo
UK Artists,Gilbert
"
)
and my required output would be
col1        col2     n
Beatles     John     0
Beatles     Paul     1
Beatles     George   1
Beatles     Ringo    2
UK Artists  Gilbert  1
UK Artists  George   0
Tunnels     Tom      2
Tunnels     Dick     1
Tunnels     Harry    0
The following, of course, does not work:
col2_tunnels <- c("Tom", "Dick", "Harry")
col2_beatles <- c("John", "Paul", "George", "Ringo")
col2_artists <- c("Gilbert", "George")
col2_order <- unique(c(col2_tunnels, col2_beatles, col2_artists)) # cannot have duplicates
col1_order <- c("Beatles", "UK Artists", "Tunnels")
df %>%
  mutate(
    col1 = factor(col1, levels = col1_order),
    col2 = factor(col2, levels = col2_order)
  ) %>%
  group_by(col1, col2, .drop = FALSE) %>%
  summarise(n = n())
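To make the failure concrete, here is a minimal, self-contained sketch of the attempt above. The input is rebuilt with data.frame() instead of read.table(), and `res` is just an illustrative name. Because col2 is a single factor over the whole data frame, .drop = FALSE keeps every combination of col1 and col2 levels, and "George" can only occupy one position in the global level order.

```r
library(dplyr)

# Same data as in the question, built directly for self-containment
df <- data.frame(
  col1 = c("Tunnels", "Tunnels", "Tunnels", "Beatles", "Beatles",
           "Beatles", "Beatles", "UK Artists"),
  col2 = c("Dick", "Tom", "Tom", "George", "Paul", "Ringo", "Ringo", "Gilbert")
)

col1_order <- c("Beatles", "UK Artists", "Tunnels")
# The de-duplicated global order from the question
col2_order <- c("Tom", "Dick", "Harry", "John", "Paul", "George", "Ringo", "Gilbert")

res <- df %>%
  mutate(
    col1 = factor(col1, levels = col1_order),
    col2 = factor(col2, levels = col2_order)
  ) %>%
  group_by(col1, col2, .drop = FALSE) %>%
  summarise(n = n(), .groups = "drop")

nrow(res)  # 24 rows: 3 col1 levels x 8 col2 levels, not the 9 rows wanted
```

The result includes rows such as Beatles/Tom with n = 0, and within "UK Artists" George sorts before Gilbert, which is the opposite of the required order.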
The only way forward I can see is to split the data by col1 levels and use a named list of vectors defining the factor order for each level of col1. While writing the question I found that this works:
col2_fctlist <- list(
  Tunnels = c("Tom", "Dick", "Harry"),
  Beatles = c("John", "Paul", "George", "Ringo"),
  'UK Artists' = c("Gilbert", "George")
)
x <- lapply(col1_order, function(col1grp)
  df %>%
    filter(col1 == col1grp) %>%
    mutate(col2 = factor(col2, levels = col2_fctlist[[col1grp]])) %>%
    group_by(col1, col2, .drop = FALSE) %>%
    summarise(n = n())
)
do.call(rbind, x)
Although I have found a solution that I think works for me, I'm still posting in case anybody can offer a better one.
Answer 1
Score: 2
I don't know if this is better than yours! Using `data.table`, if I first set up the required order for `col1` and `col2` in a list like this:
l1 <- list(
  Beatles = data.frame(col2 = c("John", "Paul", "George", "Ringo")),
  `UK Artists` = data.frame(col2 = c("Gilbert", "George")),
  Tunnels = data.frame(col2 = c("Tom", "Dick", "Harry"))
)
Then I can turn this into a `data.table` using `rbindlist` and use a join with `df` to get the output that you want in the specified order:
library(data.table)
setDT(df)  # df must be a data.table for := and the join syntax below
dt1 <- rbindlist(l1, idcol = "col1")
df[, n := 1][dt1, on = c("col1", "col2")][, sum(n, na.rm = TRUE), .(col1, col2)]
col1 col2 V1
1: Beatles John 0
2: Beatles Paul 1
3: Beatles George 1
4: Beatles Ringo 2
5: UK Artists Gilbert 1
6: UK Artists George 0
7: Tunnels Tom 2
8: Tunnels Dick 1
9: Tunnels Harry 0
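A possible variation on the join above, offered as a sketch rather than as the answerer's method: count the observed pairs first with `.N`, join those counts onto the lookup table, and then set the unmatched counts to zero. This avoids adding a helper column `n` to `df` and gives the count column a name. The names `counts` and `res` are illustrative only.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
df <- data.table(
  col1 = c("Tunnels", "Tunnels", "Tunnels", "Beatles", "Beatles",
           "Beatles", "Beatles", "UK Artists"),
  col2 = c("Dick", "Tom", "Tom", "George", "Paul", "Ringo", "Ringo", "Gilbert")
)

# Lookup table: its row order defines the output order
l1 <- list(
  Beatles      = data.frame(col2 = c("John", "Paul", "George", "Ringo")),
  `UK Artists` = data.frame(col2 = c("Gilbert", "George")),
  Tunnels      = data.frame(col2 = c("Tom", "Dick", "Harry"))
)
dt1 <- rbindlist(l1, idcol = "col1")

# Count observed (col1, col2) pairs; .N is the number of rows per group
counts <- df[, .N, by = .(col1, col2)]

# One output row per row of dt1, in dt1's order; pairs not in the data get NA
res <- counts[dt1, on = c("col1", "col2")]

# Replace the NA counts with 0 by reference
res[is.na(N), N := 0L]
res
```

The count column is called `N` here instead of `V1`; `setnames(res, "N", "n")` would rename it if the original column name is wanted.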
Answer 2
Score: 1
With a join:
library(tidyverse)
enframe(col2_fctlist, name = "col1", value = "col2") %>% unnest(col2) %>%
left_join(df %>% count(col1, col2)) %>%
replace_na(list(n = 0))
col1 col2 n
1 Tunnels Tom 2
2 Tunnels Dick 1
3 Tunnels Harry 0
4 Beatles John 0
5 Beatles Paul 1
6 Beatles George 1
7 Beatles Ringo 2
8 UK Artists Gilbert 1
9 UK Artists George 0
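Note that the output above follows the order of the names in `col2_fctlist` (Tunnels first), not `col1_order` from the question. One way to get the requested group order, sketched here as an assumption about what the asker wants rather than as part of the original answer, is to index the list by `col1_order` before calling `enframe()`. The explicit `by` argument and the name `res` are additions for clarity.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same data and lookup objects as in the question
df <- data.frame(
  col1 = c("Tunnels", "Tunnels", "Tunnels", "Beatles", "Beatles",
           "Beatles", "Beatles", "UK Artists"),
  col2 = c("Dick", "Tom", "Tom", "George", "Paul", "Ringo", "Ringo", "Gilbert")
)
col1_order <- c("Beatles", "UK Artists", "Tunnels")
col2_fctlist <- list(
  Tunnels = c("Tom", "Dick", "Harry"),
  Beatles = c("John", "Paul", "George", "Ringo"),
  "UK Artists" = c("Gilbert", "George")
)

res <- col2_fctlist[col1_order] %>%          # reorder the groups first
  enframe(name = "col1", value = "col2") %>% # one row per group, col2 is a list-column
  unnest(col2) %>%                           # one row per (col1, col2) pair, in order
  left_join(count(df, col1, col2), by = c("col1", "col2")) %>%
  replace_na(list(n = 0L))                   # pairs absent from df get a count of 0

res
```

Because `left_join()` keeps the row order of its left-hand table, the lookup table alone determines the order of the summary.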
Or with `imap_dfr`:
imap_dfr(col2_fctlist,
~ df %>%
filter(col1 == .y) %>%
mutate(col2 = factor(col2, levels = .x)) %>%
count(col2, .drop = FALSE),
.id = "col1")
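As of purrr 1.0.0, `imap_dfr()` is superseded in favour of `imap()` followed by `list_rbind()`. The sketch below assumes purrr 1.0.0 or later and shows the same per-group approach in that style. It also indexes the list by `col1_order` so the groups come out in the order requested in the question. The argument names `lv` and `grp` and the result name `res` are illustrative.

```r
library(dplyr)
library(purrr)

# Same data and lookup objects as in the question
df <- data.frame(
  col1 = c("Tunnels", "Tunnels", "Tunnels", "Beatles", "Beatles",
           "Beatles", "Beatles", "UK Artists"),
  col2 = c("Dick", "Tom", "Tom", "George", "Paul", "Ringo", "Ringo", "Gilbert")
)
col1_order <- c("Beatles", "UK Artists", "Tunnels")
col2_fctlist <- list(
  Tunnels = c("Tom", "Dick", "Harry"),
  Beatles = c("John", "Paul", "George", "Ringo"),
  "UK Artists" = c("Gilbert", "George")
)

res <- imap(
  col2_fctlist[col1_order],
  function(lv, grp) {
    df %>%
      filter(col1 == grp) %>%                      # rows of this col1 group only
      mutate(col2 = factor(col2, levels = lv)) %>% # group-specific level order
      count(col2, .drop = FALSE)                   # keep unused levels with n = 0
  }
) %>%
  list_rbind(names_to = "col1")                    # list names become the col1 column

res
```

After binding, `col2` is a factor whose levels are the union of the per-group levels, so `as.character(res$col2)` may be convenient if plain strings are needed downstream.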