英文:
Count pairs occurence and plot them
问题
我有一个名为x的数据集,例如:
client_id | a | b | c |
---|---|---|---|
1 | 1 | 0 | 0 |
2 | 0 | 1 | 1 |
3 | 0 | 0 | 1 |
4 | 1 | 1 | 1 |
然后,我想创建另一个表格,计算表格中每个可能对的出现次数。所以结果会像这样:
pair | count |
---|---|
a | 1 |
b | 0 |
c | 1 |
a,b | 0 |
a,c | 0 |
b,c | 1 |
a,b,c | 1 |
这意味着例如对于第一行,a 单独出现了(等于1),其他列没有出现(都等于0)只有一次。
之后,我想以ggplot最佳方式可视化成对出现的结果。
英文:
I have this dataset called x for example:
client_id | a | b | c |
---|---|---|---|
1 | 1 | 0 | 0 |
2 | 0 | 1 | 1 |
3 | 0 | 0 | 1 |
4 | 1 | 1 | 1 |
I then want to create another table, that calculates the number of occurence of each possible pair in the table. So the result would be something like this:
pair | count |
---|---|
a | 1 |
b | 0 |
c | 1 |
a,b | 0 |
a,c | 0 |
b,c | 1 |
a,b,c | 1 |
That means for example that for the first row, a occured (was equal to 1) alone (means other columns didn't, and so they were at 0) only once.
After this, I want the best way to visualize the result of the pairs in ggplot.
Edit: More explanation of the second table:
a (single letter): a is the strictly one which occured (rest are 0).
a,b (two letters): a and b both occured, but not the rest.
a,b,c (three letters): all a,b,c occured at the same time (all are set to 1).
答案1
得分: 1
这是关于数据透视部分的内容,目前不确定最佳绘图方法。
library(dplyr)
poss <- data.frame(pair=unlist(sapply(seq_along(names(quux)[-1]), function(z) combn(x = names(quux)[-1], z, FUN = function(y) paste(y, collapse=",")))))
poss
# pair
# 1 a
# 2 b
# 3 c
# 4 a,b
# 5 a,c
# 6 b,c
# 7 a,b,c
library(dplyr)
library(tidyr)
quux %>%
pivot_longer(-client_id) %>%
summarize(pair = paste(sort(unique(name[value==1])), collapse = ","), .by = client_id) %>%
count(pair) %>%
full_join(poss, by="pair") %>%
mutate(n = coalesce(n, 0)) %>%
arrange(nchar(pair), pair)
# # A tibble: 7 × 2
# pair n
# <chr> <dbl>
# 1 a 1
# 2 b 0
# 3 c 1
# 4 a,b 0
# 5 a,c 0
# 6 b,c 1
# 7 a,b,c 1
数据
quux <- structure(list(client_id = 1:4, a = c(1L, 0L, 0L, 1L), b = c(0L, 1L, 0L, 1L), c = c(0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -4L))
英文:
Here's the pivoting part, not sure of the best way (atm) to plot.
library(dplyr)
poss <- data.frame(pair=unlist(sapply(seq_along(names(quux)[-1]), function(z) combn(x = names(quux)[-1], z, FUN = function(y) paste(y, collapse=",")))))
poss
# pair
# 1 a
# 2 b
# 3 c
# 4 a,b
# 5 a,c
# 6 b,c
# 7 a,b,c
library(dplyr)
library(tidyr)
quux %>%
pivot_longer(-client_id) %>%
summarize(pair = paste(sort(unique(name[value==1])), collapse = ","), .by = client_id) %>%
count(pair) %>%
full_join(poss, by="pair") %>%
mutate(n = coalesce(n, 0)) %>%
arrange(nchar(pair), pair)
# # A tibble: 7 × 2
# pair n
# <chr> <dbl>
# 1 a 1
# 2 b 0
# 3 c 1
# 4 a,b 0
# 5 a,c 0
# 6 b,c 1
# 7 a,b,c 1
Data
quux <- structure(list(client_id = 1:4, a = c(1L, 0L, 0L, 1L), b = c(0L, 1L, 0L, 1L), c = c(0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -4L))
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论