Count pair occurrences and plot them

Question

For example, I have this dataset called x:

client_id a b c
1 1 0 0
2 0 1 1
3 0 0 1
4 1 1 1

I then want to create another table that counts the number of occurrences of each possible combination ("pair") of columns. The result would look like this:

pair count
a 1
b 0
c 1
a,b 0
a,c 0
b,c 1
a,b,c 1

For example, the first row of this table means that a occurred alone (a was 1 and all the other columns were 0) exactly once.

After this, I want to find the best way to visualize these combination counts in ggplot.

Edit: more explanation of the second table:

a (single letter): only a occurred (a is 1, the rest are 0).
a,b (two letters): a and b both occurred, but none of the other columns did.
a,b,c (three letters): a, b and c all occurred at the same time (all are 1).
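A minimal base R sketch of this exclusive counting, assuming every column except client_id is a 0/1 indicator. It labels each row with the exact set of columns equal to 1, then uses every possible combination as factor levels so that combinations that never occur are still reported with a count of 0. The object names (cols, all_pairs, row_sets, counts) are illustrative only.

```r
# Example data from the question (same values as x)
x <- data.frame(client_id = 1:4,
                a = c(1, 0, 0, 1),
                b = c(0, 1, 0, 1),
                c = c(0, 1, 1, 1))

# Assumption: every column except client_id is a 0/1 indicator
cols <- setdiff(names(x), "client_id")

# Every non-empty combination of the indicator columns, labelled like "a,b"
all_pairs <- unlist(lapply(seq_along(cols), function(k)
  combn(cols, k, FUN = paste, collapse = ",")))

# Label each row with the exact set of columns equal to 1
row_sets <- apply(x[cols] == 1, 1, function(r) paste(cols[r], collapse = ","))

# Using all combinations as factor levels keeps unseen ones with a count of 0
counts <- as.data.frame(table(pair = factor(row_sets, levels = all_pairs)),
                        responseName = "count", stringsAsFactors = FALSE)
counts
```

Rows where every column is 0 get an empty label that is not among the factor levels, so table() drops them; add "" to the levels if you want those clients counted too.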

Answer 1

Score: 1

Here's the pivoting part; I'm not sure yet of the best way to plot it.

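library(dplyr)
# Every non-empty combination of the indicator columns (all columns except
# client_id), each collapsed into a label such as "a,b"
poss <- data.frame(pair = unlist(lapply(seq_along(names(quux)[-1]), function(z)
  combn(x = names(quux)[-1], z, FUN = function(y) paste(y, collapse = ",")))))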
poss
#    pair
# 1     a
# 2     b
# 3     c
# 4   a,b
# 5   a,c
# 6   b,c
# 7 a,b,c
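poss lists every non-empty subset of the k indicator columns, one combn() call per subset size z, so its number of rows follows from the binomial theorem. With the three columns a, b and c this gives the 7 rows shown:

```latex
\sum_{z=1}^{k} \binom{k}{z} = 2^{k} - 1, \qquad k = 3:\quad \binom{3}{1} + \binom{3}{2} + \binom{3}{3} = 3 + 3 + 1 = 7
```

Because this grows as 2^k, listing every combination is only practical for a modest number of columns.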


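library(dplyr)
library(tidyr)
quux %>%
  # one row per (client_id, column) holding the 0/1 value
  pivot_longer(-client_id) %>%
  # label each client by the exact set of columns equal to 1, e.g. "b,c"
  # (the .by argument needs dplyr >= 1.1.0)
  summarize(pair = paste(sort(unique(name[value == 1])), collapse = ","), .by = client_id) %>%
  # number of clients with each exact combination
  count(pair) %>%
  # add the combinations that never occur so they appear with n = 0
  full_join(poss, by = "pair") %>%
  mutate(n = coalesce(n, 0)) %>%
  arrange(nchar(pair), pair)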
# # A tibble: 7 × 2
#   pair      n
#   <chr> <dbl>
# 1 a         1
# 2 b         0
# 3 c         1
# 4 a,b       0
# 5 a,c       0
# 6 b,c       1
# 7 a,b,c     1
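One straightforward option for the plotting part, sketched with plain ggplot2 and with the counts hard-coded from the table above: one bar per combination in the table's order, filled by how many columns the combination involves. The bars are labelled with their counts because a zero-height bar is otherwise invisible. Names such as counts and size are illustrative.

```r
library(ggplot2)

# Counts copied from the table above; normally this is the answer's result
counts <- data.frame(
  pair = c("a", "b", "c", "a,b", "a,c", "b,c", "a,b,c"),
  n    = c(1, 0, 1, 0, 0, 1, 1)
)

# Fix the x-axis order to the table's order (by size, then alphabetically)
counts$pair <- factor(counts$pair, levels = counts$pair)
# Number of columns in each combination, used for the bar fill
counts$size <- factor(lengths(strsplit(as.character(counts$pair), ",")))

p <- ggplot(counts, aes(x = pair, y = n, fill = size)) +
  geom_col() +
  # print the count on each bar so zero counts remain visible
  geom_text(aes(label = n), vjust = -0.3) +
  labs(x = "Combination of columns equal to 1", y = "Number of clients",
       fill = "Columns in\ncombination") +
  theme_minimal()
p
```

To plot the answer's output directly, replace the hard-coded counts with that result; it has the same pair and n columns.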

Data

# The question's example dataset x, named quux here; run this before the code above
quux <- structure(list(client_id = 1:4, a = c(1L, 0L, 0L, 1L), b = c(0L, 1L, 0L, 1L), c = c(0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -4L))
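With more columns, comma-separated labels become hard to read. An UpSet-style layout, where each combination is a column of dots marking which variables it contains, scales better. Dedicated packages such as UpSetR exist, but this sketch stays in plain ggplot2; the counts are again hard-coded from the answer's table and all object names are illustrative.

```r
library(ggplot2)

# Counts copied from the answer's table; names here are illustrative
counts <- data.frame(
  pair = c("a", "b", "c", "a,b", "a,c", "b,c", "a,b,c"),
  n    = c(1, 0, 1, 0, 0, 1, 1)
)
cols <- c("a", "b", "c")

# One row per (combination, column), flagging whether the column is in it
membership <- expand.grid(pair = counts$pair, column = cols,
                          stringsAsFactors = FALSE)
membership$member <- mapply(function(combo, col) col %in% strsplit(combo, ",")[[1]],
                            membership$pair, membership$column,
                            USE.NAMES = FALSE)
membership$pair <- factor(membership$pair, levels = counts$pair)
membership$column <- factor(membership$column, levels = rev(cols))

in_set <- subset(membership, member)

p <- ggplot(membership, aes(x = pair, y = column)) +
  geom_point(colour = "grey85", size = 4) +      # every possible cell
  geom_line(data = in_set, aes(group = pair)) +  # link the members of a combination
  geom_point(data = in_set, size = 4) +          # columns equal to 1
  # put the count under each combination's label
  scale_x_discrete(labels = paste0(counts$pair, "\n(n = ", counts$n, ")")) +
  labs(x = NULL, y = NULL) +
  theme_minimal()
p
```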

huangapple
  • Published on 2023-07-12 22:28:54
  • If reposting, please keep the link to this article: https://go.coder-hub.com/76671674.html