July 12, 2023 22:28:54


Count pair occurrences and plot them

Question


I have a dataset called x, for example:

client_id	a	b	c
1	1	0	0
2	0	1	1
3	0	0	1
4	1	1	1

I then want to create another table that counts how often each possible combination of columns occurs. The result would look something like this:

pair	count
a	1
b	0
c	1
a,b	0
a,c	0
b,c	1
a,b,c	1

For example, the first row of this table says that a occurred alone (a equal to 1 and every other column equal to 0) exactly once.

After this, I want to find the best way to visualize these counts in ggplot2.

Edit: more explanation of the second table:

a (single letter): only a occurred (all other columns are 0).
a,b (two letters): a and b both occurred, but none of the others did.
a,b,c (three letters): a, b, and c all occurred at the same time (all are 1).
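The labelling rule above can be checked with a short base-R sketch. It assumes every column after client_id is a 0/1 indicator, as in the sample; the names indicators and row_labels are illustrative only.

```r
# Sample data from the question (client_id plus 0/1 indicator columns)
x <- data.frame(
  client_id = 1:4,
  a = c(1L, 0L, 0L, 1L),
  b = c(0L, 1L, 0L, 1L),
  c = c(0L, 1L, 1L, 1L)
)

# Logical matrix: TRUE where a column equals 1
indicators <- as.matrix(x[, -1]) == 1

# For each row, join the names of the columns that are TRUE;
# a row with no 1s at all would get the empty label ""
row_labels <- apply(indicators, 1, function(r) paste(colnames(indicators)[r], collapse = ","))
row_labels
# one label per client: "a", "b,c", "c", "a,b,c"
```

Each client gets exactly one label, so counting the labels gives the second table (except that combinations nobody has, such as b alone, still need to be added with a count of 0).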

Answer 1

Score: 1


Here's the pivoting part; I'm not sure (at the moment) of the best way to plot it.

library(dplyr)  # summarize(.by = ) needs dplyr >= 1.1.0
library(tidyr)

# every non-empty combination of the indicator columns, in column order
poss <- data.frame(pair = unlist(lapply(seq_along(names(quux)[-1]), function(z)
  combn(x = names(quux)[-1], z, FUN = function(y) paste(y, collapse = ",")))))
poss
#    pair
# 1     a
# 2     b
# 3     c
# 4   a,b
# 5   a,c
# 6   b,c
# 7 a,b,c

quux %>%
  pivot_longer(-client_id) %>%
  # pivot_longer keeps the original column order, so these labels match poss
  summarize(pair = paste(name[value == 1], collapse = ","), .by = client_id) %>%
  count(pair) %>%
  # add the combinations that never occur, with a count of 0
  full_join(poss, by = "pair") %>%
  mutate(n = coalesce(n, 0)) %>%
  arrange(nchar(pair), pair)
# # A tibble: 7 × 2
#   pair      n
#   <chr> <dbl>
# 1 a         1
# 2 b         0
# 3 c         1
# 4 a,b       0
# 5 a,c       0
# 6 b,c       1
# 7 a,b,c     1
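For comparison, the same table can be built in base R without pivoting. This is a minimal sketch, not the answer's method; it assumes every column after client_id is a 0/1 indicator, and the names ind, cols, all_combos, row_combo, counts, and result are illustrative.

```r
# Same data as quux in this answer
quux <- data.frame(client_id = 1:4,
                   a = c(1L, 0L, 0L, 1L),
                   b = c(0L, 1L, 0L, 1L),
                   c = c(0L, 1L, 1L, 1L))

ind  <- as.matrix(quux[-1]) == 1   # TRUE where a column equals 1
cols <- colnames(ind)

# Every non-empty combination, in column order: a, b, c, a,b, a,c, b,c, a,b,c
all_combos <- unlist(lapply(seq_along(cols), function(k)
  combn(cols, k, FUN = paste, collapse = ",")))

# Exact combination present in each row (same column order as combn)
row_combo <- apply(ind, 1, function(r) paste(cols[r], collapse = ","))

# factor() with explicit levels makes table() report zero counts too
counts <- table(factor(row_combo, levels = all_combos))
result <- data.frame(pair = names(counts), n = as.integer(counts),
                     stringsAsFactors = FALSE)
result
```

The explicit factor levels do the job of the full_join above. A row with no 1s gets the label "", which is not a level, so it becomes NA and table() drops it.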

Data

quux <- structure(list(client_id = 1:4, a = c(1L, 0L, 0L, 1L), b = c(0L, 1L, 0L, 1L), c = c(0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -4L))
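The answer leaves the plotting open. One simple option is a bar chart with one bar per combination, kept in the table's order. This is a minimal sketch assuming ggplot2 is installed; the counts are hard-coded from the result above so the block runs on its own, and the names combo_counts and p and the axis labels are illustrative.

```r
library(ggplot2)

# Counts from the table above (hard-coded so the sketch is self-contained)
combo_counts <- data.frame(
  pair = c("a", "b", "c", "a,b", "a,c", "b,c", "a,b,c"),
  n    = c(1, 0, 1, 0, 0, 1, 1)
)
# Keep the combinations in the table's order on the x axis
combo_counts$pair <- factor(combo_counts$pair, levels = combo_counts$pair)

p <- ggplot(combo_counts, aes(x = pair, y = n)) +
  geom_col() +
  scale_y_continuous(breaks = seq(0, max(combo_counts$n))) +
  labs(x = "Combination of columns equal to 1", y = "Number of clients")
p
```

With k indicator columns there are 2^k - 1 combinations, so for wider data it may help to plot only the non-zero ones, e.g. combo_counts[combo_counts$n > 0, ].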
