March 12, 2023, 19:25:07

Forming a symmetric matrix counting instances of being in same cluster

Question

I have a database of cities divided into clusters for each year. In other words, I applied a modularity-based community detection algorithm to separate databases of cities, one per year.
The final database (a mock example) looks like this:

v1 city cluster year
0 "city1"  0  2000
1 "city2"  2  2000
2 "city3"  1  2000
3 "city4"  0  2000
4 "city5"  2  2000
0 "city1"  2  2001
1 "city2"  1  2001
2 "city3"  0  2001
3 "city4"  0  2001
4 "city5"  0  2001
0 "city1"  1  2002
1 "city2"  2  2002
2 "city3"  0  2002
3 "city4"  0  2002
4 "city5"  1  2002

Now I would like to count how many times each city ends up in the same cluster as each other city, across years.
For the mock example above, I should end up with a 5 x 5 symmetric matrix whose rows and columns are cities, and where each entry is the number of years in which cities i and j are in the same cluster (regardless of which cluster):


       city1 city2 city3 city4 city5
city1    .     0     0     1     1
city2    0     .     0     0     1
city3    0     0     .     2     1
city4    1     0     2     .     1
city5    1     1     1     1     .

I am working in Python, but a solution in MATLAB or R is fine too.

Thank you!

Answer 1

Score: 1

In R, we can use crossprod with table: paste 'year' and 'cluster' into a single string, tabulate that string against city, apply crossprod to the resulting table, and then set the diagonal to 0.

out <- `diag<-`(crossprod(with(df1, table(paste(year, cluster), city))), 0)

Output

out
      city
city    city1 city2 city3 city4 city5
  city1     0     0     0     1     1
  city2     0     0     0     0     1
  city3     0     0     0     2     1
  city4     1     0     2     0     1
  city5     1     1     1     1     0
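
Since the question mentions Python, here is a minimal sketch of the same cross-product idea using only the standard library. It is not part of the original answer: the names rows, table, and cooc are illustrative, and the data is the mock example from the question. It builds the (year, cluster) by city indicator table and multiplies its transpose by itself, which is what crossprod does in R.

```python
# Mock data from the question as (city, cluster, year) tuples.
rows = [
    ("city1", 0, 2000), ("city2", 2, 2000), ("city3", 1, 2000),
    ("city4", 0, 2000), ("city5", 2, 2000),
    ("city1", 2, 2001), ("city2", 1, 2001), ("city3", 0, 2001),
    ("city4", 0, 2001), ("city5", 0, 2001),
    ("city1", 1, 2002), ("city2", 2, 2002), ("city3", 0, 2002),
    ("city4", 0, 2002), ("city5", 1, 2002),
]

cities = sorted({city for city, _, _ in rows})
groups = sorted({(year, cluster) for _, cluster, year in rows})
col = {city: j for j, city in enumerate(cities)}
row = {group: i for i, group in enumerate(groups)}

# Indicator table: one row per (year, cluster) group, one column per city.
table = [[0] * len(cities) for _ in groups]
for city, cluster, year in rows:
    table[row[(year, cluster)]][col[city]] += 1

# Cross-product (transpose of table times table): entry [i][j] sums,
# over all groups, the product of the indicators for city i and city j.
n = len(cities)
cooc = [[sum(r[i] * r[j] for r in table) for j in range(n)] for i in range(n)]

# Zero the diagonal, since every city trivially matches itself each year.
for i in range(n):
    cooc[i][i] = 0

for city, counts in zip(cities, cooc):
    print(city, counts)
```

With NumPy available, the cross-product line could presumably be replaced by a single matrix product of the transposed table with itself; the pure-Python version is used here only to avoid assuming any installed packages.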

If a sparse representation is needed:

library(Matrix)
> Matrix(out)
5 x 5 sparse Matrix of class "dsCMatrix"
       city
city    city1 city2 city3 city4 city5
  city1     .     .     .     1     1
  city2     .     .     .     .     1
  city3     .     .     .     2     1
  city4     1     .     2     .     1
  city5     1     1     1     1     .
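
A rough Python analogue of the sparse, symmetric storage shown above is to keep only the city pairs that co-occur at least once, each stored once under a sorted key. This is a sketch under that assumption, not an equivalent of the Matrix package; pair_counts and lookup are hypothetical names introduced for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Mock data from the question as (city, cluster, year) tuples.
rows = [
    ("city1", 0, 2000), ("city2", 2, 2000), ("city3", 1, 2000),
    ("city4", 0, 2000), ("city5", 2, 2000),
    ("city1", 2, 2001), ("city2", 1, 2001), ("city3", 0, 2001),
    ("city4", 0, 2001), ("city5", 0, 2001),
    ("city1", 1, 2002), ("city2", 2, 2002), ("city3", 0, 2002),
    ("city4", 0, 2002), ("city5", 1, 2002),
]

# Collect the cities belonging to each (year, cluster) group.
members = defaultdict(list)
for city, cluster, year in rows:
    members[(year, cluster)].append(city)

# Count each unordered pair once, keyed by the sorted pair of names.
# Pairs that never share a cluster are simply absent (the "." entries).
pair_counts = Counter()
for group in members.values():
    for a, b in combinations(sorted(group), 2):
        pair_counts[(a, b)] += 1


def lookup(a, b):
    """Symmetric lookup: the order of the two cities does not matter."""
    if a == b:
        return 0  # diagonal is defined as zero, as in the answer
    return pair_counts.get(tuple(sorted((a, b))), 0)


print(lookup("city4", "city3"))
print(lookup("city1", "city2"))
```

Only the six pairs with a nonzero count are stored here, which mirrors how the symmetric sparse class keeps one triangle of nonzero entries.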

Data

df1 <- structure(list(v1 = c(0L, 1L, 2L, 3L, 4L, 0L, 1L, 2L, 3L, 4L, 
0L, 1L, 2L, 3L, 4L), city = c("city1", "city2", "city3", "city4", 
"city5", "city1", "city2", "city3", "city4", "city5", "city1", 
"city2", "city3", "city4", "city5"), cluster = c(0L, 2L, 1L, 
0L, 2L, 2L, 1L, 0L, 0L, 0L, 1L, 2L, 0L, 0L, 1L), year = c(2000L, 
2000L, 2000L, 2000L, 2000L, 2001L, 2001L, 2001L, 2001L, 2001L, 
2002L, 2002L, 2002L, 2002L, 2002L)), class = "data.frame", 
row.names = c(NA, -15L))

Answer 2

Score: 0

In R, co-occurrence matrices are computed straightforwardly with table and [t]crossprod. We can compute the matrices by year and take the sum, like so:

con <- textConnection('
v1 city cluster year
0 "city1" 0 2000
1 "city2" 2 2000
2 "city3" 1 2000
3 "city4" 0 2000
4 "city5" 2 2000
0 "city1" 2 2001
1 "city2" 1 2001
2 "city3" 0 2001
3 "city4" 0 2001
4 "city5" 0 2001
0 "city1" 1 2002
1 "city2" 2 2002
2 "city3" 0 2002
3 "city4" 0 2002
4 "city5" 1 2002
')
d <- read.table(con, header = TRUE)
close(con)
x <- with(d, Reduce(`+`, apply(table(city, cluster, year), 3L, tcrossprod, simplify = FALSE)))
x
x

       city
city    city1 city2 city3 city4 city5
  city1     3     0     0     1     1
  city2     0     3     0     0     1
  city3     0     0     3     2     1
  city4     1     0     2     3     1
  city5     1     1     1     1     3

There are threes on the diagonal because cities match themselves every year. If you prefer, say, zeros on the diagonal, then you can add:

diag(x) <- 0

If you don't like the redundant annotation with "city", then you can add:

dimnames(x) <- unname(dimnames(x))

And if you want to store the result as a formally symmetric, formally sparse matrix, then you can add:

library(Matrix)
x <- as(x, "CsparseMatrix")
x

5 x 5 sparse Matrix of class "dsCMatrix"
      city1 city2 city3 city4 city5
city1     .     .     .     1     1
city2     .     .     .     .     1
city3     .     .     .     2     1
city4     1     .     2     .     1
city5     1     1     1     1     .
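
For readers working in Python, the per-year accumulation described in this answer can be sketched as follows, using only the standard library. The names by_year and total are illustrative assumptions, and the data is the mock example from the question. Each year contributes 1 to entry [i][j] when cities i and j carry the same cluster label that year, so the diagonal ends up equal to the number of years, as in the R output.

```python
from collections import defaultdict

# Mock data from the question as (city, cluster, year) tuples.
rows = [
    ("city1", 0, 2000), ("city2", 2, 2000), ("city3", 1, 2000),
    ("city4", 0, 2000), ("city5", 2, 2000),
    ("city1", 2, 2001), ("city2", 1, 2001), ("city3", 0, 2001),
    ("city4", 0, 2001), ("city5", 0, 2001),
    ("city1", 1, 2002), ("city2", 2, 2002), ("city3", 0, 2002),
    ("city4", 0, 2002), ("city5", 1, 2002),
]

cities = sorted({city for city, _, _ in rows})
n = len(cities)

# Cluster label of each city, stored separately for each year.
by_year = defaultdict(dict)
for city, cluster, year in rows:
    by_year[year][city] = cluster

# Running sum of the per-year "same cluster" matrices.
total = [[0] * n for _ in range(n)]
for year, label in sorted(by_year.items()):
    for i, a in enumerate(cities):
        for j, b in enumerate(cities):
            # Count the pair only if both cities were observed this year
            # and were assigned the same cluster label.
            if a in label and b in label and label[a] == label[b]:
                total[i][j] += 1

for city, counts in zip(cities, total):
    print(city, counts)

# Optional: zero the diagonal, mirroring diag(x) <- 0 in the R code.
no_diag = [[0 if i == j else total[i][j] for j in range(n)] for i in range(n)]
```

Comparing labels only within a year is what makes it safe for cluster ids such as 0 to be reused across years with different meanings.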
