Forming a symmetric matrix counting instances of being in same cluster

Question

I have a database of cities that are divided into clusters for each year. In other words, I applied a modularity-based community detection algorithm to separate databases containing the cities in different years.
The final database (a mock example) looks like this:

    v1  city     cluster  year
    0   "city1"  0        2000
    1   "city2"  2        2000
    2   "city3"  1        2000
    3   "city4"  0        2000
    4   "city5"  2        2000
    0   "city1"  2        2001
    1   "city2"  1        2001
    2   "city3"  0        2001
    3   "city4"  0        2001
    4   "city5"  0        2001
    0   "city1"  1        2002
    1   "city2"  2        2002
    2   "city3"  0        2002
    3   "city4"  0        2002
    4   "city5"  1        2002

Now what I would like to do is count how many times each city ends up in the same cluster as each other city across the years.
So in the mock example above I should end up with a 5 by 5 symmetric matrix whose rows and columns are cities, where each entry represents the number of times that cities i and j are in the same cluster (regardless of which cluster) over all years:

           city1  city2  city3  city4  city5
    city1  .      0      0      1      1
    city2  0      .      0      0      1
    city3  0      0      .      2      1
    city4  1      0      2      .      1
    city5  1      1      1      1      .
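
To make the counting rule concrete, here is a minimal Python sketch that uses only the standard library. It hard-codes the mock data from above as tuples, and the names `rows`, `groups`, `pair_counts` and `matrix` are illustrative choices, not part of the original question. It groups cities by (year, cluster) and counts each unordered pair once per shared group.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Mock data from the question: (city, cluster, year)
rows = [
    ("city1", 0, 2000), ("city2", 2, 2000), ("city3", 1, 2000),
    ("city4", 0, 2000), ("city5", 2, 2000),
    ("city1", 2, 2001), ("city2", 1, 2001), ("city3", 0, 2001),
    ("city4", 0, 2001), ("city5", 0, 2001),
    ("city1", 1, 2002), ("city2", 2, 2002), ("city3", 0, 2002),
    ("city4", 0, 2002), ("city5", 1, 2002),
]

# Collect the cities that share a cluster within each year.
groups = defaultdict(list)
for city, cluster, year in rows:
    groups[(year, cluster)].append(city)

# Count every unordered pair of cities once per shared (year, cluster) group.
pair_counts = Counter()
for members in groups.values():
    for a, b in combinations(sorted(members), 2):
        pair_counts[(a, b)] += 1

# Expand the pair counts into a symmetric matrix with a zero diagonal.
cities = sorted({city for city, _, _ in rows})
matrix = [
    [0 if a == b else pair_counts[tuple(sorted((a, b)))] for b in cities]
    for a in cities
]

for city, row in zip(cities, matrix):
    print(city, row)
```

This loops over pairs explicitly, which is easy to check by hand but may be slow when clusters are large; a matrix-product formulation is likely a better fit for bigger data.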

I am working in Python, but a solution in MATLAB or R is fine too.

Thank you

Answer 1

Score: 1

In R, we can use crossprod with table: paste 'year' and 'cluster' into a single string, tabulate that string against city with table, apply crossprod to the result, then set the diagonal to 0.

    # Tabulate (year, cluster) groups against cities, take the cross-product, zero the diagonal
    out <- `diag<-`(crossprod(with(df1, table(paste(year, cluster), city))), 0)

Output:

    out
           city
    city    city1 city2 city3 city4 city5
      city1     0     0     0     1     1
      city2     0     0     0     0     1
      city3     0     0     0     2     1
      city4     1     0     2     0     1
      city5     1     1     1     1     0
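
Since the question mentions Python, the same idea can be sketched with NumPy. This is an illustrative translation under stated assumptions, not part of the original answer: the variable names (`indicator`, `co_occurrence`, and so on) are made up here, and the mock data is typed in as plain lists. The indicator matrix plays the role of the R table, and `indicator.T @ indicator` plays the role of crossprod.

```python
import numpy as np

# Mock data from the question, in row order.
city = ["city1", "city2", "city3", "city4", "city5"] * 3
cluster = [0, 2, 1, 0, 2, 2, 1, 0, 0, 0, 1, 2, 0, 0, 1]
year = [2000] * 5 + [2001] * 5 + [2002] * 5

cities = sorted(set(city))
# Each distinct (year, cluster) pair plays the role of paste(year, cluster) in R.
groups = sorted(set(zip(year, cluster)))

city_index = {name: j for j, name in enumerate(cities)}
group_index = {key: i for i, key in enumerate(groups)}

# Indicator matrix: one row per (year, cluster) group, one column per city.
indicator = np.zeros((len(groups), len(cities)), dtype=int)
for c, k, y in zip(city, cluster, year):
    indicator[group_index[(y, k)], city_index[c]] += 1

# indicator.T @ indicator corresponds to crossprod() in R:
# entry (i, j) counts the groups that contain both city i and city j.
co_occurrence = indicator.T @ indicator
np.fill_diagonal(co_occurrence, 0)  # a city always shares a cluster with itself

print(cities)
print(co_occurrence)
```

On the mock data this should reproduce the matrix shown in the R output. For many cities and clusters, building the indicator as a sparse matrix would probably be preferable, but that is left out of this sketch.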

If we need a sparse option:

    library(Matrix)
    Matrix(out)

    5 x 5 sparse Matrix of class "dsCMatrix"
           city
    city    city1 city2 city3 city4 city5
      city1     .     .     .     1     1
      city2     .     .     .     .     1
      city3     .     .     .     2     1
      city4     1     .     2     .     1
      city5     1     1     1     1     .

Data:

    df1 <- structure(list(v1 = c(0L, 1L, 2L, 3L, 4L, 0L, 1L, 2L, 3L, 4L,
    0L, 1L, 2L, 3L, 4L), city = c("city1", "city2", "city3", "city4",
    "city5", "city1", "city2", "city3", "city4", "city5", "city1",
    "city2", "city3", "city4", "city5"), cluster = c(0L, 2L, 1L,
    0L, 2L, 2L, 1L, 0L, 0L, 0L, 1L, 2L, 0L, 0L, 1L), year = c(2000L,
    2000L, 2000L, 2000L, 2000L, 2001L, 2001L, 2001L, 2001L, 2001L,
    2002L, 2002L, 2002L, 2002L, 2002L)), class = "data.frame",
    row.names = c(NA, -15L))

Answer 2

Score: 0

In R, co-occurrence matrices are computed straightforwardly with table and [t]crossprod. We can compute the matrices by year and take the sum, like so:

    # Read the mock data from an inline text connection
    con <- textConnection('
    v1 city cluster year
    0 "city1" 0 2000
    1 "city2" 2 2000
    2 "city3" 1 2000
    3 "city4" 0 2000
    4 "city5" 2 2000
    0 "city1" 2 2001
    1 "city2" 1 2001
    2 "city3" 0 2001
    3 "city4" 0 2001
    4 "city5" 0 2001
    0 "city1" 1 2002
    1 "city2" 2 2002
    2 "city3" 0 2002
    3 "city4" 0 2002
    4 "city5" 1 2002
    ')
    d <- read.table(con, header = TRUE)
    close(con)
    # One city-by-cluster table per year; tcrossprod gives that year's co-occurrence
    # matrix, and Reduce sums them (simplify = FALSE in apply() needs R >= 4.1.0)
    x <- with(d, Reduce(`+`, apply(table(city, cluster, year), 3L, tcrossprod, simplify = FALSE)))
    x

           city
    city    city1 city2 city3 city4 city5
      city1     3     0     0     1     1
      city2     0     3     0     0     1
      city3     0     0     3     2     1
      city4     1     0     2     3     1
      city5     1     1     1     1     3

There are threes on the diagonal because cities match themselves every year. If you prefer, say, zeros on the diagonal, then you can add:

    diag(x) <- 0
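
For readers working in Python, the per-year sum can be sketched with NumPy broadcasting. This is a rough analogue under assumptions, not the answer's own code: it assumes every city appears exactly once per year and in the same order each year, and the names `labels_by_year`, `total` and `with_diagonal` are made up for illustration.

```python
import numpy as np

cities = ["city1", "city2", "city3", "city4", "city5"]
# Cluster labels per year, listed in the same city order as `cities`
# (assumption: each city appears exactly once in every year).
labels_by_year = {
    2000: [0, 2, 1, 0, 2],
    2001: [2, 1, 0, 0, 0],
    2002: [1, 2, 0, 0, 1],
}

total = np.zeros((len(cities), len(cities)), dtype=int)
for labels in labels_by_year.values():
    labels = np.asarray(labels)
    # same[i, j] is True when city i and city j share a cluster this year.
    same = labels[:, None] == labels[None, :]
    total += same.astype(int)

# Every city matches itself each year, so the diagonal equals the number of years.
with_diagonal = total.copy()

# Zero the diagonal, mirroring diag(x) <- 0 in R.
np.fill_diagonal(total, 0)

print(with_diagonal)
print(total)
```

The comparison `labels[:, None] == labels[None, :]` builds an n by n boolean matrix per year, so memory grows quadratically with the number of cities; that is fine for a handful of cities but worth keeping in mind for larger inputs.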

If you don't like the redundant annotation with "city", then you can add:

    dimnames(x) <- unname(dimnames(x))

And if you want to store the result as a formally symmetric, formally sparse matrix, then you can add:

    library(Matrix)
    x <- as(x, "CsparseMatrix")
    x

    5 x 5 sparse Matrix of class "dsCMatrix"
          city1 city2 city3 city4 city5
    city1     .     .     .     1     1
    city2     .     .     .     .     1
    city3     .     .     .     2     1
    city4     1     .     2     .     1
    city5     1     1     1     1     .

Posted by huangapple on 2023-03-12 19:25:07
Source: https://go.coder-hub.com/75712777.html