Forming a symmetric matrix counting instances of being in the same cluster
Question
I have a database of cities divided into clusters for each year. In other words, I applied a modularity-based community detection algorithm to separate databases containing the cities in different years.
The final database (a mock example) looks like this:
v1 city cluster year
0 "city1" 0 2000
1 "city2" 2 2000
2 "city3" 1 2000
3 "city4" 0 2000
4 "city5" 2 2000
0 "city1" 2 2001
1 "city2" 1 2001
2 "city3" 0 2001
3 "city4" 0 2001
4 "city5" 0 2001
0 "city1" 1 2002
1 "city2" 2 2002
2 "city3" 0 2002
3 "city4" 0 2002
4 "city5" 1 2002
Now what I would like to do is count, over all years, how many times a city ends up in the same cluster as another city.
So in the mock example above I should end up with a 5 by 5 symmetric matrix whose rows and columns are cities, where each entry is the number of times that cities i and j are in the same cluster (regardless of which cluster) over all years:
      city1 city2 city3 city4 city5
city1   .     0     0     1     1
city2   0     .     0     0     1
city3   0     0     .     2     1
city4   1     0     2     .     1
city5   1     1     1     1     .
I am working in Python, but a solution in MATLAB or R is fine too.
Thank you.
Answer 1
Score: 1
In R, we may use crossprod with table: paste 'year' and 'cluster' into a single string, get the table of that string against city, apply crossprod to the output, and then set the diagonal values to 0.
out <- `diag<-`(crossprod(with(df1, table(paste(year, cluster), city))), 0)
Output:
out
city
city city1 city2 city3 city4 city5
city1 0 0 0 1 1
city2 0 0 0 0 1
city3 0 0 0 2 1
city4 1 0 2 0 1
city5 1 1 1 1 0
If we need a sparse option:
library(Matrix)
Matrix(out)
5 x 5 sparse Matrix of class "dsCMatrix"
city
city city1 city2 city3 city4 city5
city1 . . . 1 1
city2 . . . . 1
city3 . . . 2 1
city4 1 . 2 . 1
city5 1 1 1 1 .
Data:
df1 <- structure(list(v1 = c(0L, 1L, 2L, 3L, 4L, 0L, 1L, 2L, 3L, 4L,
0L, 1L, 2L, 3L, 4L), city = c("city1", "city2", "city3", "city4",
"city5", "city1", "city2", "city3", "city4", "city5", "city1",
"city2", "city3", "city4", "city5"), cluster = c(0L, 2L, 1L,
0L, 2L, 2L, 1L, 0L, 0L, 0L, 1L, 2L, 0L, 0L, 1L), year = c(2000L,
2000L, 2000L, 2000L, 2000L, 2001L, 2001L, 2001L, 2001L, 2001L,
2002L, 2002L, 2002L, 2002L, 2002L)), class = "data.frame",
row.names = c(NA, -15L))
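Since the question mentions Python, the same crossprod idea can be sketched there as well. The following is a minimal standard-library sketch, not a translation endorsed by the answer's author: it builds an indicator matrix with one row per (year, cluster) group and one column per city, then forms the product of its transpose with itself and zeroes the diagonal. The function name cooccurrence and the tuple layout of rows are assumptions made for illustration.

```python
from collections import defaultdict

# Mock data from the question, as (city, cluster, year) tuples.
rows = [
    ("city1", 0, 2000), ("city2", 2, 2000), ("city3", 1, 2000),
    ("city4", 0, 2000), ("city5", 2, 2000),
    ("city1", 2, 2001), ("city2", 1, 2001), ("city3", 0, 2001),
    ("city4", 0, 2001), ("city5", 0, 2001),
    ("city1", 1, 2002), ("city2", 2, 2002), ("city3", 0, 2002),
    ("city4", 0, 2002), ("city5", 1, 2002),
]


def cooccurrence(rows):
    """Hypothetical helper: return (cities, matrix) of same-cluster counts."""
    cities = sorted({city for city, _, _ in rows})
    col = {city: j for j, city in enumerate(cities)}
    n = len(cities)

    # Indicator matrix M: one row per (year, cluster) group, one column per city.
    groups = defaultdict(lambda: [0] * n)
    for city, cluster, year in rows:
        groups[(year, cluster)][col[city]] += 1

    # Analogue of R's crossprod(M), i.e. t(M) %*% M.
    counts = [
        [sum(g[i] * g[j] for g in groups.values()) for j in range(n)]
        for i in range(n)
    ]

    # A city always shares a cluster with itself, so zero the diagonal.
    for i in range(n):
        counts[i][i] = 0
    return cities, counts


cities, counts = cooccurrence(rows)
for city, row in zip(cities, counts):
    print(city, row)
```

For larger inputs, a pandas crosstab combined with a NumPy matrix product would likely be the more idiomatic route; the nested lists here only keep the sketch free of dependencies.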
Answer 2
Score: 0
In R, co-occurrence matrices are computed straightforwardly with table and [t]crossprod. We can compute the matrices by year and take the sum, like so (note that the simplify argument of apply requires R 4.1.0 or later):
con <- textConnection('
v1 city cluster year
0 "city1" 0 2000
1 "city2" 2 2000
2 "city3" 1 2000
3 "city4" 0 2000
4 "city5" 2 2000
0 "city1" 2 2001
1 "city2" 1 2001
2 "city3" 0 2001
3 "city4" 0 2001
4 "city5" 0 2001
0 "city1" 1 2002
1 "city2" 2 2002
2 "city3" 0 2002
3 "city4" 0 2002
4 "city5" 1 2002
')
d <- read.table(con, header = TRUE)
close(con)
x <- with(d, Reduce(`+`, apply(table(city, cluster, year), 3L, tcrossprod, simplify = FALSE)))
x
city
city city1 city2 city3 city4 city5
city1 3 0 0 1 1
city2 0 3 0 0 1
city3 0 0 3 2 1
city4 1 0 2 3 1
city5 1 1 1 1 3
There are threes on the diagonal because cities match themselves every year. If you prefer, say, zeros on the diagonal, then you can add:
diag(x) <- 0
If you don't like the redundant annotation with "city", then you can add:
dimnames(x) <- unname(dimnames(x))
And if you want to store the result as a formally symmetric, formally sparse matrix, then you can add:
library(Matrix)
x <- as(x, "CsparseMatrix")
x
5 x 5 sparse Matrix of class "dsCMatrix"
city1 city2 city3 city4 city5
city1 . . . 1 1
city2 . . . . 1
city3 . . . 2 1
city4 1 . 2 . 1
city5 1 1 1 1 .
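The year-by-year idea in this answer can also be sketched in Python, the language the question mentions. This is a rough standard-library sketch under stated assumptions, not the answer author's method: it groups cities by cluster within each year, counts every unordered pair inside a group, and sums over years. Only nonzero off-diagonal pairs are stored, which loosely mirrors the symmetric sparse storage above. The function name cooccurrence_by_year and the tuple layout of rows are hypothetical, and each city is assumed to appear at most once per year.

```python
from collections import defaultdict
from itertools import combinations

# Mock data from the question, as (city, cluster, year) tuples.
rows = [
    ("city1", 0, 2000), ("city2", 2, 2000), ("city3", 1, 2000),
    ("city4", 0, 2000), ("city5", 2, 2000),
    ("city1", 2, 2001), ("city2", 1, 2001), ("city3", 0, 2001),
    ("city4", 0, 2001), ("city5", 0, 2001),
    ("city1", 1, 2002), ("city2", 2, 2002), ("city3", 0, 2002),
    ("city4", 0, 2002), ("city5", 1, 2002),
]


def cooccurrence_by_year(rows):
    """Hypothetical helper: map each unordered city pair to its count."""
    # members[year][cluster] -> cities placed in that cluster in that year
    members = defaultdict(lambda: defaultdict(list))
    for city, cluster, year in rows:
        members[year][cluster].append(city)

    pair_counts = defaultdict(int)
    for clusters in members.values():        # one pass per year
        for group in clusters.values():      # one pass per cluster
            # Sorting makes (a, b) a canonical key with a < b.
            for a, b in combinations(sorted(group), 2):
                pair_counts[(a, b)] += 1
    return dict(pair_counts)


pairs = cooccurrence_by_year(rows)
for (a, b), count in sorted(pairs.items()):
    print(a, b, count)
```

Pairs that never share a cluster are simply absent from the dictionary, so a lookup such as pairs.get(("city1", "city2"), 0) is the safe way to read a count.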