get correlation p-value with deep graph
Question
I am using deepgraph in Python to compute correlation coefficients for large matrices. The output is a multi-index DataFrame, for example:
s t
0 1   -0.006066
  2    0.094063
  3   -0.025529
  4    0.074080
  5    0.035490
  6    0.005221
  7    0.032064
I want to add a column with the corresponding p-values.
The original code and input example come from https://deepgraph.readthedocs.io/en/latest/tutorials/pairwise_correlations.html
The part of the code surrounded by hash marks is my approach to getting the p-values; I want to merge the separate edge lists later on.
#!/bin/python

import os
from multiprocessing import Pool

import numpy as np
import pandas as pd
import deepgraph as dg
from numpy.random import RandomState
from scipy.stats import pearsonr, spearmanr

prng = RandomState(0)
n_features = int(5e3)
n_samples = int(1e2)
X = prng.randint(100, size=(n_features, n_samples)).astype(np.float64)

# Spearman's correlation coefficients
X = X.argsort(axis=1).argsort(axis=1)

# whiten variables for fast parallel computation later on
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# save in binary format
np.save('samples', X)

# parameters (change these to control RAM usage)
step_size = 1e5
n_processes = 100

# load samples as memory-map
X = np.load('samples.npy', mmap_mode='r')

# create node table that stores references to the mem-mapped samples
v = pd.DataFrame({'index': range(X.shape[0])})

# connector function to compute pairwise pearson correlations
def corr(index_s, index_t):
    features_s = X[index_s]
    features_t = X[index_t]
    corr = np.einsum('ij,ij->i', features_s, features_t) / n_samples
    return corr

#################################
def p_Val(index_s, index_t):
    features_s = X[index_s]
    features_t = X[index_t]
    p = spearmanr(features_s, features_t)[1]
    return p
#################################

# index array for parallelization
pos_array = np.array(np.linspace(0, n_features*(n_features-1)//2, n_processes), dtype=int)

# parallel computation
def create_ei(i):

    from_pos = pos_array[i]
    to_pos = pos_array[i+1]

    # initiate DeepGraph
    g = dg.DeepGraph(v)

    # create edges
    g.create_edges(connectors=corr, step_size=step_size, from_pos=from_pos, to_pos=to_pos)

    # store edge table
    g.e.to_pickle('tmp/correlations/{}_corr.pickle'.format(str(i).zfill(3)))

    #################################
    gp = dg.DeepGraph(v)

    # create edges
    gp.create_edges(connectors=p_Val, step_size=step_size, from_pos=from_pos, to_pos=to_pos)

    # store edge table
    gp.e.to_pickle('tmp/correlations/{}_pval.pickle'.format(str(i).zfill(3)))
    #################################

# computation
if __name__ == '__main__':

    os.makedirs("tmp/correlations", exist_ok=True)
    indices = np.arange(0, n_processes - 1)
    p = Pool()
    for _ in p.imap_unordered(create_ei, indices):
        pass

    # store correlation values
    files = os.listdir('tmp/correlations/')
    files.sort()
    for f in files:
        et = pd.read_pickle('tmp/correlations/{}'.format(f))
        print(et)
    store.close()
I get the following error:
Traceback (most recent call last):
  File "/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "pairwise_corr.py", line 64, in create_ei
    gp.create_edges(connectors=p_Val, step_size=step_size, from_pos=from_pos, to_pos=to_pos)
  File "/lib/python3.9/site-packages/deepgraph/deepgraph.py", line 616, in create_edges
    self.e = _matrix_iterator(
  File "/lib/python3.9/site-packages/deepgraph/deepgraph.py", line 4875, in _matrix_iterator
    ei = _select_and_return(vi, sources_k, targets_k, ft_feature,
  File "/lib/python3.9/site-packages/deepgraph/deepgraph.py", line 5339, in _select_and_return
    ei = pd.DataFrame({col: data[col] for col in coldtypedic})
  File "/lib/python3.9/site-packages/pandas/core/frame.py", line 614, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 464, in dict_to_mgr
    return arrays_to_mgr(
  File "/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 124, in arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 589, in _homogenize
    val = sanitize_array(
  File "/lib/python3.9/site-packages/pandas/core/construction.py", line 576, in sanitize_array
    subarr = _sanitize_ndim(subarr, data, dtype, index, allow_2d=allow_2d)
  File "/lib/python3.9/site-packages/pandas/core/construction.py", line 627, in _sanitize_ndim
    raise ValueError("Data must be 1-dimensional")
ValueError: Data must be 1-dimensional
Any suggestions?
Thanks!
Answer 1
Score: 0
I was able to solve it with:
def p_Val(index_s, index_t):
    features_s = X[index_s]
    features_t = X[index_t]
    p = [pearsonr(features_s[i, :], features_t[i, :])[1] for i in range(len(features_s))]
    p_val = np.asarray(p)
    return p_val
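For context: the original connector failed because scipy.stats.spearmanr, when given two 2-D arrays, computes a full correlation matrix, so its p-value output is 2-D; DeepGraph then tries to put that matrix into a single edge-table column, which triggers "ValueError: Data must be 1-dimensional". The list comprehension above returns exactly one p-value per node pair instead. Since X was rank-transformed before whitening, Pearson on these rows is effectively Spearman's rho, so the pearsonr p-values should closely match what spearmanr would report pair by pair.

For the merge mentioned in the question, a minimal sketch (not from the original post) could look like the following. It assumes the '*_corr.pickle' / '*_pval.pickle' naming used in the script above, and that DeepGraph named the edge columns after the connectors' return variables ('corr' and 'p_val'):

# merge the chunked correlation and p-value edge lists into one table
import os
import pandas as pd

corr_parts, pval_parts = [], []
for f in sorted(os.listdir('tmp/correlations/')):
    part = pd.read_pickle(os.path.join('tmp/correlations', f))
    # file naming assumed from the script above: NNN_corr.pickle / NNN_pval.pickle
    if f.endswith('_corr.pickle'):
        corr_parts.append(part)
    else:
        pval_parts.append(part)

# stack the chunks, then join on the shared (s, t) multi-index
corr_all = pd.concat(corr_parts)
pval_all = pd.concat(pval_parts)
edges = corr_all.join(pval_all)  # one edge table with both 'corr' and 'p_val' columns (assumed names)
print(edges.head())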