Correlation between zeros in a zero-one sequence
Question
I have a sequence of zeros and ones, for example, [0,1,0,0,0,1,0]. I want to measure the correlation between zeros in the sequence, i.e., given one zero, how likely is it that another zero follows it. I wanted to do this using the correlation coefficient. However, if I call numpy.corrcoef() on the sequence, it returns 1.0, which cannot be right. Any suggestions are appreciated.
Here is code that reproduces the problem:
import numpy as np
x = np.random.randint(0, 2, 1000)
x = x[..., np.newaxis]
rho = np.corrcoef(x.T)
print(rho)
Answer 1
Score: 1
You need to compare x to its shifted self:
import numpy as np
np.random.seed(0)
x = np.random.randint(0, 2, 1000)
rho = np.corrcoef(x[:-1], x[1:])
Output:
array([[1. , 0.00292208], # <- this is the value you want
[0.00292208, 1. ]])
How it works
We compare each value to the next one:
x
# array([0, 1, 1, 0, 1, ..., 0, 0, 1, 0])
# first to second-to-last value
x[:-1]
# array([0, 1, 1, 0, 1, ..., 0, 0, 1])
# second value through the last value
x[1:]
# array([1, 1, 0, 1, ..., 0, 0, 1, 0])
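To make the shift concrete, here is a minimal sketch that applies the same idea to the short sequence from the question. The variable names (current, following, pairs, lag1) are illustrative and not part of the answer above.

```python
import numpy as np

# The example sequence from the question.
x = np.array([0, 1, 0, 0, 0, 1, 0])

current = x[:-1]    # every element except the last:  [0 1 0 0 0 1]
following = x[1:]   # every element except the first: [1 0 0 0 1 0]

# Each pair is (value, next value).
pairs = list(zip(current.tolist(), following.tolist()))
print(pairs)

# Off-diagonal entry of the 2 x 2 matrix: correlation of each value with its successor.
lag1 = np.corrcoef(current, following)[0, 1]
print(lag1)
```

For a sequence this short the coefficient is not very informative; the point is only to show which elements get paired.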
Answer 2
Score: 1
Your question is somewhat inconsistent: on the one hand you say you want to compute a correlation coefficient, and on the other that you want to compute a probability. The general idea of "correlation" is related to probability, but Pearson correlation coefficients, such as those computed by numpy.corrcoef(), are not probabilities at all. You clarified in the comments that you want to compute the correlation coefficient of x[i+1] with respect to x[i].
That might look like this:
import numpy as np
# Generate random data
x = np.random.randint(0, 2, 1000)
corr = np.corrcoef(x[:-1], np.roll(x, -1)[:-1])[0][1]
print(corr)
Explanation:
- np.roll(x, -1) generates a copy of x in which the elements have been rotated one position to the left. Thus each element is the successor (in x) of the corresponding element of x.
- The last element of x has no successor, so we need to ignore it, along with the corresponding element of the rotated version of x. The slicing with [:-1] accomplishes this.
- The resulting numpy arrays are suitable for a correlation computation via np.corrcoef(). The result is a symmetric matrix providing the correlation coefficient of each variable with every other. In our case there are only two variables, so this matrix will be 2 x 2. The elements on the main diagonal are correlation coefficients of one of the variables with itself; these should be identically 1. The off-diagonal elements are correlation coefficients for one variable against a different one, and these should be symmetric. We can choose either one, and the above chooses [0][1].
Note that for random data, such as in this example, the correlation between successive elements should be approximately zero. (This is a measure of the quality of our (pseudo-)random number generator.)
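The properties listed in the explanation can be checked directly. The sketch below is only an illustration: it uses np.random.default_rng with an arbitrary seed (my choice, not the answer's), and the names rolled, same and m are made up for the example. It also shows that the rolled-and-sliced array is the same as the plain slice x[1:].

```python
import numpy as np

rng = np.random.default_rng(0)      # the seed is an arbitrary choice for this sketch
x = rng.integers(0, 2, 1000)        # 1000 random zeros and ones

rolled = np.roll(x, -1)[:-1]        # successors of x[:-1], wrap-around element dropped
same = np.array_equal(rolled, x[1:])
print(same)                         # the roll-and-slice matches the plain slice

m = np.corrcoef(x[:-1], rolled)
print(m.shape)                      # two variables give a 2 x 2 matrix
print(np.allclose(np.diag(m), 1.0)) # each variable correlates perfectly with itself
print(np.isclose(m[0, 1], m[1, 0])) # the off-diagonal entries are equal
```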
Answer 3
Score: 0
What you're asking for is not hard to compute. I think you want to know the chances that a 0 is followed by another zero. This code does that:
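import numpy as np
x = np.array([ 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1 ])
x0 = x[:-1]  # every element except the last
x1 = x[1:]   # the successor of each element of x0
# 1 where the two elements are equal and the first one is 0, i.e. a 0 followed by a 0
equal = (x0 == x1) & ~x0
zeros = len(x) - sum(x)  # total number of zeros in x
coef = sum(equal) / zeros
print(x0)
print(x1)
print(equal)
print(coef)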
Output:
[0 0 0 1 0 0 0 1 0 0 0 1 0 0 0]
[0 0 1 0 0 0 1 0 0 0 1 0 0 0 1]
[1 1 0 0 1 1 0 0 1 1 0 0 1 1 0]
0.6666666666666666
So, 2/3 of the zeros are followed by another zero, which is correct.
What I do there is compute the element-wise equality (XNOR) of the two sequences, which is 1 where the values are identical. I then AND that with the bitwise complement of the original sequence, which leaves 1s only where there were two consecutive zeros. By dividing the sum of that array by the number of 0s in the original, we get the answer.
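A possible variation on the code above, sketched under one assumption: only zeros that actually have a successor are counted in the denominator. The two versions agree when the sequence ends in 1 (as in the example above) and differ slightly when it ends in 0. The function name prob_zero_follows_zero is hypothetical.

```python
import numpy as np

def prob_zero_follows_zero(seq):
    """Fraction of zeros (excluding a final zero) that are followed by another zero."""
    x = np.asarray(seq)
    current, following = x[:-1], x[1:]
    zero_now = current == 0                 # zeros that have a successor
    n_zeros = np.count_nonzero(zero_now)
    if n_zeros == 0:
        return float("nan")                 # no zero with a successor: undefined
    zero_pairs = np.count_nonzero(zero_now & (following == 0))
    return zero_pairs / n_zeros

p_answer = prob_zero_follows_zero([0, 0, 0, 1] * 4)          # the sequence used above
p_question = prob_zero_follows_zero([0, 1, 0, 0, 0, 1, 0])   # the question's example
print(p_answer, p_question)
```

For the question's example, 4 zeros have a successor and 2 of them are followed by a zero, so this returns 0.5, whereas dividing by all 5 zeros would give 0.4.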