Python Pandas Fisher Exact Test 2x2

Question

I have a pandas dataframe with four columns:

    import pandas as pd
    import numpy as np

    arr = {
        "A": [75.3, 85.2, 90.1, 80.4, 70.9],
        "B": [80.6, 90.2, 85.7, 70.8, 95.1],
        "C": [85.2, 95.6, 80.8, 90.4, 75.1],
        "D": [90.3, 85.5, 80.0, 70.5, 95.2],
    }
    df = pd.DataFrame(arr)

I want to run a 2x2 Fisher exact test on every row and save the p-value in a new column. In other words, I want to run the test on the table below

    table = [[df["A"][i], df["B"][i]], [df["C"][i], df["D"][i]]]

for every row index i (0 through 4 here), and save the resulting p-values as another column in the dataframe.

scipy.stats has a function, fisher_exact(), that returns the odds ratio and the p-value for a 2x2 table. I have the following code right now, which works:

    import scipy.stats as stats

    pvalue = []
    for i in df.index.values:
        a = df["A"][i]
        b = df["B"][i]
        c = df["C"][i]
        d = df["D"][i]
        data = [[a, b], [c, d]]
        odds_ratio, p_value = stats.fisher_exact(data)
        pvalue.append(p_value)
    df["p_value"] = np.array(pvalue)

However, for larger arrays, this takes a while. Is there a way to run this test using numpy vector operations? Is there a better function than scipy.stats.fisher_exact() to use here?

Answer 1

Score: 1

    from scipy.stats import fisher_exact
    import numpy as np

    # Reshape each row [A, B, C, D] into the 2x2 table [[A, B], [C, D]] and keep only the p-value
    np.array([fisher_exact(i)[1] for i in df.values.reshape(-1, 2, 2)])
    array([1.        , 0.45814062, 0.82723738, 0.6484196 , 0.82550328])

You could also run:

    np.apply_along_axis(lambda x: fisher_exact(x.reshape(-1, 2)), 1, df.values)
    array([[0.99264706, 1.        ],
           [0.84502924, 0.45814062],
           [1.05882353, 0.82723738],
           [0.88888889, 0.6484196 ],
           [0.93333333, 0.82550328]])

The second column, i.e. [:, 1], holds the p-values, and the first column holds the odds ratios.
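
Both snippets above still call fisher_exact once per row from Python. As a rough sketch of what a fully vectorized version might look like, the two-sided p-value can be computed for all rows at once with scipy.stats.hypergeom, by summing the probabilities of every table that is no more likely than the observed one. The function name fisher_exact_rows and the rel_tol tie tolerance are made up for this illustration, not part of scipy. The odds ratios shown above match what integer-truncated counts would give (for the first row, 75*90 / (80*85) ≈ 0.9926), so the sketch truncates the float inputs the same way; that is an inference from the output, not something the answer states.

```python
import numpy as np
import pandas as pd
from scipy.stats import hypergeom


def fisher_exact_rows(tables, rel_tol=1e-7):
    """Two-sided Fisher exact p-values for an (n_rows, 4) array of [a, b, c, d].

    Hypothetical helper for illustration; rel_tol is an arbitrary tie tolerance.
    """
    # Assumption: truncate floats to integer counts, as the outputs above suggest.
    t = np.asarray(tables).astype(np.int64)
    a, b, c, d = t.T
    total = a + b + c + d   # population size
    row1 = a + b            # first-row margin (number of "success" states)
    col1 = a + c            # first-column margin (number of draws)

    # Candidate values for the top-left cell, shared by all rows. Values
    # outside a row's support get probability 0 from hypergeom.pmf.
    k = np.arange(row1.max() + 1)[None, :]
    pmf = hypergeom.pmf(k, total[:, None], row1[:, None], col1[:, None])
    p_obs = hypergeom.pmf(a, total, row1, col1)

    # Sum the probabilities of all tables at most as likely as the observed one.
    as_extreme = pmf <= p_obs[:, None] * (1 + rel_tol)
    return np.minimum((pmf * as_extreme).sum(axis=1), 1.0)


df = pd.DataFrame({
    "A": [75.3, 85.2, 90.1, 80.4, 70.9],
    "B": [80.6, 90.2, 85.7, 70.8, 95.1],
    "C": [85.2, 95.6, 80.8, 90.4, 75.1],
    "D": [90.3, 85.5, 80.0, 70.5, 95.2],
})
p_values = fisher_exact_rows(df[["A", "B", "C", "D"]].to_numpy())
df["p_value"] = p_values
```

The intermediate array has one row per table and one column per candidate cell value, so memory grows with the largest first-row total. For very large counts, processing the rows in chunks would likely be needed.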

Answer 2

Score: 0

Another option is the fast-fisher library.

Running fast_fisher_exact_compatibility instead of fisher_exact gives the following:

    >>> import pandas as pd
    >>> import numpy as np
    >>> from fast_fisher import fast_fisher_exact_compatibility
    >>> arr = {
    ...     "A": [75.3, 85.2, 90.1, 80.4, 70.9],
    ...     "B": [80.6, 90.2, 85.7, 70.8, 95.1],
    ...     "C": [85.2, 95.6, 80.8, 90.4, 75.1],
    ...     "D": [90.3, 85.5, 80.0, 70.5, 95.2],
    ... }
    >>> df = pd.DataFrame(arr)
    >>> for i in df.index.values:
    ...     a = df["A"][i]
    ...     b = df["B"][i]
    ...     c = df["C"][i]
    ...     d = df["D"][i]
    ...     data = [[a, b], [c, d]]
    ...     odds_f, pval_f = fast_fisher_exact_compatibility(data)
    ...
    >>> pval_f  # p-value for the last row only
    0.825503275434932

You could use this as a starting point and see whether performance improves when running on larger arrays.
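
One way to check this is a small timing harness along the following lines. It is only a sketch: the helper name time_per_row and the random test tables are invented for illustration, and the fast_fisher comparison assumes the library is installed and that fast_fisher_exact_compatibility accepts a list of lists and returns an (odds ratio, p-value) pair, as in the snippet above.

```python
import time

import numpy as np
from scipy.stats import fisher_exact


def time_per_row(test_func, tables):
    """Return (elapsed seconds, p-values) for calling test_func on each 2x2 table."""
    start = time.perf_counter()
    p_values = np.array([test_func(t)[1] for t in tables])
    return time.perf_counter() - start, p_values


# Made-up integer counts, used only to have something larger to time.
rng = np.random.default_rng(0)
tables = rng.integers(1, 100, size=(2_000, 2, 2))

elapsed_scipy, p_scipy = time_per_row(fisher_exact, tables)
print(f"scipy fisher_exact: {elapsed_scipy:.3f} s for {len(tables)} tables")

try:
    from fast_fisher import fast_fisher_exact_compatibility
except ImportError:
    print("fast_fisher is not installed; skipping that comparison")
else:
    elapsed_fast, p_fast = time_per_row(
        lambda t: fast_fisher_exact_compatibility(t.tolist()), tables
    )
    print(f"fast_fisher:        {elapsed_fast:.3f} s for {len(tables)} tables")
    print("largest p-value difference:", np.abs(p_scipy - p_fast).max())
```

Timings depend on the machine and on how large the counts are, so it is worth running this on tables that resemble the real data.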

huangapple
  • Published on July 7, 2023 at 01:06:59
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76631106.html