July 7, 2023 01:06:59

Python Pandas Fisher Exact Test 2x2

Question

I have a pandas dataframe with four columns:

import pandas as pd
import numpy as np
arr = {
    "A": [75.3, 85.2, 90.1, 80.4, 70.9],
    "B": [80.6, 90.2, 85.7, 70.8, 95.1],
    "C": [85.2, 95.6, 80.8, 90.4, 75.1],
    "D": [90.3, 85.5, 80.0, 70.5, 95.2],
}
df = pd.DataFrame(arr)

I want to run the 2x2 Fisher exact test on every row and save the p-value in a new column. In other words, I want to run the test on the array below

pvalue = [[df["A"][i], df["B"][i]], [df["C"][i], df["D"][i]]]

for every row index i (0 through 4 here), and save the resulting list of p-values as another column in the dataframe.

scipy.stats has a function called fisher_exact() that returns the odds ratio and the p-value of a 2x2 array. I have the following code right now, which works:

import scipy.stats as stats
pvalue = []
for i in df.index.values:
    a = df["A"][i]
    b = df["B"][i]
    c = df["C"][i]
    d = df["D"][i]

    data = [[a, b], [c, d]]

    odd_value, p_value = stats.fisher_exact(data)
    pvalue.append(p_value)

df["p_value"] = np.array(pvalue)

However, for larger arrays, this takes a while. Is there a way to run this test using numpy vector operations? Is there a better function than scipy.stats.fisher_exact() to use here?

Answer 1

Score: 1

from scipy.stats import fisher_exact
import numpy as np
np.array([fisher_exact(i)[1] for i in df.values.reshape(-1,2,2)])
array([1.        , 0.45814062, 0.82723738, 0.6484196 , 0.82550328])

You could also run:

np.apply_along_axis(lambda x:fisher_exact(x.reshape(-1,2)), 1, df.values) 
array([[0.99264706, 1.        ],
       [0.84502924, 0.45814062],
       [1.05882353, 0.82723738],
       [0.88888889, 0.6484196 ],
       [0.93333333, 0.82550328]])

Here the second column, i.e. [:, 1], holds the p-values and the first column holds the odds ratios.
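To get the p-values back into the dataframe, as the question asks, the list comprehension above can be assigned to a new column directly. The following is a minimal sketch rather than a definitive solution. It assumes the four columns are meant as counts: Fisher's exact test is defined for integer counts, and the odds ratios printed above (for example 0.99264706 = 75·90 / (80·85)) are consistent with the float inputs having been truncated to integers. The sketch makes that truncation explicit.

```python
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact

df = pd.DataFrame({
    "A": [75.3, 85.2, 90.1, 80.4, 70.9],
    "B": [80.6, 90.2, 85.7, 70.8, 95.1],
    "C": [85.2, 95.6, 80.8, 90.4, 75.1],
    "D": [90.3, 85.5, 80.0, 70.5, 95.2],
})

# Assumption: the cells are counts, so truncate them to integers explicitly
# instead of relying on an implicit conversion inside the test function.
tables = df[["A", "B", "C", "D"]].to_numpy().astype(np.int64).reshape(-1, 2, 2)

# One fisher_exact call per 2x2 table; element [1] of the result is the p-value.
df["p_value"] = [fisher_exact(t)[1] for t in tables]
print(df)
```

This still calls fisher_exact once per row, so it mainly tidies the loop and does not by itself make it much faster.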

Answer 2

Score: 0

Another option that you could use is the fast-fisher library.

Running fast_fisher_exact_compatibility instead of fisher_exact gives the following:

>>> import pandas as pd
>>> import numpy as np
>>> from fast_fisher import fast_fisher_exact_compatibility
>>> arr = {
...     "A": [75.3, 85.2, 90.1, 80.4, 70.9],
...     "B": [80.6, 90.2, 85.7, 70.8, 95.1],
...     "C": [85.2, 95.6, 80.8, 90.4, 75.1],
...     "D": [90.3, 85.5, 80.0, 70.5, 95.2],
... }
>>> df = pd.DataFrame(arr)
>>> for i in df.index.values:
...     a = df["A"][i]
...     b = df["B"][i]
...     c = df["C"][i]
...     d = df["D"][i]
...     data = [[a, b], [c, d]]
...     # pval_f is overwritten on every pass, so after the loop
...     # it holds the p-value of the last row only
...     odds_f, pval_f = fast_fisher_exact_compatibility(data)
...
>>> pval_f
0.825503275434932

You could use this as a starting point and see whether performance improves when running on larger arrays.
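If the goal is to avoid the Python-level loop altogether, the two-sided p-value can also be computed for all rows at once from the hypergeometric distribution, since scipy.stats.hypergeom.pmf broadcasts over arrays. The sketch below is an illustration under stated assumptions, not SciPy's own implementation: fisher_exact_rows and its rel_tol parameter are hypothetical names, the inputs are assumed to be non-negative integer counts (the truncated values of the example data are used), and the intermediate array has one column per possible value of the top-left cell, so memory grows with the largest count.

```python
import numpy as np
from scipy.stats import fisher_exact, hypergeom


def fisher_exact_rows(a, b, c, d, rel_tol=1e-7):
    """Two-sided Fisher exact p-values for many 2x2 tables [[a, b], [c, d]]."""
    a, b, c, d = (np.asarray(x, dtype=np.int64) for x in (a, b, c, d))
    total = a + b + c + d   # population size
    row1 = a + b            # first-row total ("success" items)
    col1 = a + c            # first-column total (number of draws)

    # Candidate values for the top-left cell, shared by all rows.
    k = np.arange(np.minimum(row1, col1).max() + 1)

    # Shape (n_rows, len(k)); entries outside a row's support are 0.
    pmf = hypergeom.pmf(k[None, :], total[:, None], row1[:, None], col1[:, None])
    p_obs = hypergeom.pmf(a, total, row1, col1)

    # Sum the probabilities of every table no more likely than the observed one.
    as_extreme = pmf <= p_obs[:, None] * (1 + rel_tol)
    return np.minimum((pmf * as_extreme).sum(axis=1), 1.0)


# Truncated counts from the example dataframe (columns A, B, C, D).
counts = np.array([
    [75, 80, 85, 90],
    [85, 90, 95, 85],
    [90, 85, 80, 80],
    [80, 70, 90, 70],
    [70, 95, 75, 95],
])
p_values = fisher_exact_rows(*counts.T)

# Cross-check against the per-row SciPy function.
reference = np.array([fisher_exact(row.reshape(2, 2))[1] for row in counts])
print(np.allclose(p_values, reference, rtol=1e-6))
```

Whether this beats the per-row loop depends on the data: it trades the loop for a dense array of size rows × (largest margin + 1), so it is best suited to many tables with moderate counts.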
