Question

<details>
<summary>English:</summary>
I'm trying to run the `f_oneway` function from SciPy.

I have three DataFrames, one per group, and I want to perform a one-way ANOVA along `axis=1`.

    from scipy.stats import f_oneway
    import pandas as pd
    import numpy as np
    group1 = {'1': {0: 574145.477641226, 1: 1570531.589742876, 2: 787929.7027375237, 3: 2570860.248729332, 4: 161008.90274193016, 5: np.nan, 6: 1027027.5447738492, 7: 10620164.126712576, 8: 3030551.86415567, 9: 6080226.794887304}, '2': {0: 5590292.274747584, 1: 2015192.4244239724, 2: 1442638.778579319, 3: 9484756.854645137, 4: 231213.53284854395, 5: 1576095.5497571388, 6: 853517.4230997175, 7: 13701076.997994969, 8: 880909.9414626792, 9: 10973682.322579961}, '3': {0: 1786259.070812378, 1: 1188813.4685229606, 2: 280628.96027922264, 3: 2752454.6157454816, 4: 142423.39853381264, 5: 408643.1442709076, 6: 978859.742220046, 7: 8581569.49299859, 8: 2810091.19540494, 9: 3250847.2113601067}, '4': {0: 1423158.826220004, 1: np.nan, 2: 659142.6504867233, 3: 2727740.4095105752, 4: np.nan, 5: np.nan, 6: 166867.88656477776, 7: 15578367.076207979, 8: 1262229.6767083204, 9: 7537134.164088669}}
    group2 = {'1': {0: 1108031.2785915325, 1: 39475.12143335618, 2: 124744.55052420696, 3: 3415955.3418994714, 4: np.nan, 5: np.nan, 6: 1185929.1264358065, 7: 14219856.696859175, 8: 107938.85576451271, 9: 9075885.57144011}, '2': {0: 3711668.7595074927, 1: np.nan, 2: 92069.12449997541, 3: 1430920.365911842, 4: 23305.980330372353, 5: 146884.88381736717, 6: 143162.52169470832, 7: 11043912.321755221, 8: 1507299.549731886, 9: 6675740.20722453}, '3': {0: np.nan, 1: np.nan, 2: np.nan, 3: 192966.31343644214, 4: np.nan, 5: np.nan, 6: np.nan, 7: 13478434.128944362, 8: np.nan, 9: np.nan}, '4': {0: 6446934.0065947445, 1: 3195385.066201132, 2: 3332326.9653299027, 3: 7082529.01041953, 4: 139891.94206563127, 5: 208662.14176584402, 6: 2559284.7669506934, 7: 7395774.107780765, 8: 415796.834504837, 9: 9502289.070542539}}
    group3 = {'1': {0: 5832002.081448822, 1: 2607987.6485992945, 2: 2465656.4470221293, 3: 6077038.510021252, 4: 391523.2907555177, 5: np.nan, 6: 2590061.00923242, 7: 7982067.848957288, 8: 61836.18519446156, 9: 10673885.385156194}, '2': {0: 4515593.798793708, 1: 2070893.600738691, 2: 1788619.7598766778, 3: 7302148.61285157, 4: 132247.07494014164, 5: 2130531.009443398, 6: 849079.4122880008, 7: 11086507.936560597, 8: np.nan, 9: 8977041.57285477}, '3': {0: 6916739.909404968, 1: 2886026.824106484, 2: 871822.3682870092, 3: 6515743.347648245, 4: 347767.01169986156, 5: 2975827.5336636542, 6: 3270053.676901515, 7: 9230036.81889698, 8: 4753111.521553177, 9: 11835765.28309747}, '4': {0: 8918243.297089897, 1: 2631751.3775385492, 2: 2294251.0955892503, 3: 7540353.19469351, 4: 48925.64795198818, 5: 447721.0646689915, 6: 1682494.645617865, 7: 6945276.49780706, 8: 978022.2657575278, 9: 11631856.25162219}}
    groups = [group1, group2, group3]
    data = [pd.DataFrame(x) for x in groups]
    result = f_oneway(*data, axis=1)
    result
The output is:

> pvalue=array([nan, nan, nan, 0.17318404, nan, nan, nan, 0.24112312, nan, nan])

The NaN p-values are probably caused by the NaN values in my datasets, so the analysis needs to omit them. So I tried:

    groups = [group1, group2, group3]
    data = [pd.DataFrame(x) for x in groups]
    data_test = []
    for i in data:
        df = i.to_numpy()
        df = [x[~np.isnan(x)] for x in df]
        data_test.append(df)
    from scipy.stats import f_oneway
    result = f_oneway(*data_test, axis=1)
    result
And the output was:

> ValueError: setting an array element with a sequence. The requested
> array has an inhomogeneous shape after 1 dimensions. The detected
> shape was (10,) + inhomogeneous part.

Does anyone know how I can perform an ANOVA with the same performance as SciPy's `f_oneway` while omitting the NaN values from the original samples?
</details>
# Answer 1
**Score**: 1

Within each sample, you are deleting a different number of `np.nan` values from each row, so each sample ends up as a ragged (inhomogeneous) array. With missing values in table-like structures, you usually have to do one of the following:

**Drop the whole column** - this is often not helpful, but it depends on the subject studied.

**Drop the whole row** - often less drastic, but you may need to check how much data you have left.

**Impute data** - fill with a default or minimum value, a rolling average, or some other method of adding guessed values.
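
As a rough illustration of these three options, here is a minimal sketch on a small made-up DataFrame. The `df` data and its column names are hypothetical, not the data from the question. It uses `DataFrame.dropna` and `DataFrame.fillna`, with the column mean as just one possible imputation choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the data from the question.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [5.0, 6.0, 7.0, 8.0],
    "c": [np.nan, 10.0, 11.0, 12.0],
})

# Option 1: drop every column that contains at least one NaN.
dropped_columns = df.dropna(axis="columns")

# Option 2: drop every row that contains at least one NaN.
dropped_rows = df.dropna(axis="index")

# Option 3: impute, here by filling each NaN with its column mean.
imputed = df.fillna(df.mean())

print(dropped_columns.shape)  # (4, 1): only column "b" survives
print(dropped_rows.shape)     # (2, 3): only rows 1 and 3 survive
print(int(imputed.isna().sum().sum()))  # 0: no NaN left
```

Which option is appropriate depends on how much data each one discards and on whether imputed values are acceptable for the analysis.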

Pandas has many functions for handling missing values, and there are [many online guides](https://www.geeksforgeeks.org/working-with-missing-data-in-pandas/). Here we drop all rows with a missing value:

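```python
from scipy.stats import f_oneway

groups = [group1, group2, group3]
data = [pd.DataFrame(x) for x in groups]
# Keep only rows that are complete in every group, so all groups keep the
# same row count (a per-group dropna() would leave unequal shapes).
complete = pd.concat(data, axis=1).notna().all(axis=1)
data = [df[complete] for df in data]
result = f_oneway(*data, axis=1)
result
```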
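
If dropping whole rows throws away too much data, one alternative (not the approach in the answer above, only a sketch) is to run the test one row at a time and omit the NaN values within each group for that row. The per-row samples may then have different lengths, which is fine because `f_oneway` accepts 1-D samples of unequal size. The helper name `rowwise_anova_omit_nan` and the toy arrays are hypothetical, and a Python loop like this will be slower than a single vectorized call.

```python
import numpy as np
from scipy.stats import f_oneway


def rowwise_anova_omit_nan(*groups):
    """One-way ANOVA per row, ignoring NaNs within each group (sketch)."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    n_rows = arrays[0].shape[0]
    f_stats = np.full(n_rows, np.nan)
    p_values = np.full(n_rows, np.nan)
    for i in range(n_rows):
        # Remove NaNs separately in each group's row; lengths may differ.
        samples = [a[i][~np.isnan(a[i])] for a in arrays]
        sizes = [s.size for s in samples]
        # Leave NaN where the test is undefined: an empty group, or no
        # within-group degrees of freedom (total size <= number of groups).
        if min(sizes) == 0 or sum(sizes) <= len(samples):
            continue
        f_stats[i], p_values[i] = f_oneway(*samples)
    return f_stats, p_values


# Hypothetical toy data: 2 rows, 3 groups, 4 columns each.
g1 = [[1.0, 2.0, 3.0, np.nan], [2.0, np.nan, np.nan, np.nan]]
g2 = [[2.0, 3.0, 4.0, 5.0], [np.nan, np.nan, np.nan, np.nan]]
g3 = [[6.0, np.nan, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0]]

f_stats, p_values = rowwise_anova_omit_nan(g1, g2, g3)
print(p_values)  # row 0 has a p-value; row 1 stays NaN (g2 is empty there)
```

With the question's data, the same helper could be called as `rowwise_anova_omit_nan(*[pd.DataFrame(g) for g in groups])`, since each DataFrame converts to a 2-D float array.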
