2023-02-18 12:01:43

pd.DataFrame: how to calculate mean() while ignoring 'NA' strings in some cells

Question

I have a DataFrame and want to calculate the mean of columns A and B. Some rows in A and B hold the string 'NA', while the others are of type numpy.float64. How can I calculate the mean of columns A and B while ignoring those 'NA' entries?

I tried setting numeric_only=True, but then mean() only returns the id and C columns. I expect the means of A and B to be 6.6 and 2.6, respectively.

fruit = pd.DataFrame({'id': (1,2,3,4,5,6), 'Name': ('apple','apple','melon','melon','orange','orange'), 'A': (1,2,'NA',20,5,5), 'B': (1,5,4,2,'NA',1), 'C': (1,5,4,2,3,1)})

id	Name	A	B	C
1	apple	1	1	1
2	apple	2	5	5
3	melon	'NA'	4	4
4	melon	20	2	2
5	orange	5	'NA'	3
6	orange	5	1	1
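
As background to the numeric_only observation, here is a minimal sketch (not part of the original post) of why A and B are skipped: a single 'NA' string gives the whole column the object dtype, so pandas does not treat it as numeric. The variable name numeric_means is only illustrative.

```python
import pandas as pd

fruit = pd.DataFrame({
    'id': (1, 2, 3, 4, 5, 6),
    'Name': ('apple', 'apple', 'melon', 'melon', 'orange', 'orange'),
    'A': (1, 2, 'NA', 20, 5, 5),
    'B': (1, 5, 4, 2, 'NA', 1),
    'C': (1, 5, 4, 2, 3, 1),
})

# A and B mix numbers with the string 'NA', so they are stored as object columns
print(fruit.dtypes)

# numeric_only=True keeps only columns with a numeric dtype, here id and C
numeric_means = fruit.mean(numeric_only=True)
print(numeric_means)
```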

Answer 1

Score: 1

Try this:

import numpy as np

# Replace the 'NA' strings with NaN, which mean() skips by default
fruit.replace('NA', np.nan, inplace=True)
fruit['A'].mean()
fruit['B'].mean()
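
A possible variation on the same idea, sketched under the assumption that you would rather not modify fruit in place: replace the marker on a copy of the two columns and cast them to float explicitly, so the result does not depend on how replace infers the dtype. The names cleaned and means are hypothetical.

```python
import numpy as np
import pandas as pd

fruit = pd.DataFrame({
    'id': (1, 2, 3, 4, 5, 6),
    'Name': ('apple', 'apple', 'melon', 'melon', 'orange', 'orange'),
    'A': (1, 2, 'NA', 20, 5, 5),
    'B': (1, 5, 4, 2, 'NA', 1),
    'C': (1, 5, 4, 2, 3, 1),
})

# replace() without inplace=True returns a new frame; astype(float) makes the dtype explicit
cleaned = fruit[['A', 'B']].replace('NA', np.nan).astype(float)

# mean() skips NaN by default (skipna=True)
means = cleaned.mean()
print(means)
```

The original fruit frame still contains the 'NA' strings afterwards, which can be useful if other code relies on them.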

Answer 2

Score: 1

Since `mean` skips NaN values, you can use `to_numeric` with `errors="coerce"`:

> errors{'ignore', 'raise', 'coerce'}, default 'raise'
>
> - If 'raise', then invalid parsing will raise an exception.
> - **If 'coerce', then invalid parsing will be set as NaN.**
> - If 'ignore', then invalid parsing will return the input.

    fruit[["A", "B"]].apply(pd.to_numeric, errors="coerce").mean()

Output:

    A    6.6
    B    2.6
    dtype: float64
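
One thing worth keeping in mind with errors="coerce" is that it turns every unparseable value into NaN, not only 'NA'. The sketch below, using a hypothetical standalone Series named raw with the same values as column A, shows one way to check which entries were actually coerced before trusting the mean.

```python
import pandas as pd

# Hypothetical standalone column with the same values as fruit['A']
raw = pd.Series([1, 2, 'NA', 20, 5, 5])

# Values that cannot be parsed as numbers become NaN
numeric = pd.to_numeric(raw, errors="coerce")

# Entries that were present in the input but turned into NaN during conversion
coerced = raw[numeric.isna() & raw.notna()]
print(coerced)

print(numeric.mean())
```

If coerced ever contains something other than the expected marker, the data probably needs cleaning rather than silent coercion.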

Answer 3

Score: 0

# Replace 'NA' with 0 so it adds nothing to the sum, then divide by the number of non-'NA' entries
fruit['A'].replace('NA', 0).sum() / sum(fruit['A'].ne('NA'))
# 6.6

fruit['B'].replace('NA', 0).sum() / sum(fruit['B'].ne('NA'))
# 2.6

So:

fruit[['A', 'B']].apply(lambda c: c.replace('NA', 0).sum() / sum(c.ne('NA')))
A    6.6
B    2.6
dtype: float64
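
The same sum-over-count idea can also be written by filtering out the marker first, which may read more directly. This is only a sketch: mean_ignoring is a hypothetical helper, not something from the answer. It also shows why the divisor matters, since dividing by the full row count would treat 'NA' as a real zero.

```python
import pandas as pd

fruit = pd.DataFrame({
    'id': (1, 2, 3, 4, 5, 6),
    'Name': ('apple', 'apple', 'melon', 'melon', 'orange', 'orange'),
    'A': (1, 2, 'NA', 20, 5, 5),
    'B': (1, 5, 4, 2, 'NA', 1),
    'C': (1, 5, 4, 2, 3, 1),
})

def mean_ignoring(col, marker='NA'):
    """Hypothetical helper: drop rows equal to marker, then average the rest."""
    valid = col[col.ne(marker)]
    return valid.astype(float).mean()

means = fruit[['A', 'B']].apply(mean_ignoring)
print(means)

# Pitfall: dividing by all rows counts 'NA' as a zero and lowers the result
naive_a = fruit['A'].replace('NA', 0).sum() / len(fruit)
print(naive_a)
```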
