June 29, 2023 23:05:08

Using np.where() + .iloc[] gets wrong result. Reason might be iloc[] no longer supports tuple format of index

Question

I wrote the code below, which uses np.where() and .iloc[] to replace the NaNs in a DataFrame with "missing". However, the result is wrong, and the reason might be that .iloc[] does not interpret the tuple index the way I expected.

Could anybody offer some guidance on how to fix it, or suggest alternatives other than fillna()? In my real case I want to get indexes by applying conditions to df1 and then use those indexes to change values in df2, so fillna() is unlikely to help.

import numpy as np
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
    'Age': [25, 31, np.nan, 28, 35],
    'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)
df

    Name   Age   Salary
0   John  25.0  50000.0
1  Alice  31.0      NaN
2    Bob   NaN  70000.0
3   Jane  28.0      NaN
4   Mark  35.0  90000.0

# Use np.where and .iloc to replace missing values with a specified value
value_to_replace = "missing"
missing_mask = np.where(df.isna())  # Tuple of (row positions, column positions) of the missing values
df.iloc[missing_mask] = value_to_replace  # Pass that tuple straight to .iloc
df

    Name      Age   Salary
0   John     25.0  50000.0
1  Alice  missing  missing
2    Bob  missing  missing
3   Jane  missing  missing
4   Mark     35.0  90000.0

The expected result should be:

    Name      Age   Salary  
0   John     25.0  50000.0  
1  Alice     31.0  missing  
2    Bob  missing  70000.0 
3   Jane     28.0  missing  
4   Mark     35.0  90000.0
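
A note on the likely cause, as a minimal sketch rather than a definitive diagnosis: np.where(df.isna()) returns a tuple of two position arrays, and .iloc appears to treat a tuple as (row indexer, column indexer), selecting every listed row crossed with every listed column instead of pairing them up. The names rows, cols, block, arr and fixed below are illustrative, and going through an object-dtype NumPy copy is just one assumed way to assign pair by pair.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
    'Age': [25, 31, np.nan, 28, 35],
    'Salary': [50000, np.nan, 70000, np.nan, 90000],
})

# np.where on a 2-D boolean array returns (row_positions, col_positions)
rows, cols = np.where(df.isna().to_numpy())

# .iloc[rows, cols] takes every listed row crossed with every listed column,
# so the selection is a block rather than the individual (row, col) pairs.
block = df.iloc[rows, cols]
print(block.shape)

# NumPy integer-array indexing pairs the two arrays element by element,
# so assigning on an object-dtype copy touches only the NaN cells.
arr = df.to_numpy(dtype=object)
arr[rows, cols] = "missing"
fixed = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(fixed)
```

Because the position arrays come from one frame and are applied to a plain array, the same pattern could be used to take positions from df1 and write into a copy of df2, provided both have the same shape.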

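Answer 1

Score: 2

I'm not sure why you would want to do this, since it changes your float columns to the object dtype, but it can be done. I would do the replacement with the DataFrame.where method, which replaces values where the condition evaluates to False. Since we want to replace the NaN values, we can use isna to find them and apply a logical NOT so that the condition is False wherever the value is NaN. Alternatively, as @Timus pointed out, you can use notna, which does the same thing as the negated isna.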

import pandas as pd
import numpy as np

data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
    'Age': [25, 31, np.nan, 28, 35],
    'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)
df = df.where(~df.isna(), "missing")
print(df)
print(df.dtypes)

Output:

    Name      Age   Salary
0   John     25.0  50000.0
1  Alice     31.0  missing
2    Bob  missing  70000.0
3   Jane     28.0  missing
4   Mark     35.0  90000.0
Name      object
Age       object
Salary    object
dtype: object
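
The question also mentions taking a condition from one frame and applying it to another. Here is a minimal sketch of that, assuming both frames share the same index and column labels. The frames df1 and df2 and their values are made up for illustration, and DataFrame.mask (the counterpart of where, replacing cells where the condition is True) is swapped in to avoid the negation.

```python
import numpy as np
import pandas as pd

# df1 supplies the condition; df2 is the frame to modify (both hypothetical)
df1 = pd.DataFrame({'Age': [25, 31, np.nan, 28, 35],
                    'Salary': [50000, np.nan, 70000, np.nan, 90000]})
df2 = pd.DataFrame({'Age': [1, 2, 3, 4, 5],
                    'Salary': [10, 20, 30, 40, 50]})

# Boolean frame with the same shape and labels as df2
condition = df1.isna()

# Cast to object first so strings and numbers can share a column,
# then replace the cells where the condition is True.
result = df2.astype(object).mask(condition, "missing")
print(result)
```

mask returns a new frame, so df2 itself is left unchanged.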

Answer 2

Score: -1

I tried this, and it works fine for me:
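```python
import pandas as pd
import numpy as np

# Create the DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
        'Age': [25, 31, np.nan, 28, 35],
        'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)

# Specify the value that replaces the NaNs
value_to_replace = "missing"

# Find the positions where values are missing
missing_mask = np.where(df.isna())

# Unpack the tuple of indices and replace missing values
# (star-unpacking inside [] needs Python 3.11 or later)
df.iloc[*missing_mask] = value_to_replace

# Print the resulting DataFrame
print(df)
```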

