Using np.where() + .iloc[] gives the wrong result. The reason might be that .iloc[] no longer supports a tuple index

Question

I wrote this code, which uses np.where() and .iloc[] to replace the NaNs in the DataFrame with "missing". However, the result is wrong, and the reason might be that .iloc[] interprets the tuple index incorrectly.

Could anybody offer some guidance on fixing it, or suggest alternatives to fillna()? In my real case I need to get indexes by applying conditions to df1 and then use those indexes to change values in df2, so fillna() might not help.

import numpy as np
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
        'Age': [25, 31, np.nan, 28, 35],
        'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)
df
    Name   Age   Salary
0   John  25.0  50000.0
1  Alice  31.0      NaN
2    Bob   NaN  70000.0
3   Jane  28.0      NaN
4   Mark  35.0  90000.0
# Use np.where and .iloc to replace missing values with a specified value
value_to_replace = "missing"
missing_mask = np.where(df.isna())  # tuple of (row positions, column positions) of the NaNs
df.iloc[missing_mask] = value_to_replace  # assign via .iloc at those positions
df
    Name      Age   Salary
0   John     25.0  50000.0
1  Alice  missing  missing
2    Bob  missing  missing
3   Jane  missing  missing
4   Mark     35.0  90000.0

The expected result should be:

    Name      Age   Salary  
0   John     25.0  50000.0  
1  Alice     31.0  missing  
2    Bob  missing  70000.0 
3   Jane     28.0  missing  
4   Mark     35.0  90000.0
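
A short illustration of where the extra cells come from (a sketch added for clarity, not code from the original post): `np.where` on a boolean DataFrame returns a tuple of integer arrays, and `.iloc[rows, cols]` treats the two arrays as independent row and column selections, so it addresses every row/column combination rather than the individual pairs. A plain NumPy array indexed with `arr[rows, cols]` pairs the arrays element by element instead. Round-tripping through `to_numpy(dtype=object)` is one assumed workaround among several.

```python
import numpy as np
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
        'Age': [25, 31, np.nan, 28, 35],
        'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)

# np.where on a boolean frame returns integer positions, not a boolean mask
rows, cols = np.where(df.isna())
print(rows, cols)  # [1 2 3] [2 1 2]

# .iloc[rows, cols] is an outer selection: rows {1, 2, 3} x columns {1, 2}
print(df.iloc[rows, cols].shape)  # (3, 3)

# NumPy fancy indexing pairs the arrays: (1, 2), (2, 1), (3, 2)
arr = df.to_numpy(dtype=object)  # object dtype so the array can hold strings
arr[rows, cols] = "missing"
fixed = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(fixed)
```

The same `rows, cols` pair could come from a condition on one frame and be applied to another frame's array, provided both frames have the same shape and column order.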

Answer 1

Score: 2

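I'm not sure why you would want to do this, since it changes your float columns to the object dtype, but it can be done. I would do the replacement with the DataFrame.where method, which replaces values wherever a condition evaluates to False. Since we want to replace the NaN values, we can use isna to find them and apply a logical NOT (~) so that the condition is False exactly where a value is NaN. Alternatively, as @Timus pointed out, you can use notna, which is equivalent to the negated isna.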

import pandas as pd
import numpy as np

data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
    'Age': [25, 31, np.nan, 28, 35],
    'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)
df = df.where(~df.isna(), "missing")
print(df)
print(df.dtypes)

Output:

    Name      Age   Salary
0   John     25.0  50000.0
1  Alice     31.0  missing
2    Bob  missing  70000.0
3   Jane     28.0  missing
4   Mark     35.0  90000.0
Name      object
Age       object
Salary    object
dtype: object
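
For the case described in the question (a condition evaluated on one frame driving replacements in another), the same `where` idea can be pointed at a second frame. A minimal sketch, assuming `df1` and `df2` share the same index and columns (`where` and `mask` align the condition on labels); the `df2` values are hypothetical. It uses `notna()` in place of `~isna()`, and `mask(cond, other)`, which replaces where the condition is True, as the inverse formulation.

```python
import numpy as np
import pandas as pd

# df1 holds the values the condition is evaluated on
df1 = pd.DataFrame({'Age': [25, 31, np.nan, 28, 35],
                    'Salary': [50000, np.nan, 70000, np.nan, 90000]})
# df2 is a hypothetical second frame with the same index and columns
df2 = pd.DataFrame({'Age': [1, 2, 3, 4, 5],
                    'Salary': [10, 20, 30, 40, 50]})

# keep df2's value where df1 is not NaN, otherwise use "missing"
out_where = df2.where(df1.notna(), "missing")

# mask() replaces where the condition is True, so it takes isna() directly
out_mask = df2.mask(df1.isna(), "missing")

print(out_where)
```

Any boolean DataFrame built from `df1` (for example `df1 > 30`) can stand in for `df1.isna()` here.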

Answer 2

Score: -1

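I tried this, and it works fine for me:

```python
import pandas as pd
import numpy as np

# Create the DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
        'Age': [25, 31, np.nan, 28, 35],
        'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)

# Specify the value to replace NAs with
value_to_replace = "missing"

# Find the indices where values are missing
missing_mask = np.where(df.isna())

# Unpack the tuple of indices and replace missing values
# (star-unpacking inside [] requires Python 3.11+)
df.iloc[*missing_mask] = value_to_replace

# Print the resulting DataFrame
print(df)
```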
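
A note on the unpacking above: `df.iloc[*missing_mask]` builds the same `(rows, cols)` tuple that `df.iloc[missing_mask]` receives, so `.iloc` still performs the row-by-column outer selection. One way to apply the positions pairwise is to zip them and assign cell by cell with `.iat`, sketched below. Casting to `object` first is an assumption made so that the float columns can hold a string without dtype warnings in recent pandas versions.

```python
import numpy as np
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
        'Age': [25, 31, np.nan, 28, 35],
        'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)
value_to_replace = "missing"

# integer positions of the NaN cells, as (row_positions, col_positions)
rows, cols = np.where(df.isna())

# cast to object so the float columns can store a string
df = df.astype(object)

# assign one (row, col) pair at a time instead of the full row x column grid
for r, c in zip(rows, cols):
    df.iat[r, c] = value_to_replace

print(df)
```

The loop is fine for a handful of cells; for large frames a vectorized approach such as `where`/`mask` is usually preferable.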


huangapple
  • Posted on June 29, 2023 at 23:05:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76582311.html