Using np.where() with .iloc[] gives the wrong result; perhaps .iloc[] no longer supports a tuple of indices
Question
I wrote the code below, which uses np.where() and .iloc[] to replace the NaNs in a DataFrame with "missing". However, the result is wrong, possibly because .iloc[] does not interpret the tuple of index arrays the way I expected.
Could anybody suggest a fix, or an alternative other than fillna()? My real use case is to get indices by applying conditions to df1 and then use those indices to change values in df2, so fillna() may not help.
import numpy as np
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
        'Age': [25, 31, np.nan, 28, 35],
        'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)
df
    Name   Age   Salary
0   John  25.0  50000.0
1  Alice  31.0      NaN
2    Bob   NaN  70000.0
3   Jane  28.0      NaN
4   Mark  35.0  90000.0
# Use np.where and .iloc to replace missing values with a specified value
value_to_replace = "missing"
missing_mask = np.where(df.isna())  # Tuple of (row positions, column positions) of the NaNs
df.iloc[missing_mask] = value_to_replace  # Attempted replacement through .iloc
df
    Name      Age   Salary
0   John     25.0  50000.0
1  Alice  missing  missing
2    Bob  missing  missing
3   Jane  missing  missing
4   Mark     35.0  90000.0
The expected result is:
    Name      Age   Salary
0   John     25.0  50000.0
1  Alice     31.0  missing
2    Bob  missing  70000.0
3   Jane     28.0  missing
4   Mark     35.0  90000.0
Answer 1
Score: 2
I'm not sure why you would do this, because it changes your float columns to the object dtype, but it can be done. I would do the replacement with the DataFrame.where method, which keeps values where a condition is True and replaces them where it is False. Since we want to replace the NaNs, we can use isna to find them and apply a logical NOT (~) so that the condition is False exactly where the value is NaN. Alternatively, as @Timus pointed out, notna does the same thing as the negated isna.
import pandas as pd
import numpy as np
data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
'Age': [25, 31, np.nan, 28, 35],
'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)
df = df.where(~df.isna(), "missing")
print(df)
print(df.dtypes)
Output:
    Name      Age   Salary
0   John     25.0  50000.0
1  Alice     31.0  missing
2    Bob  missing  70000.0
3   Jane     28.0  missing
4   Mark     35.0  90000.0
Name      object
Age       object
Salary    object
dtype: object
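The question mentions that the real goal is to compute a condition on df1 and use it to change values in df2. The same idea might carry over as in the sketch below, assuming both frames share the same index and column labels. The contents of df2 here are hypothetical and only serve as an illustration. DataFrame.mask replaces values where the condition is True, so it is the counterpart of where with the negated condition.

```python
import numpy as np
import pandas as pd

# df1 supplies the condition (numeric columns from the question)
df1 = pd.DataFrame({'Age': [25, 31, np.nan, 28, 35],
                    'Salary': [50000, np.nan, 70000, np.nan, 90000]})

# df2 is a hypothetical second frame with the same index and columns
df2 = pd.DataFrame({'Age': [1.0, 2.0, 3.0, 4.0, 5.0],
                    'Salary': [10.0, 20.0, 30.0, 40.0, 50.0]})

# Boolean DataFrame: True wherever df1 holds NaN
condition = df1.isna()

# mask() replaces the values of df2 where the condition is True ...
result = df2.mask(condition, "missing")

# ... which matches where() with the negated condition
same = df2.where(~condition, "missing")

print(result)
```

If the two frames do not share labels, the boolean frame would have to be aligned or rebuilt first, which this sketch does not cover.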
Answer 2
Score: -1
I tried this, and it works fine for me:
```python
import pandas as pd
import numpy as np

# Create the DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
        'Age': [25, 31, np.nan, 28, 35],
        'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)

# Specify the value to replace NAs
value_to_replace = "missing"

# Find the indices where values are missing
missing_mask = np.where(df.isna())

# Unpack the tuple of indices and replace missing values
df.iloc[*missing_mask] = value_to_replace

# Print the resulting DataFrame
print(df)
```
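A caveat on the unpacking approach: star-unpacking inside square brackets is only valid syntax from Python 3.11 on, and it hands .iloc[] the same (rows, cols) tuple as the code in the question. With two arrays, .iloc[] selects every listed row crossed with every listed column, whereas NumPy would pair the arrays element by element. That would explain the 3x2 block of "missing" in the question's output. The sketch below is one possible positional fix, not the only one; the names block and fixed are illustrative, and the cast to object is an assumption made so that a string can be stored next to the floats.

```python
import numpy as np
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
        'Age': [25, 31, np.nan, 28, 35],
        'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)

# Positions of the NaNs: rows = [1, 2, 3], cols = [2, 1, 2]
rows, cols = np.where(df.isna())

# .iloc[rows, cols] takes each listed row crossed with each listed column,
# so this is a 3x3 block rather than the three individual (row, col) cells.
block = df.iloc[rows, cols]

# Work on an object-dtype copy so strings and floats can share a column,
# then set each (row, col) pair individually with the scalar setter .iat.
fixed = df.astype(object)
for r, c in zip(rows, cols):
    fixed.iat[int(r), int(c)] = "missing"

print(fixed)
```

The loop is fine for a handful of cells. For large frames, the where or mask approach from Answer 1 avoids the per-cell assignment.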