Using np.where() with .iloc[] gives the wrong result; the reason might be that .iloc[] no longer supports a tuple index
Question
I wrote the code below, which uses np.where() and .iloc[] to change NaNs to "missing" in a DataFrame. However, the result is wrong, and the reason might be that .iloc[] does not handle a tuple index the way I expected.
Could anybody offer some guidance on fixing it, or suggest alternatives other than fillna()? My real use case is to get indexes by applying conditions to df1 and then use those indexes to change values in df2, so fillna() might not help.
import numpy as np
import pandas as pd
data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
'Age': [25, 31, np.nan, 28, 35],
'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)
df
Name Age Salary
0 John 25.0 50000.0
1 Alice 31.0 NaN
2 Bob NaN 70000.0
3 Jane 28.0 NaN
4 Mark 35.0 90000.0
# Use np.where and .iloc to replace missing values with a specified value
value_to_replace = "missing"
# np.where returns a tuple of (row positions, column positions), not a boolean mask
missing_mask = np.where(df.isna())
# Assign through .iloc using those positions
df.iloc[missing_mask] = value_to_replace
df
Name Age Salary
0 John 25.0 50000.0
1 Alice missing missing
2 Bob missing missing
3 Jane missing missing
4 Mark 35.0 90000.0
The expected result should be:
Name Age Salary
0 John 25.0 50000.0
1 Alice 31.0 missing
2 Bob missing 70000.0
3 Jane 28.0 missing
4 Mark 35.0 90000.0
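Answer 1
Score: 2
I'm not sure why you would do this, because it changes your float values to the object type, but it can be done. I would do the replacement with the DataFrame.where method, which replaces values where a condition evaluates to False. Since we want to replace the NaN values, we can use isna to find them and apply a logical NOT so that the condition is False wherever the value is NaN. Alternatively, as @Timus pointed out, you can use notna, which does the same thing as the negated isna.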
import pandas as pd
import numpy as np
data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
'Age': [25, 31, np.nan, 28, 35],
'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)
df = df.where(~df.isna(), "missing")
print(df)
print(df.dtypes)
Output:
Name Age Salary
0 John 25.0 50000.0
1 Alice 31.0 missing
2 Bob missing 70000.0
3 Jane 28.0 missing
4 Mark 35.0 90000.0
Name object
Age object
Salary object
dtype: object
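For the case described in the question, where the condition comes from one DataFrame and the replacement happens in another, the same idea might look like the sketch below. The frames df1 and df2 here are hypothetical and are assumed to share the same index and columns. DataFrame.mask is used as the counterpart of where: it replaces values where the condition is True.

```python
import numpy as np
import pandas as pd

# df1 supplies the condition; df2 is the frame whose values get replaced.
# Both frames are hypothetical and share the same index and columns.
df1 = pd.DataFrame({"Age": [25, 31, np.nan, 28, 35],
                    "Salary": [50000, np.nan, 70000, np.nan, 90000]})
df2 = pd.DataFrame({"Age": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "Salary": [10.0, 20.0, 30.0, 40.0, 50.0]})

# Boolean DataFrame: True wherever df1 has a missing value.
condition = df1.isna()

# mask() replaces the values of df2 where the condition is True.
masked = df2.mask(condition, "missing")

# where() replaces values where the condition is False,
# so passing notna() should give the same result.
kept = df2.where(df1.notna(), "missing")

print(masked)
```

Because the condition is a boolean DataFrame, pandas aligns it with df2 by label, so no positional indexes need to be extracted. As with the answer above, the affected columns end up with the object dtype.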
Answer 2
Score: -1
I tried this, and it works fine for me:
```python
import pandas as pd
import numpy as np
# Create the DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
'Age': [25, 31, np.nan, 28, 35],
'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)
# Specify the value to replace NAs
value_to_replace = "missing"
# Find the indices where values are missing
missing_mask = np.where(df.isna())
# Unpack the tuple of indices and replace missing values
df.iloc[*missing_mask] = value_to_replace
# Print the resulting DataFrame
print(df)
```
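A possible explanation for the result shown in the question, offered as a sketch rather than a confirmed diagnosis: when .iloc receives two integer arrays, it selects every combination of those rows and columns, whereas NumPy indexing pairs the two arrays element by element. Under that reading, unpacking the tuple with * passes the same two arrays to .iloc and would select the same block of cells. The example below compares the two behaviours and then does the pairwise replacement on an object-dtype NumPy copy; the names rows, cols, block, values, and result are illustrative only.

```python
import numpy as np
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Jane', 'Mark'],
        'Age': [25, 31, np.nan, 28, 35],
        'Salary': [50000, np.nan, 70000, np.nan, 90000]}
df = pd.DataFrame(data)

# Positions of the missing cells: rows [1, 2, 3], columns [2, 1, 2].
rows, cols = np.where(df.isna())

# .iloc with two arrays selects the cross product of rows and columns,
# so this is a 3 x 3 block rather than three individual cells.
block = df.iloc[rows, cols]
print(block.shape)

# NumPy pairs the arrays element by element, so only the three
# (row, column) pairs are touched. The object dtype lets the array
# hold both numbers and strings.
values = df.to_numpy(dtype=object)
values[rows, cols] = "missing"

# Rebuild a DataFrame with the original labels.
result = pd.DataFrame(values, index=df.index, columns=df.columns)
print(result)
```

The same rows and cols could in principle come from a condition on one frame and be applied to the values of another frame of the same shape, which is the use case the question describes.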