Replace high values with forward filling
Question
I have a dataframe in which some features contain very high outliers. I would like to get rid of those sudden, very high values.
ax.plot(df['Temperature'])
To lessen this effect I used clip with quantile-based bounds, but it does not work as well as I would like.
ax.plot(df['Temperature'].clip(lower=df['Temperature'].quantile(0.05), upper=df['Temperature'].quantile(0.95)))
How can I replace these (very high) values with the last value before them, using forward filling? For example, if the Temperature jumps at df['Temperature'][100] and stays high until df['Temperature'][120], then those values should be replaced with df['Temperature'][99].
Answer 1
Score: 1
Maybe set the invalid entries to NaN, then use fillna to backfill them?
>>> import numpy as np
>>> import pandas as pd
>>> seq = np.arange(0, 10)
>>> seq[4:7] *= 100  # turn positions 4..6 into artificial spikes
>>> df = pd.DataFrame(seq, columns=['temp'])
>>> df
   temp
0     0
1     1
2     2
3     3
4   400
5   500
6   600
7     7
8     8
9     9
>>> df[df.temp >= 300] = np.nan  # adjust the condition accordingly
>>> df
   temp
0   0.0
1   1.0
2   2.0
3   3.0
4   NaN
5   NaN
6   NaN
7   7.0
8   8.0
9   9.0
>>> df.bfill()  # same result as fillna(method='backfill'), which newer pandas deprecates
   temp
0   0.0
1   1.0
2   2.0
3   3.0
4   7.0
5   7.0
6   7.0
7   7.0
8   8.0
9   9.0
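Note that backfilling copies the next valid value (7.0 here) into the gap, while the question asks for the previous one. A forward fill may be closer to what is wanted. Here is a minimal sketch, assuming a single 'Temperature' column and a fixed cutoff; the sample data and the threshold of 300 are made up for illustration, and a quantile-based cutoff could be substituted.

```python
import pandas as pd

# Hypothetical data: a slowly rising series with a spike at positions 4..6.
df = pd.DataFrame(
    {"Temperature": [20.0, 21.0, 22.0, 23.0, 400.0, 500.0, 600.0, 24.0, 25.0, 26.0]}
)

# Assumed cutoff for "very high"; adjust it to the data at hand.
threshold = 300.0

# mask() replaces values above the threshold with NaN, and ffill() then
# carries the last valid value forward across each run of NaN.
cleaned = df["Temperature"].mask(df["Temperature"] > threshold).ffill()

print(cleaned.tolist())
# [20.0, 21.0, 22.0, 23.0, 23.0, 23.0, 23.0, 24.0, 25.0, 26.0]
```

If the series starts with a spike, there is no earlier value to carry forward, so those leading entries stay NaN and need separate handling.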