Resampling a DataFrame by 1 hour in pandas gives unexpected NaN values

Question

I have a DataFrame with 3 columns. The 1st column contains the date (e.g. 2020-07-01, 2020-07-01, ...), the 2nd column contains the time (e.g. 00:00:00, 01:00:00, ...) at hourly intervals over one month, and the 3rd column contains the corresponding values of a variable. Some rows are missing (i.e., missing data), and some values in the 2nd column (time) are off the hour, such as 15:06:55 or 16:00:01.

I want to resample the DataFrame to 1 hour and have NaN only where data is missing. In my case, resampling gives NaN both where data is missing and where the time is like 15:06:55 or 16:00:01. Please help me solve this.
Thanks in advance.

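import pandas as pd

# Combine the date (column 0) and time (column 1) strings into one timestamp index
df['Date-Time'] = pd.to_datetime(df[0] + df[1], format='%Y-%m-%d%H:%M:%S')
df = df.set_index('Date-Time')
df = df.resample('1H').fillna(method=None)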

This code gives NaN values where data is missing as well as where the time is like 15:06:55, 16:00:01, 17:00:01, etc. I want to resample the DataFrame to 1 hour and have NaN only where data is missing. I have uploaded an image of the DataFrame before resampling.
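
To make the symptom concrete, here is a minimal sketch on made-up data (the timestamps and values are assumptions, not the asker's actual DataFrame). It uses resample(...).asfreq(), which, as far as I can tell, behaves like the no-fill call above: only rows whose timestamp lands exactly on the hourly grid are kept. The frequency is written '1h' because recent pandas versions prefer the lowercase alias.

```python
import pandas as pd

# Hypothetical sample: hourly readings with two off-grid timestamps
# (15:06:55 and 16:00:01) and one hour (17:00) missing entirely.
raw = pd.DataFrame({
    0: ["2020-07-01"] * 5,
    1: ["13:00:00", "14:00:00", "15:06:55", "16:00:01", "18:00:00"],
    2: [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Same date + time concatenation as in the question.
raw["Date-Time"] = pd.to_datetime(raw[0] + raw[1], format="%Y-%m-%d%H:%M:%S")
indexed = raw.set_index("Date-Time")[[2]]

# asfreq() reindexes onto the hourly grid without filling anything,
# so readings that are not exactly on the hour are dropped.
on_grid = indexed.resample("1h").asfreq()
print(on_grid)
```

In this sample, 15:00 and 16:00 come out as NaN even though readings exist inside those hours, alongside the truly missing 17:00.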

Answer 1

Score: 1

You call fillna(method=None) to fill the missing data, so no fill is applied and the gaps are explicitly left as NaN. See the pandas documentation.

You can use an interpolation or fill method to fill the missing data.
e.g.:

df = df.resample('1H').ffill()

or

df = df.resample('1H').interpolate(method='linear')

or you can fill it with the fillna() method, if you provide backfill, bfill, or ffill in the method argument.
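
As a rough illustration of what a forward fill does in this situation (the sample data and the column name value are made up), ffill() fills every slot on the hourly grid with the most recent earlier reading. That means truly missing hours get filled too, and an off-the-hour reading shows up in the following hour's slot.

```python
import pandas as pd

# Hypothetical readings: two off-grid timestamps and no row for 17:00.
stamps = pd.to_datetime([
    "2020-07-01 13:00:00", "2020-07-01 14:00:00", "2020-07-01 15:06:55",
    "2020-07-01 16:00:01", "2020-07-01 18:00:00",
])
readings = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=stamps)

# Forward fill onto the hourly grid: each slot takes the last reading
# at or before that exact hour.
filled = readings.resample("1h").ffill()
print(filled)
```

So a fill method removes all NaN values, which may not be what is wanted if missing hours should stay visible.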

-> See the interpolate() documentation.
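
If the goal is NaN only for hours that have no row at all, one possible alternative (a sketch, not something stated in the answer) is to aggregate each hourly bin with first() instead of reindexing onto the grid. This assumes at most one reading per hour; the sample data and the column name value are hypothetical.

```python
import pandas as pd

# Hypothetical readings: 15:06:55 and 16:00:01 are off the hour,
# and there is no row for the 17:00 hour.
stamps = pd.to_datetime([
    "2020-07-01 13:00:00", "2020-07-01 14:00:00", "2020-07-01 15:06:55",
    "2020-07-01 16:00:01", "2020-07-01 18:00:00",
])
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=stamps)

# first() takes the first reading that falls inside each hourly bin,
# so off-grid readings are kept and only empty bins become NaN.
hourly = df.resample("1h").first()
print(hourly)
```

With several readings per hour, mean() or last() could be swapped in for first(), depending on which value should represent the hour.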

huangapple
  • Published on May 20, 2023, 20:38:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76295272.html