How to replace NaN values with the corresponding month and hour mean value
Question
I'm trying to replace the NaN values in a DataFrame with the mean value for the corresponding month and hour of that same DataFrame.
Say I have a DataFrame of generic measurements in which some rows and columns are randomly missing values. Its first column is a datetime column with hourly frequency.
I've created another DataFrame that holds the mean value for every hour of each month, but I can't work out how to replace the NaN values in the first DataFrame with the corresponding means.
First, let's create a generic DataFrame like the one described:
import pandas as pd
import numpy as np
p = 0.1
columns = ['A','B','C','D','E','F','G','H','I','J']
size = 1000
df = pd.DataFrame(np.random.randint(0,100,size=(size,len(columns))), columns= columns)
mask = np.random.choice([True,False] , size= df.shape, p=[p,1-p])
df = df.mask(mask)
df.insert(0, 'date' ,pd.date_range('2000-01-01 00:00' , periods= size, freq = 'H'))
Then let's create the DataFrame with the mean values:
mean_df = df.groupby([df.date.dt.month , df.date.dt.hour]).mean()
mean_df.index.set_names(['month' , 'hour'],inplace=True)
mean_df.reset_index(inplace=True)
I can do it for one column, but I couldn't do it for all the columns:
empty = np.where(df['A'].isna())[0].tolist()
r = df.columns.get_loc('A')        # position of 'A' in df
m = mean_df.columns.get_loc('A')   # position of 'A' in mean_df (not the same as r)
for a in empty:
    # row of mean_df that holds the mean for this timestamp's month and hour
    k = np.where((mean_df.month == df.iat[a, 0].month) & (mean_df.hour == df.iat[a, 0].hour))[0][0]
    df.iat[a, r] = mean_df.iat[k, m]
Answer 1
Score: 0
I guess the easiest approach is iterating over every column:
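for c in columns:
    empty = np.where(df[c].isna())[0].tolist()
    r = df.columns.get_loc(c)        # position of column c in df
    m = mean_df.columns.get_loc(c)   # position of column c in mean_df (not the same as r)
    for a in empty:
        # row of mean_df that holds the mean for this timestamp's month and hour
        k = np.where((mean_df.month == df.iat[a, 0].month) & (mean_df.hour == df.iat[a, 0].hour))[0][0]
        df.iat[a, r] = mean_df.iat[k, m]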
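If the per-cell loop gets slow, one possible alternative is to keep the table of means but index it by (month, hour) and align it to the rows of df in a single step. The following is only a sketch under a few assumptions: it rebuilds the sample data with a fixed seed, and the names mean_table, row_keys, row_means and filled_by_lookup are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Sample data similar to the question's (the fixed seed is an assumption for repeatability)
rng = np.random.default_rng(0)
columns = list("ABCDEFGHIJ")
size = 1000
df = pd.DataFrame(rng.integers(0, 100, size=(size, len(columns))).astype(float), columns=columns)
df = df.mask(rng.random(df.shape) < 0.1)  # blank out roughly 10% of the cells
df.insert(0, "date", pd.date_range("2000-01-01", periods=size, freq=pd.Timedelta(hours=1)))

month = df["date"].dt.month.rename("month")
hour = df["date"].dt.hour.rename("hour")

# Table of means with one row per (month, hour) pair
mean_table = df.groupby([month, hour])[columns].mean()

# Look up the matching row of means for every row of df, then relabel it with df's index
row_keys = pd.MultiIndex.from_arrays([month, hour])
row_means = mean_table.reindex(row_keys)
row_means.index = df.index

# fillna aligns on index and columns, so only the NaN cells are replaced
filled_by_lookup = df.copy()
filled_by_lookup[columns] = df[columns].fillna(row_means)
```

Keeping mean_table as a separate object may be handy when the means should come from a different period than the data being filled. A cell stays NaN if every value in its (month, hour) group is missing.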
Answer 2
Score: 0
Neither pretty nor fast, but here you go:
df['month'] = df['date'].dt.month
df['hour'] = df['date'].dt.hour
def func(x, df):
    return pd.Series([df.loc[int(x['month']), int(x['hour'])][c] if np.isnan(x[c]) else x[c] for c in x.index], index=x.index)
df = df.set_index('date').apply(lambda x: func(x, mean_df.set_index(['month', 'hour'])), axis=1).drop(columns=['month', 'hour']).reset_index()
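For comparison, a vectorized route that skips the separate table of means might look like the sketch below. It uses groupby(...).transform('mean') to produce a frame of group means with the same shape as the data, then passes that frame to fillna. The sample data, the fixed seed and the names keys, group_means and filled are assumptions made for illustration, not part of either answer.

```python
import numpy as np
import pandas as pd

# Sample data similar to the question's (the fixed seed is an assumption for repeatability)
rng = np.random.default_rng(0)
columns = list("ABCDEFGHIJ")
size = 1000
df = pd.DataFrame(rng.integers(0, 100, size=(size, len(columns))).astype(float), columns=columns)
df = df.mask(rng.random(df.shape) < 0.1)  # blank out roughly 10% of the cells
df.insert(0, "date", pd.date_range("2000-01-01", periods=size, freq=pd.Timedelta(hours=1)))

# Group keys: calendar month and hour of day of each timestamp
keys = [df["date"].dt.month.rename("month"), df["date"].dt.hour.rename("hour")]

# transform('mean') returns one row per original row, holding that row's group means
group_means = df.groupby(keys)[columns].transform("mean")

# Replace only the NaN cells; existing measurements are left untouched
filled = df.copy()
filled[columns] = df[columns].fillna(group_means)
```

This avoids both the Python-level loop and the row-wise apply, so it should scale better on larger frames, although that has not been measured here.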