2023-02-09 00:57:09

Changing weather data frequency from 3 hours to 1 hour

Question

I have weather data with the following columns; the first 3 rows look like this:

date	hour	city	condition	snow	rain
2023-01-30	3	berlin	snow	1	0
2023-01-30	6	berlin	rain	0	1
2023-01-30	9	berlin	clear	0	0

I want to write code that creates rows for the missing hours and fills them with the values from the row for the same city and date whose hour is closest. The resulting dataframe should look like this:

date	hour	city	condition	snow	rain
2023-01-30	3	berlin	snow	1	0
2023-01-30	4	berlin	snow	1	0
2023-01-30	5	berlin	snow	1	0
2023-01-30	6	berlin	rain	0	1
2023-01-30	7	berlin	rain	0	1
2023-01-30	8	berlin	rain	0	1
2023-01-30	9	berlin	clear	0	0
2023-01-30	10	berlin	clear	0	0
2023-01-30	11	berlin	clear	0	0

Note: I have many cities and many rows.

I tried this, but it didn't give the right result, and it isn't efficient for a large number of rows (cities and hours):

df_expanded = df.set_index(['date', 'city', 'condition'])\
                .hour.unstack().reset_index().melt(id_vars=['date', 'city', 'condition'], value_name='hour')\
                .dropna()\
                .drop(columns=['variable'])
df_expanded = df_expanded.sort_values(by=['date', 'city', 'condition', 'hour'])\
                        .ffill()
result = df_expanded.merge(df, on=['date', 'city', 'condition', 'hour'], how='left')\
                    .dropna()\
                    .drop_duplicates()

I'm open to easier and simpler solutions.

Answer 1

Score: 2

The easiest approach is to forward-fill (ffill) the missing hours, as shown below. I will also try to think of a solution that uses the closest time.

import pandas as pd

# some sample data
d = {'date': ['2023-01-30', '2023-01-30', '2023-01-30', '2023-01-30', '2023-01-30', '2023-01-30'],
 'hour': [3, 6, 9, 3, 6, 9],
 'city': ['berlin', 'berlin', 'berlin', 'chicago', 'chicago', 'chicago'],
 'condition': ['snow', 'rain', 'clear', 'snow', 'snow', 'clear'],
 'snow': [1, 0, 0, 1, 1, 0],
 'rain': [0, 1, 0, 0, 0, 0]}
df = pd.DataFrame(d)
# convert the date to a datetime and the hour to a timedelta, add them, and set the sum as the index
df = df.set_index(pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'], unit='h')).drop(columns=['date', 'hour'])
# group by city, resample to hourly frequency, and forward-fill the missing rows
df.groupby('city').resample('h').ffill().reset_index(level=0, drop=True)
                        city condition  snow  rain
2023-01-30 03:00:00   berlin      snow     1     0
2023-01-30 04:00:00   berlin      snow     1     0
2023-01-30 05:00:00   berlin      snow     1     0
2023-01-30 06:00:00   berlin      rain     0     1
2023-01-30 07:00:00   berlin      rain     0     1
2023-01-30 08:00:00   berlin      rain     0     1
2023-01-30 09:00:00   berlin     clear     0     0
2023-01-30 03:00:00  chicago      snow     1     0
2023-01-30 04:00:00  chicago      snow     1     0
2023-01-30 05:00:00  chicago      snow     1     0
2023-01-30 06:00:00  chicago      snow     1     0
2023-01-30 07:00:00  chicago      snow     1     0
2023-01-30 08:00:00  chicago      snow     1     0
2023-01-30 09:00:00  chicago     clear     0     0
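
For the "closest time" variant mentioned above, one possible sketch is to reindex each city onto an hourly range with `method='nearest'` instead of forward-filling. This is only an illustration under a few assumptions: the column name `ts` and the result name `nearest_df` are made up for this example, and each city is assumed to have at most one row per timestamp (`reindex` with a fill method needs a unique, sorted index).

```python
import pandas as pd

# Sample data in the same shape as the question (two cities, 3-hourly rows).
df = pd.DataFrame({
    'date': ['2023-01-30'] * 6,
    'hour': [3, 6, 9, 3, 6, 9],
    'city': ['berlin'] * 3 + ['chicago'] * 3,
    'condition': ['snow', 'rain', 'clear', 'snow', 'snow', 'clear'],
    'snow': [1, 0, 0, 1, 1, 0],
    'rain': [0, 1, 0, 0, 0, 0],
})

# One timestamp per row; 'ts' is a hypothetical column name for this sketch.
df['ts'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'], unit='h')

parts = []
for city, g in df.groupby('city'):
    # Index each city's rows by timestamp (assumed unique within a city).
    g = g.drop(columns=['date', 'hour']).set_index('ts').sort_index()
    # Hourly range between the first and last observation of this city.
    hourly = pd.date_range(g.index.min(), g.index.max(), freq='h')
    # Each new hour takes the values of the closest observed timestamp.
    parts.append(g.reindex(hourly, method='nearest'))

nearest_df = pd.concat(parts)
print(nearest_df)
```

With 3-hourly data, hour 4 takes its values from hour 3 and hour 5 from hour 6, so the result differs from the forward-filled output only in the hour just before each observation.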

If you want the original date and hour columns back, add the following after the resample step:

new_df = df.groupby('city').resample('h').ffill().reset_index(level=0, drop=True)
new_df = new_df.reset_index().rename(columns={'index': 'date'})
new_df['hour'] = new_df['date'].dt.hour
new_df['date'] = new_df['date'].dt.date
          date     city condition  snow  rain  hour
0   2023-01-30   berlin      snow     1     0     3
1   2023-01-30   berlin      snow     1     0     4
2   2023-01-30   berlin      snow     1     0     5
3   2023-01-30   berlin      rain     0     1     6
4   2023-01-30   berlin      rain     0     1     7
5   2023-01-30   berlin      rain     0     1     8
6   2023-01-30   berlin     clear     0     0     9
7   2023-01-30  chicago      snow     1     0     3
8   2023-01-30  chicago      snow     1     0     4
9   2023-01-30  chicago      snow     1     0     5
10  2023-01-30  chicago      snow     1     0     6
11  2023-01-30  chicago      snow     1     0     7
12  2023-01-30  chicago      snow     1     0     8
13  2023-01-30  chicago     clear     0     0     9
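
Note that the resampled result stops at the last observed hour (9), while the expected output in the question continues for the two hours after it. A rough alternative that covers those trailing hours is to repeat every row three times and add an offset of 0, 1, or 2 to the hour. This sketch assumes every reading sits exactly on a 3-hour grid with no missing readings, and that the last reading of a day is at hour 21 or earlier so the hour never passes 23. The names `step` and `expanded` are made up for this example.

```python
import numpy as np
import pandas as pd

# Sample data in the same shape as the question (two cities, 3-hourly rows).
df = pd.DataFrame({
    'date': ['2023-01-30'] * 6,
    'hour': [3, 6, 9, 3, 6, 9],
    'city': ['berlin'] * 3 + ['chicago'] * 3,
    'condition': ['snow', 'rain', 'clear', 'snow', 'snow', 'clear'],
    'snow': [1, 0, 0, 1, 1, 0],
    'rain': [0, 1, 0, 0, 0, 0],
})

step = 3  # assumed spacing between observations, in hours

# Repeat every row `step` times, keeping the original row order.
expanded = df.loc[df.index.repeat(step)].reset_index(drop=True)
# Add 0, 1, 2 to the hour within each block of repeated rows.
expanded['hour'] = expanded['hour'] + np.tile(np.arange(step), len(df))
print(expanded)
```

Because this avoids grouping and resampling, it needs only one pass over the data and keeps the original `date` and `hour` columns, but it silently produces wrong hours if the input is not on a regular 3-hour grid.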
