How to find max and min values in different groups in Excel data using R
Question
I am new to R, but I need to write code that finds the max and min values for each date, and then creates new columns giving those max/min values and the hour of the day each one occurs at.
Here is an example of the data set (I can split the date and time into two columns if that would make it easier):
5/11/2022 12:00:00 AM -2.081698896
5/11/2022 01:00:00 AM -3.485276263
5/11/2022 02:00:00 AM -1.53921172
5/11/2022 03:00:00 AM -0.971277113
5/11/2022 04:00:00 AM 1.087641775
5/11/2022 05:00:00 AM 1.532685864
5/11/2022 06:00:00 AM -0.279100848
5/11/2022 07:00:00 AM -2.110513273
5/11/2022 08:00:00 AM -1.450939087
5/11/2022 09:00:00 AM -0.756648162
5/11/2022 10:00:00 AM 0.101390372
5/11/2022 11:00:00 AM 0.961030453
5/11/2022 12:00:00 PM 0.531307116
5/11/2022 01:00:00 PM 1.151855083
5/11/2022 02:00:00 PM 1.768739844
5/11/2022 03:00:00 PM 1.233669468
5/11/2022 04:00:00 PM 1.491899352
5/11/2022 05:00:00 PM 1.144093768
5/11/2022 06:00:00 PM 2.160505945
5/11/2022 07:00:00 PM 1.795628853
5/11/2022 08:00:00 PM 4.531107262
5/11/2022 09:00:00 PM 3.62462192
5/11/2022 10:00:00 PM 1.286594924
5/11/2022 11:00:00 PM -3.791104623
5/12/2022 12:00:00 AM 0.622076737
5/12/2022 01:00:00 AM 0.090568132
5/12/2022 02:00:00 AM 1.59146776
5/12/2022 03:00:00 AM -0.479784099
5/12/2022 04:00:00 AM 0.489422285
5/12/2022 05:00:00 AM 0.073997738
5/12/2022 06:00:00 AM -0.010696904
5/12/2022 07:00:00 AM -2.632483198
5/12/2022 08:00:00 AM -1.461297052
5/12/2022 09:00:00 AM -1.522490002
5/12/2022 10:00:00 AM -0.940460749
5/12/2022 11:00:00 AM -0.374553431
5/12/2022 12:00:00 PM 0.062439221
5/12/2022 01:00:00 PM 0.400666873
5/12/2022 02:00:00 PM 1.051929106
5/12/2022 03:00:00 PM 1.179060269
5/12/2022 04:00:00 PM 0.946240502
5/12/2022 05:00:00 PM 1.962386886
5/12/2022 06:00:00 PM 2.171089482
5/12/2022 07:00:00 PM 3.622429106
5/12/2022 08:00:00 PM 3.741439222
5/12/2022 09:00:00 PM -1.805726046
5/12/2022 10:00:00 PM -4.234257479
5/12/2022 11:00:00 PM -4.779267306
5/13/2022 12:00:00 AM -5.588944721
5/13/2022 01:00:00 AM -5.716968341
5/13/2022 02:00:00 AM -3.641873457
5/13/2022 03:00:00 AM -1.483733735
5/13/2022 04:00:00 AM -1.173960641
5/13/2022 05:00:00 AM -0.858666579
5/13/2022 06:00:00 AM -1.341592707
5/13/2022 07:00:00 AM -2.641237393
5/13/2022 08:00:00 AM -2.739222797
5/13/2022 09:00:00 AM -0.892693229
5/13/2022 10:00:00 AM -0.157481635
5/13/2022 11:00:00 AM 0.904515563
5/13/2022 12:00:00 PM 1.000066424
5/13/2022 01:00:00 PM 1.354307836
5/13/2022 02:00:00 PM 1.625249928
5/13/2022 03:00:00 PM 2.155380173
5/13/2022 04:00:00 PM 2.025535026
5/13/2022 05:00:00 PM 2.246808707
5/13/2022 06:00:00 PM 2.95513702
5/13/2022 07:00:00 PM 3.222890919
5/13/2022 08:00:00 PM 3.235698828
5/13/2022 09:00:00 PM 0.862131896
5/13/2022 10:00:00 PM -1.258192244
5/13/2022 11:00:00 PM 0.040982598
Ideally, the code would create a new column that looks something like this:
max:
5/11/2022 8:00PM 2.683375,
5/12/2022 7:00PM 1.9246,
5/13/2022 7:00PM 1.83976,
and then something similar for the minimum values.
Any help is greatly appreciated!
Thank you!
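On splitting the date and time: it shouldn't be necessary, because the combined strings can be parsed in one step and the date and hour derived from the result. A minimal base-R sketch, assuming the timestamps are read in as text exactly as shown above; the data frame `df` and its columns `timestamp` and `value` are hypothetical names, and the three rows are taken from the data above:

```r
# Hypothetical sample: three rows from the data above, timestamps as text
df <- data.frame(
  timestamp = c("5/11/2022 12:00:00 AM", "5/11/2022 01:00:00 AM",
                "5/11/2022 01:00:00 PM"),
  value = c(-2.081698896, -3.485276263, 1.151855083)
)

# Month/day/year with a 12-hour clock: %I is the hour (01-12), %p is AM/PM.
# tz = "UTC" keeps daylight-saving changes from shifting any hour.
df$datetime <- as.POSIXct(df$timestamp, format = "%m/%d/%Y %I:%M:%S %p",
                          tz = "UTC")

# Derive separate date and hour (0-23) columns from the parsed value
df$date <- as.Date(df$datetime, tz = "UTC")
df$hour <- as.integer(format(df$datetime, "%H"))

print(df[, c("date", "hour", "value")])
```

Note that `12:00:00 AM` becomes hour 0. Matching `%p` may depend on the locale, so AM/PM parsing might need an English locale.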
Answer 1
Score: 1
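Try this (it needs the readxl and dplyr packages installed; replace data.xlsx with the path to your actual Excel file):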
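# Load the required libraries
library(readxl)
library(dplyr)
# Read the Excel file. The column names `Date` (date-time) and `values`
# (numbers) are assumed; change them to match your sheet.
your_data <- read_excel("data.xlsx")
# Convert the date-time column to POSIXct. The data is month/day/year with a
# 12-hour clock (e.g. "5/11/2022 01:00:00 PM"), hence %m/%d/%Y %I:%M:%S %p.
# If read_excel already returned a date-time column, the format is not needed.
your_data$DateTime <- as.POSIXct(your_data$Date,
                                 format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
# Extract the date and the hour (00-23)
your_data$DateOnly <- as.Date(your_data$DateTime, tz = "UTC")
your_data$Hour <- format(your_data$DateTime, "%H")
# Group by date only: grouping by date AND hour leaves one row per group, so
# max and min would both just equal that row's value. which.max()/which.min()
# give the position of each extreme, which picks out its hour.
result <- your_data %>%
  group_by(DateOnly) %>%
  summarize(max_value = max(values),
            max_hour  = Hour[which.max(values)],
            min_value = min(values),
            min_hour  = Hour[which.min(values)])
# Merge the daily results back onto every row of the original dataset
your_data <- your_data %>%
  left_join(result, by = "DateOnly")
# Print the modified dataset
print(your_data)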
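If only one row per date is wanted (like the `max:` listing in the question), the summary can also be built without readxl or dplyr. A minimal base-R sketch on a hypothetical six-row sample taken from the data above, with the timestamps already parsed; `df`, `per_day`, and `summary_df` are illustrative names:

```r
# Hypothetical sample: six hourly readings from the data above
df <- data.frame(
  datetime = as.POSIXct(c("2022-05-11 19:00", "2022-05-11 20:00",
                          "2022-05-11 23:00", "2022-05-12 07:00",
                          "2022-05-12 20:00", "2022-05-12 23:00"),
                        format = "%Y-%m-%d %H:%M", tz = "UTC"),
  value = c(1.795628853, 4.531107262, -3.791104623,
            -2.632483198, 3.741439222, -4.779267306)
)
df$date <- as.Date(df$datetime, tz = "UTC")

# For each date, find the rows holding the max and the min value.
# which.max()/which.min() return the first position if there are ties.
per_day <- lapply(split(df, df$date), function(d) {
  i_max <- which.max(d$value)
  i_min <- which.min(d$value)
  data.frame(date      = d$date[1],
             max_time  = format(d$datetime[i_max], "%I:%M %p"),
             max_value = d$value[i_max],
             min_time  = format(d$datetime[i_min], "%I:%M %p"),
             min_value = d$value[i_min])
})

# Stack the one-row data frames into a single summary table
summary_df <- do.call(rbind, per_day)
rownames(summary_df) <- NULL
print(summary_df)
```

On this sample, 5/11/2022 has its max (4.531107262) at 8 PM and its min (-3.791104623) at 11 PM.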
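To attach the daily extremes to every row as new columns, as the `left_join()` step does, base R's `ave()` is another option: it applies a function within each group and returns a vector aligned with the original rows. A minimal sketch on a hypothetical five-row sample from the data above; the column names are illustrative:

```r
# Hypothetical sample: five hourly readings from the data above
df <- data.frame(
  datetime = as.POSIXct(c("2022-05-11 00:00", "2022-05-11 20:00",
                          "2022-05-11 23:00", "2022-05-12 07:00",
                          "2022-05-12 20:00"),
                        format = "%Y-%m-%d %H:%M", tz = "UTC"),
  value = c(-2.081698896, 4.531107262, -3.791104623,
            -2.632483198, 3.741439222)
)
df$date <- as.Date(df$datetime, tz = "UTC")
df$hour <- as.integer(format(df$datetime, "%H"))

# Daily max/min, repeated on every row of that date
df$max_value <- ave(df$value, df$date, FUN = max)
df$min_value <- ave(df$value, df$date, FUN = min)

# Hour of each daily extreme: ave() passes the row indices of one date to FUN,
# which returns the hour on the row holding the max (or min) value
df$max_hour <- ave(seq_len(nrow(df)), df$date,
                   FUN = function(i) df$hour[i][which.max(df$value[i])])
df$min_hour <- ave(seq_len(nrow(df)), df$date,
                   FUN = function(i) df$hour[i][which.min(df$value[i])])

print(df)
```

Because `ave()` preserves row order, no join is needed; the trade-off is one grouped pass per new column.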