How to find max and min values in different groups in Excel data using R

huangapple go评论93阅读模式
英文:

How to find max and min values in different groups in Excel data using R

问题

我是新手,但需要编写一段代码来查找每个日期的最大值和最小值,然后创建新的列,显示最大/最小值及其所关联的小时。

以下是数据集的示例:

数据集
(如果将日期和时间分为两列,可能会更容易处理)

5/11/2022 12:00:00 AM	-2.081698896
5/11/2022 01:00:00 AM	-3.485276263
5/11/2022 02:00:00 AM	-1.53921172
5/11/2022 03:00:00 AM	-0.971277113
5/11/2022 04:00:00 AM	1.087641775
5/11/2022 05:00:00 AM	1.532685864
5/11/2022 06:00:00 AM	-0.279100848
5/11/2022 07:00:00 AM	-2.110513273
5/11/2022 08:00:00 AM	-1.450939087
5/11/2022 09:00:00 AM	-0.756648162
5/11/2022 10:00:00 AM	0.101390372
5/11/2022 11:00:00 AM	0.961030453
5/11/2022 12:00:00 PM	0.531307116
5/11/2022 01:00:00 PM	1.151855083
5/11/2022 02:00:00 PM	1.768739844
5/11/2022 03:00:00 PM	1.233669468
5/11/2022 04:00:00 PM	1.491899352
5/11/2022 05:00:00 PM	1.144093768
5/11/2022 06:00:00 PM	2.160505945
5/11/2022 07:00:00 PM	1.795628853
5/11/2022 08:00:00 PM	4.531107262
5/11/2022 09:00:00 PM	3.62462192
5/11/2022 10:00:00 PM	1.286594924
5/11/2022 11:00:00 PM	-3.791104623
5/12/2022 12:00:00 AM	0.622076737
5/12/2022 01:00:00 AM	0.090568132
5/12/2022 02:00:00 AM	1.59146776
5/12/2022 03:00:00 AM	-0.479784099
5/12/2022 04:00:00 AM	0.489422285
5/12/2022 05:00:00 AM	0.073997738
5/12/2022 06:00:00 AM	-0.010696904
5/12/2022 07:00:00 AM	-2.632483198
5/12/2022 08:00:00 AM	-1.461297052
5/12/2022 09:00:00 AM	-1.522490002
5/12/2022 10:00:00 AM	-0.940460749
5/12/2022 11:00:00 AM	-0.374553431
5/12/2022 12:00:00 PM	0.062439221
5/12/2022 01:00:00 PM	0.400666873
5/12/2022 02:00:00 PM	1.051929106
5/12/2022 03:00:00 PM	1.179060269
5/12/2022 04:00:00 PM	0.946240502
5/12/2022 05:00:00 PM	1.962386886
5/12/2022 06:00:00 PM	2.171089482
5/12/2022 07:00:00 PM	3.622429106
5/12/2022 08:00:00 PM	3.741439222
5/12/2022 09:00:00 PM	-1.805726046
5/12/2022 10:00:00 PM	-4.234257479
5/12/2022 11:00:00 PM	-4.779267306
5/13/2022 12:00:00 AM	-5.588944721
5/13/2022 01:00:00 AM	-5.716968341
5/13/2022 02:00:00 AM	-3.641873457
5/13/2022 03:00:00 AM	-1.483733735
5/13/2022 04:00:00 AM	-1.173960641
5/13/2022 05:00:00 AM	-0.858666579
5/13/2022 06:00:00 AM	-1.341592707
5/13/2022 07:00:00 AM	-2.641237393
5/13/2022 08:00:00 AM	-2.739222797
5/13/2022 09:00:00 AM	-0.892693229
5/13/2022 10:00:00 AM	-0.157481635
5/13/2022 11:00:00 AM	0.904515563
5/13/2022 12:00:00 PM	1.000066424
5/13/2022 01:00:00 PM	1.354307836
5/13/2022 02:00:00 PM	1.625249928
5/13/2022 03:00:00 PM	2.155380173
5/13/2022 04:00:00 PM	2.025535026
5/13/2022 05:00:00 PM	2.246808707
5/13/2022 06:00:00 PM	2.95513702
5/13/2022 07:00:00 PM	3.222890919
5/13/2022 08:00:00 PM	3.235698828
5/13/2022 09:00:00 PM	0.862131896
5/13/2022 10:00:00 PM	-1.258192244
5/13/2022 11:00:00 PM	0.040982598

因此,使用此代码,输出将创建一个新列,类似于以下内容:

最大值:

5/11/2022  8:00PM  2.683375,
5/12/2022  7:00PM  1.9246,
5/13/2022  7:00PM  1.83976,

然后,最小值也会有类似的输出。

非常感谢您的帮助!

谢谢!

英文:

I am new to R, but need to write a code to find the max and min values on each date, and then have it create new columns that give the max/min values and which hour of the day they are associated with.

Here is an example of the data set:

data set
(I can separate the date and time to two columns if it would make it easier)

5/11/2022 12:00:00 AM	-2.081698896
5/11/2022 01:00:00 AM	-3.485276263
5/11/2022 02:00:00 AM	-1.53921172
5/11/2022 03:00:00 AM	-0.971277113
5/11/2022 04:00:00 AM	1.087641775
5/11/2022 05:00:00 AM	1.532685864
5/11/2022 06:00:00 AM	-0.279100848
5/11/2022 07:00:00 AM	-2.110513273
5/11/2022 08:00:00 AM	-1.450939087
5/11/2022 09:00:00 AM	-0.756648162
5/11/2022 10:00:00 AM	0.101390372
5/11/2022 11:00:00 AM	0.961030453
5/11/2022 12:00:00 PM	0.531307116
5/11/2022 01:00:00 PM	1.151855083
5/11/2022 02:00:00 PM	1.768739844
5/11/2022 03:00:00 PM	1.233669468
5/11/2022 04:00:00 PM	1.491899352
5/11/2022 05:00:00 PM	1.144093768
5/11/2022 06:00:00 PM	2.160505945
5/11/2022 07:00:00 PM	1.795628853
5/11/2022 08:00:00 PM	4.531107262
5/11/2022 09:00:00 PM	3.62462192
5/11/2022 10:00:00 PM	1.286594924
5/11/2022 11:00:00 PM	-3.791104623
5/12/2022 12:00:00 AM	0.622076737
5/12/2022 01:00:00 AM	0.090568132
5/12/2022 02:00:00 AM	1.59146776
5/12/2022 03:00:00 AM	-0.479784099
5/12/2022 04:00:00 AM	0.489422285
5/12/2022 05:00:00 AM	0.073997738
5/12/2022 06:00:00 AM	-0.010696904
5/12/2022 07:00:00 AM	-2.632483198
5/12/2022 08:00:00 AM	-1.461297052
5/12/2022 09:00:00 AM	-1.522490002
5/12/2022 10:00:00 AM	-0.940460749
5/12/2022 11:00:00 AM	-0.374553431
5/12/2022 12:00:00 PM	0.062439221
5/12/2022 01:00:00 PM	0.400666873
5/12/2022 02:00:00 PM	1.051929106
5/12/2022 03:00:00 PM	1.179060269
5/12/2022 04:00:00 PM	0.946240502
5/12/2022 05:00:00 PM	1.962386886
5/12/2022 06:00:00 PM	2.171089482
5/12/2022 07:00:00 PM	3.622429106
5/12/2022 08:00:00 PM	3.741439222
5/12/2022 09:00:00 PM	-1.805726046
5/12/2022 10:00:00 PM	-4.234257479
5/12/2022 11:00:00 PM	-4.779267306
5/13/2022 12:00:00 AM	-5.588944721
5/13/2022 01:00:00 AM	-5.716968341
5/13/2022 02:00:00 AM	-3.641873457
5/13/2022 03:00:00 AM	-1.483733735
5/13/2022 04:00:00 AM	-1.173960641
5/13/2022 05:00:00 AM	-0.858666579
5/13/2022 06:00:00 AM	-1.341592707
5/13/2022 07:00:00 AM	-2.641237393
5/13/2022 08:00:00 AM	-2.739222797
5/13/2022 09:00:00 AM	-0.892693229
5/13/2022 10:00:00 AM	-0.157481635
5/13/2022 11:00:00 AM	0.904515563
5/13/2022 12:00:00 PM	1.000066424
5/13/2022 01:00:00 PM	1.354307836
5/13/2022 02:00:00 PM	1.625249928
5/13/2022 03:00:00 PM	2.155380173
5/13/2022 04:00:00 PM	2.025535026
5/13/2022 05:00:00 PM	2.246808707
5/13/2022 06:00:00 PM	2.95513702
5/13/2022 07:00:00 PM	3.222890919
5/13/2022 08:00:00 PM	3.235698828
5/13/2022 09:00:00 PM	0.862131896
5/13/2022 10:00:00 PM	-1.258192244
5/13/2022 11:00:00 PM	0.040982598

So, with this code the output would hopefully create a new column that looked something like this:

max:

5/11/2022  8:00PM  2.683375,
5/12/2022  7:00PM  1.9246,
5/13/2022  7:00PM  1.83976,

and then have something similar for the minimum values.

Any help is greatly appreciated!!!

Thank you!

答案1

得分: 1

尝试这样做:

# 加载所需的库
library(readxl)
library(dplyr)

# 读取 Excel 文件
your_data <- read_excel("data.xlsx")

# 将日期和时间列转换为 POSIXct 格式
your_data$DateTime <- as.POSIXct(your_data$Date, format = "%d%m%Y %H:%M:%S")

# 提取日期和小时组件
your_data$DateOnly <- as.Date(your_data$DateTime)
your_data$Hour <- format(your_data$DateTime, "%H")

# 按日期和小时分组,然后汇总以找到最大值和最小值
result <- your_data %>%
  group_by(DateOnly, Hour) %>%
  summarize(max_value = max(values),
            min_value = min(values))

# 将结果合并回原始数据集
your_data <- your_data %>%
  left_join(result, by = c("DateOnly", "Hour"))

# 打印修改后的数据集
print(your_data)

注意:这是一段R代码,用于读取Excel文件并对数据进行处理。你需要确保已经安装了readxldplyr这两个R包。另外,你需要将代码中的data.xlsx替换为你实际的Excel文件路径。

英文:

Try this:

# Load the required libraries
library(readxl)
library(dplyr)

# Read the Excel file
your_data &lt;- read_excel(&quot;data.xlsx&quot;)

# Convert the date and time column to POSIXct format
your_data$DateTime &lt;- as.POSIXct(your_data$Date, format = &quot;%d%m%Y %H:%M:%S&quot;)

# Extract date and hour components
your_data$DateOnly &lt;- as.Date(your_data$DateTime)
your_data$Hour &lt;- format(your_data$DateTime, &quot;%H&quot;)

# Group by date and hour, then summarize to find max and min values
result &lt;- your_data %&gt;%
  group_by(DateOnly, Hour) %&gt;%
  summarize(max_value = max(values),
            min_value = min(values))

# Merge the result back to the original dataset
your_data &lt;- your_data %&gt;%
  left_join(result, by = c(&quot;DateOnly&quot;, &quot;Hour&quot;))

# Print the modified dataset
print(your_data)

huangapple
  • 本文由 发表于 2023年8月9日 06:57:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/76863631.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定