How to find max and min values in different groups in Excel data using R

huangapple go评论130阅读模式
英文:

How to find max and min values in different groups in Excel data using R

问题

我是新手,但需要编写一段代码来查找每个日期的最大值和最小值,然后创建新的列,显示最大/最小值及其所关联的小时。

以下是数据集的示例:

数据集
(如果将日期和时间分为两列,可能会更容易处理)

  1. 5/11/2022 12:00:00 AM -2.081698896
  2. 5/11/2022 01:00:00 AM -3.485276263
  3. 5/11/2022 02:00:00 AM -1.53921172
  4. 5/11/2022 03:00:00 AM -0.971277113
  5. 5/11/2022 04:00:00 AM 1.087641775
  6. 5/11/2022 05:00:00 AM 1.532685864
  7. 5/11/2022 06:00:00 AM -0.279100848
  8. 5/11/2022 07:00:00 AM -2.110513273
  9. 5/11/2022 08:00:00 AM -1.450939087
  10. 5/11/2022 09:00:00 AM -0.756648162
  11. 5/11/2022 10:00:00 AM 0.101390372
  12. 5/11/2022 11:00:00 AM 0.961030453
  13. 5/11/2022 12:00:00 PM 0.531307116
  14. 5/11/2022 01:00:00 PM 1.151855083
  15. 5/11/2022 02:00:00 PM 1.768739844
  16. 5/11/2022 03:00:00 PM 1.233669468
  17. 5/11/2022 04:00:00 PM 1.491899352
  18. 5/11/2022 05:00:00 PM 1.144093768
  19. 5/11/2022 06:00:00 PM 2.160505945
  20. 5/11/2022 07:00:00 PM 1.795628853
  21. 5/11/2022 08:00:00 PM 4.531107262
  22. 5/11/2022 09:00:00 PM 3.62462192
  23. 5/11/2022 10:00:00 PM 1.286594924
  24. 5/11/2022 11:00:00 PM -3.791104623
  25. 5/12/2022 12:00:00 AM 0.622076737
  26. 5/12/2022 01:00:00 AM 0.090568132
  27. 5/12/2022 02:00:00 AM 1.59146776
  28. 5/12/2022 03:00:00 AM -0.479784099
  29. 5/12/2022 04:00:00 AM 0.489422285
  30. 5/12/2022 05:00:00 AM 0.073997738
  31. 5/12/2022 06:00:00 AM -0.010696904
  32. 5/12/2022 07:00:00 AM -2.632483198
  33. 5/12/2022 08:00:00 AM -1.461297052
  34. 5/12/2022 09:00:00 AM -1.522490002
  35. 5/12/2022 10:00:00 AM -0.940460749
  36. 5/12/2022 11:00:00 AM -0.374553431
  37. 5/12/2022 12:00:00 PM 0.062439221
  38. 5/12/2022 01:00:00 PM 0.400666873
  39. 5/12/2022 02:00:00 PM 1.051929106
  40. 5/12/2022 03:00:00 PM 1.179060269
  41. 5/12/2022 04:00:00 PM 0.946240502
  42. 5/12/2022 05:00:00 PM 1.962386886
  43. 5/12/2022 06:00:00 PM 2.171089482
  44. 5/12/2022 07:00:00 PM 3.622429106
  45. 5/12/2022 08:00:00 PM 3.741439222
  46. 5/12/2022 09:00:00 PM -1.805726046
  47. 5/12/2022 10:00:00 PM -4.234257479
  48. 5/12/2022 11:00:00 PM -4.779267306
  49. 5/13/2022 12:00:00 AM -5.588944721
  50. 5/13/2022 01:00:00 AM -5.716968341
  51. 5/13/2022 02:00:00 AM -3.641873457
  52. 5/13/2022 03:00:00 AM -1.483733735
  53. 5/13/2022 04:00:00 AM -1.173960641
  54. 5/13/2022 05:00:00 AM -0.858666579
  55. 5/13/2022 06:00:00 AM -1.341592707
  56. 5/13/2022 07:00:00 AM -2.641237393
  57. 5/13/2022 08:00:00 AM -2.739222797
  58. 5/13/2022 09:00:00 AM -0.892693229
  59. 5/13/2022 10:00:00 AM -0.157481635
  60. 5/13/2022 11:00:00 AM 0.904515563
  61. 5/13/2022 12:00:00 PM 1.000066424
  62. 5/13/2022 01:00:00 PM 1.354307836
  63. 5/13/2022 02:00:00 PM 1.625249928
  64. 5/13/2022 03:00:00 PM 2.155380173
  65. 5/13/2022 04:00:00 PM 2.025535026
  66. 5/13/2022 05:00:00 PM 2.246808707
  67. 5/13/2022 06:00:00 PM 2.95513702
  68. 5/13/2022 07:00:00 PM 3.222890919
  69. 5/13/2022 08:00:00 PM 3.235698828
  70. 5/13/2022 09:00:00 PM 0.862131896
  71. 5/13/2022 10:00:00 PM -1.258192244
  72. 5/13/2022 11:00:00 PM 0.040982598

因此,使用此代码,输出将创建一个新列,类似于以下内容:

最大值:

  1. 5/11/2022 8:00PM 2.683375,
  2. 5/12/2022 7:00PM 1.9246,
  3. 5/13/2022 7:00PM 1.83976

然后,最小值也会有类似的输出。

非常感谢您的帮助!

谢谢!

英文:

I am new to R, but need to write a code to find the max and min values on each date, and then have it create new columns that give the max/min values and which hour of the day they are associated with.

Here is an example of the data set:

data set
(I can separate the date and time to two columns if it would make it easier)

  1. 5/11/2022 12:00:00 AM -2.081698896
  2. 5/11/2022 01:00:00 AM -3.485276263
  3. 5/11/2022 02:00:00 AM -1.53921172
  4. 5/11/2022 03:00:00 AM -0.971277113
  5. 5/11/2022 04:00:00 AM 1.087641775
  6. 5/11/2022 05:00:00 AM 1.532685864
  7. 5/11/2022 06:00:00 AM -0.279100848
  8. 5/11/2022 07:00:00 AM -2.110513273
  9. 5/11/2022 08:00:00 AM -1.450939087
  10. 5/11/2022 09:00:00 AM -0.756648162
  11. 5/11/2022 10:00:00 AM 0.101390372
  12. 5/11/2022 11:00:00 AM 0.961030453
  13. 5/11/2022 12:00:00 PM 0.531307116
  14. 5/11/2022 01:00:00 PM 1.151855083
  15. 5/11/2022 02:00:00 PM 1.768739844
  16. 5/11/2022 03:00:00 PM 1.233669468
  17. 5/11/2022 04:00:00 PM 1.491899352
  18. 5/11/2022 05:00:00 PM 1.144093768
  19. 5/11/2022 06:00:00 PM 2.160505945
  20. 5/11/2022 07:00:00 PM 1.795628853
  21. 5/11/2022 08:00:00 PM 4.531107262
  22. 5/11/2022 09:00:00 PM 3.62462192
  23. 5/11/2022 10:00:00 PM 1.286594924
  24. 5/11/2022 11:00:00 PM -3.791104623
  25. 5/12/2022 12:00:00 AM 0.622076737
  26. 5/12/2022 01:00:00 AM 0.090568132
  27. 5/12/2022 02:00:00 AM 1.59146776
  28. 5/12/2022 03:00:00 AM -0.479784099
  29. 5/12/2022 04:00:00 AM 0.489422285
  30. 5/12/2022 05:00:00 AM 0.073997738
  31. 5/12/2022 06:00:00 AM -0.010696904
  32. 5/12/2022 07:00:00 AM -2.632483198
  33. 5/12/2022 08:00:00 AM -1.461297052
  34. 5/12/2022 09:00:00 AM -1.522490002
  35. 5/12/2022 10:00:00 AM -0.940460749
  36. 5/12/2022 11:00:00 AM -0.374553431
  37. 5/12/2022 12:00:00 PM 0.062439221
  38. 5/12/2022 01:00:00 PM 0.400666873
  39. 5/12/2022 02:00:00 PM 1.051929106
  40. 5/12/2022 03:00:00 PM 1.179060269
  41. 5/12/2022 04:00:00 PM 0.946240502
  42. 5/12/2022 05:00:00 PM 1.962386886
  43. 5/12/2022 06:00:00 PM 2.171089482
  44. 5/12/2022 07:00:00 PM 3.622429106
  45. 5/12/2022 08:00:00 PM 3.741439222
  46. 5/12/2022 09:00:00 PM -1.805726046
  47. 5/12/2022 10:00:00 PM -4.234257479
  48. 5/12/2022 11:00:00 PM -4.779267306
  49. 5/13/2022 12:00:00 AM -5.588944721
  50. 5/13/2022 01:00:00 AM -5.716968341
  51. 5/13/2022 02:00:00 AM -3.641873457
  52. 5/13/2022 03:00:00 AM -1.483733735
  53. 5/13/2022 04:00:00 AM -1.173960641
  54. 5/13/2022 05:00:00 AM -0.858666579
  55. 5/13/2022 06:00:00 AM -1.341592707
  56. 5/13/2022 07:00:00 AM -2.641237393
  57. 5/13/2022 08:00:00 AM -2.739222797
  58. 5/13/2022 09:00:00 AM -0.892693229
  59. 5/13/2022 10:00:00 AM -0.157481635
  60. 5/13/2022 11:00:00 AM 0.904515563
  61. 5/13/2022 12:00:00 PM 1.000066424
  62. 5/13/2022 01:00:00 PM 1.354307836
  63. 5/13/2022 02:00:00 PM 1.625249928
  64. 5/13/2022 03:00:00 PM 2.155380173
  65. 5/13/2022 04:00:00 PM 2.025535026
  66. 5/13/2022 05:00:00 PM 2.246808707
  67. 5/13/2022 06:00:00 PM 2.95513702
  68. 5/13/2022 07:00:00 PM 3.222890919
  69. 5/13/2022 08:00:00 PM 3.235698828
  70. 5/13/2022 09:00:00 PM 0.862131896
  71. 5/13/2022 10:00:00 PM -1.258192244
  72. 5/13/2022 11:00:00 PM 0.040982598

So, with this code the output would hopefully create a new column that looked something like this:

max:

  1. 5/11/2022 8:00PM 2.683375,
  2. 5/12/2022 7:00PM 1.9246,
  3. 5/13/2022 7:00PM 1.83976,

and then have something similar for the minimum values.

Any help is greatly appreciated!!!

Thank you!

答案1

得分: 1

尝试这样做:

  1. # 加载所需的库
  2. library(readxl)
  3. library(dplyr)
  4. # 读取 Excel 文件
  5. your_data <- read_excel("data.xlsx")
  6. # 将日期和时间列转换为 POSIXct 格式
  7. your_data$DateTime <- as.POSIXct(your_data$Date, format = "%d%m%Y %H:%M:%S")
  8. # 提取日期和小时组件
  9. your_data$DateOnly <- as.Date(your_data$DateTime)
  10. your_data$Hour <- format(your_data$DateTime, "%H")
  11. # 按日期和小时分组,然后汇总以找到最大值和最小值
  12. result <- your_data %>%
  13. group_by(DateOnly, Hour) %>%
  14. summarize(max_value = max(values),
  15. min_value = min(values))
  16. # 将结果合并回原始数据集
  17. your_data <- your_data %>%
  18. left_join(result, by = c("DateOnly", "Hour"))
  19. # 打印修改后的数据集
  20. print(your_data)

注意:这是一段R代码,用于读取Excel文件并对数据进行处理。你需要确保已经安装了readxldplyr这两个R包。另外,你需要将代码中的data.xlsx替换为你实际的Excel文件路径。

英文:

Try this:

  1. # Load the required libraries
  2. library(readxl)
  3. library(dplyr)
  4. # Read the Excel file
  5. your_data &lt;- read_excel(&quot;data.xlsx&quot;)
  6. # Convert the date and time column to POSIXct format
  7. your_data$DateTime &lt;- as.POSIXct(your_data$Date, format = &quot;%d%m%Y %H:%M:%S&quot;)
  8. # Extract date and hour components
  9. your_data$DateOnly &lt;- as.Date(your_data$DateTime)
  10. your_data$Hour &lt;- format(your_data$DateTime, &quot;%H&quot;)
  11. # Group by date and hour, then summarize to find max and min values
  12. result &lt;- your_data %&gt;%
  13. group_by(DateOnly, Hour) %&gt;%
  14. summarize(max_value = max(values),
  15. min_value = min(values))
  16. # Merge the result back to the original dataset
  17. your_data &lt;- your_data %&gt;%
  18. left_join(result, by = c(&quot;DateOnly&quot;, &quot;Hour&quot;))
  19. # Print the modified dataset
  20. print(your_data)

huangapple
  • 本文由 发表于 2023年8月9日 06:57:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/76863631.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定