Plotting Multiple Lines on a Graph for Different Groups in ggplot
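Question

I am trying to edit a plot that I created in ggplot so that each facet shows two overlapping lines (one for average actual run hours, one for average scheduled run hours) and displays a legend on the right indicating which line is actual and which is scheduled. I referenced the post here, but could not get that solution to work in my case because I am dealing with different columns that need to be overlaid, not groups within one variable. Note that the lines will be nearly identical in this case, but I have other use cases involving the same task where the lines will differ significantly, hence the request for help.

My data is listed below for reference: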

structure(list(month_yr = c("2022-01", "2022-01", "2022-02", 
"2022-02", "2022-03", "2022-03", "2022-04", "2022-04", "2022-05", 
"2022-05", "2022-06", "2022-06", "2022-07", "2022-07", "2022-08", 
"2022-08", "2022-09", "2022-09", "2022-10", "2022-10", "2022-11", 
"2022-11", "2022-12", "2022-12", "2023-01", "2023-01", "2023-02", 
"2023-02"), plant_name = c("plant_f", "plant_s", "plant_f", "plant_s", 
"plant_f", "plant_s", "plant_f", "plant_s", "plant_f", "plant_s", 
"plant_f", "plant_s", "plant_f", "plant_s", "plant_f", "plant_s", 
"plant_f", "plant_s", "plant_f", "plant_s", "plant_f", "plant_s", 
"plant_f", "plant_s", "plant_f", "plant_s", "plant_f", "plant_s"
), avg_run_hours = c(15.0080608695652, 16.3453608247423, 14.7394112149533, 
16.1025555555556, 14.9570175438596, 15.7327777777778, 17.0074257425743, 
16.5604901960784, 16.989010989011, 16.3021296296296, 14.8100961538462, 
15.8714516129032, 16.5552083333333, 15.3971568627451, 16.2258771929825, 
14.2616279069767, 17.2556179775281, 14.3790350877193, 16.3594903846154, 
15.5988617886179, 14.4050925925926, 15.9334920634921, 14.3455056179775, 
16.6322935779817, 16.6958762886598, 17.1025714285714, 16.046875, 
16.8408695652174), avg_sched_run_hours = c(15.0267043478261, 
16.4351340206186, 15.0025140186916, 16.2041555555556, 14.8281578947368, 
15.9119814814815, 17.1840099009901, 16.7646666666667, 17.0109340659341, 
16.4446388888889, 14.7679615384615, 16.1768790322581, 16.3242083333333, 
15.7033333333333, 16.343701754386, 14.5158139534884, 17.4342921348315, 
14.5827280701754, 16.4562692307692, 15.4149105691057, 14.2729537037037, 
16.1438253968254, 14.3073595505618, 16.7186330275229, 16.6436082474227, 
17.0332952380952, 16.3137916666667, 16.9656739130435)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -28L), groups = structure(list(
    month_yr = c("2022-01", "2022-02", "2022-03", "2022-04", 
    "2022-05", "2022-06", "2022-07", "2022-08", "2022-09", "2022-10", 
    "2022-11", "2022-12", "2023-01", "2023-02"), .rows = structure(list(
        1:2, 3:4, 5:6, 7:8, 9:10, 11:12, 13:14, 15:16, 17:18, 
        19:20, 21:22, 23:24, 25:26, 27:28), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -14L), .drop = TRUE))

Code I used to create the image below:

hours_by_plant <-
    ggplot(so_run_hour_stats, aes(x=month_yr, y=avg_sched_run_hours, group=1)) + geom_point() +
    geom_line(color="red") + xlab("Month of Year") + ylab("Avg Run Hours") +
    ggtitle("Avg Plant Run Hours by Month from 01/2022 - 02/2023") + theme_classic() +
    facet_wrap(~plant_name)

hours_by_plant <- hours_by_plant + theme(plot.title = element_text(hjust = 0.5))

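Answer 1

Score: 1

This type of problem usually comes down to reshaping the data. The plot needs the data in long format, but the data is in wide format. See this post on how to reshape data from wide to long format.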
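To make the wide-versus-long distinction concrete, here is a minimal sketch on a toy table. The values and the name `wide` are made up for illustration and are not the question's data; only the column names mirror it.

```r
library(tidyr)

# Toy wide table (hypothetical values, not the question's data):
# one row per month, one column per measure.
wide <- data.frame(
  month_yr = c("2022-01", "2022-02"),
  avg_run_hours = c(15.0, 14.7),
  avg_sched_run_hours = c(15.1, 15.0)
)

# Long format: one row per month and measure.
# The former column names go into `Average`, the numbers into `value`
# (the default name pivot_longer() uses for the values column).
long <- pivot_longer(wide, cols = starts_with("avg"), names_to = "Average")

print(long)
```

Once the measure name lives in a single column, it can be mapped to `colour`, which is what lets ggplot draw one line per measure and build the legend.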

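I also convert the dates into real Date objects and, after pivoting, clean up the avg_* names (which are now the values of the Average column).

You can set the x-axis date breaks to a different interval if you want, for instance 1 month.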

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(ggplot2)
})

so_run_hour_stats %>%
  pivot_longer(cols = starts_with("avg"), names_to = "Average") %>%
  mutate(month_yr = as.Date(paste0(month_yr, "-01")),
         Average = sub("avg_", "", Average),
         Average = gsub("_", " ", Average)) %>%
  ggplot(aes(month_yr, value, colour = Average)) +
  geom_line() +
  geom_point(color = "black") +
  scale_x_date(date_breaks = "3 months", date_labels = "%Y-%m") +
  scale_color_manual(values = c("red", "blue")) +
  facet_wrap(~ plant_name) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))

Created on 2023-02-18 with reprex v2.0.2
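As a rough sketch of the monthly-breaks variant mentioned in the answer, the only change is the `date_breaks` argument of `scale_x_date()`. The data frame `toy` and its values are assumptions made up for this example; the same scale can be dropped into the pipeline above.

```r
library(ggplot2)

# Hypothetical monthly dates and values, only to show the axis setting.
toy <- data.frame(
  month_yr = seq(as.Date("2022-01-01"), by = "month", length.out = 6),
  value = c(15.0, 14.7, 15.0, 17.0, 17.0, 14.8)
)

p <- ggplot(toy, aes(month_yr, value)) +
  geom_line() +
  # One tick per month instead of one every three months.
  scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m") +
  # Rotate the labels so the denser ticks stay readable.
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))

print(p)
```

With 14 months of data, monthly ticks get crowded, so keeping the rotated axis labels from the answer is probably worthwhile.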

huangapple
  • Published on 2023-02-18 13:49:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75491465.html