How do I make my R plot look like the one I have in Python?
Question
I am trying to make my plot in R look like the one I have in Python:
![Python on the left (correct), R on the right (the one I am trying to change)](https://i.stack.imgur.com/HF83P.jpg)
This is the data frame used in both Python and R.
All_Flights_Combined_Month
Year | Month | Delay_count | Total_count |
---|---|---|---|
2003 | Jan | 151238 | 552109 |
2003 | Feb | 158369 | 500206 |
2003 | Mar | 152156 | 559342 |
2003 | Apr | 125699 | 527303 |
2003 | May | 136551 | 533782 |
2003 | Jun | 163497 | 536496 |
2003 | Jul | 183491 | 558568 |
2003 | Aug | 178979 | 556984 |
2003 | Sep | 113916 | 527714 |
2003 | Oct | 131409 | 552370 |
2003 | Nov | 157157 | 528171 |
2003 | Dec | 206743 | 555495 |
2004 | Jan | 198818 | 583987 |
2004 | Feb | 183658 | 553876 |
2004 | Mar | 183273 | 601412 |
2004 | Apr | 170114 | 582970 |
2004 | May | 191604 | 594457 |
2004 | Jun | 238074 | 588792 |
2004 | Jul | 237670 | 614166 |
2004 | Aug | 215667 | 623107 |
2004 | Sep | 147508 | 585125 |
2004 | Oct | 193951 | 610037 |
2004 | Nov | 197560 | 584610 |
2004 | Dec | 254786 | 606731 |
2005 | Jan | 229809 | 594924 |
2005 | Feb | 184920 | 545332 |
2005 | Mar | 226883 | 617540 |
2005 | Apr | 169221 | 594492 |
2005 | May | 178327 | 614802 |
2005 | Jun | 236724 | 609195 |
2005 | Jul | 268988 | 627961 |
2005 | Aug | 240410 | 630904 |
2005 | Sep | 165541 | 574253 |
2005 | Oct | 186778 | 592712 |
2005 | Nov | 193399 | 566138 |
2005 | Dec | 256861 | 572343 |
And this is the code for Python:
# To plot the line graph
# Create separate data frames for each year
years = All_Flights_Combined_Month['Year'].unique()
data_frames_month = [All_Flights_Combined_Month[All_Flights_Combined_Month['Year'] == year] for year in years]
# Create subplots
fig, ax = plt.subplots(figsize=(10, 8))
# Plot Delay_count for each year
for i, year in enumerate(years):
    color = 'red' if str(year) == '2003' else 'green' if str(year) == '2004' else 'blue'
    ax.plot(data_frames_month[i]['Month'], data_frames_month[i]['Delay_count'], label=f"{year} Delay Count", color=color)
# Plot Total_Count for each year
for i, year in enumerate(years):
    color = 'orange' if str(year) == '2003' else 'yellow' if str(year) == '2004' else 'purple'
    ax.plot(data_frames_month[i]['Month'], data_frames_month[i]['Total_Count'], label=f"{year} Total Count", color=color)
# Set title and labels
ax.set_title('Flight Count by Month')
ax.set_xlabel('Month')
ax.set_ylabel('Number of Flights')
# Add legend
ax.legend(title='Year')
# Save the plot as a pdf file
plt.savefig('Monthly Flight Comparison Python.pdf', format='pdf')
# Show the plot
plt.show()
While this is for R:
# To plot the line graph
month_plot <- ggplot() + geom_line(data= All_Flights_Combined_Month, aes(x =Month, y=Delay_count, group=Year, color=Year)) +
geom_line(data=All_Flights_Combined_Month, aes(x =Month, y=Total_count, group=Year, color=Year))+ scale_x_discrete(limits = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"))+
xlab("Months")+
ylab("Number of Flights")+
ggtitle("Flight Count by Month")
# To save the plot as .pdf
ggplot2::ggsave("Monthly Flight Comparison R.pdf", plot = last_plot(), width = 8, height = 6)
I need the legend and the line colors to match the ones in the Python plot. I hope I have provided sufficient information. Any advice would be appreciated, thank you.
I tried adding scale_color_manual to each geom_line, but it produced an error stating that the scale_color_manual values had already been used and that the new ones would overwrite the previous ones.
Answer 1
Score: 2
You could transform your data to a longer format and combine the Year with the reshaped Delay count and Total count names into one string using `paste0` and `gsub`. To get the right colors, you could use `scale_color_manual`, with the right legend order set using `breaks`, like this:
library(ggplot2)
library(dplyr)
library(tidyr)
# df is assumed to hold the question's All_Flights_Combined_Month data
df %>%
  pivot_longer(cols = Delay_count:Total_count) %>%
  mutate(Year2 = paste0(Year, " ", gsub("_", " ", name)),
         Month = factor(Month, levels = month.abb)) %>%
  ggplot(aes(x = Month, y = value, color = Year2, group = Year2)) +
  geom_line() +
  labs(color = "Year", x = "Month", y = "Number of Flights") +
  scale_color_manual(values = c("2003 Delay count" = "red",
                                "2004 Delay count" = "green",
                                "2005 Delay count" = "blue",
                                "2003 Total count" = "orange",
                                "2004 Total count" = "yellow",
                                "2005 Total count" = "purple"),
                     breaks = c("2003 Delay count",
                                "2004 Delay count",
                                "2005 Delay count",
                                "2003 Total count",
                                "2004 Total count",
                                "2005 Total count"))
<sup>Created on 2023-02-19 with reprex v2.0.2</sup>
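For readers coming from the Python side, the reshaping step in this answer can be sketched with the standard library only. This is a minimal illustration, not the answer's own code: the helper name `to_long` and the two-row sample are assumptions made for the example, and the label is built the same way as `paste0(Year, " ", gsub("_", " ", name))`.

```python
# Two wide rows copied from the question's table (the other rows are omitted).
wide_rows = [
    {"Year": 2003, "Month": "Jan", "Delay_count": 151238, "Total_count": 552109},
    {"Year": 2004, "Month": "Jan", "Delay_count": 198818, "Total_count": 583987},
]


def to_long(rows, value_columns=("Delay_count", "Total_count")):
    """Return one record per (row, value column), similar in spirit to pivot_longer.

    Hypothetical helper written for this illustration only.
    """
    long_rows = []
    for row in rows:
        for name in value_columns:
            long_rows.append({
                "Year": row["Year"],
                "Month": row["Month"],
                "name": name,
                "value": row[name],
                # Combined label: year, a space, and the column name with "_" replaced.
                "Year2": f"{row['Year']} {name.replace('_', ' ')}",
            })
    return long_rows


long_rows = to_long(wide_rows)
for record in long_rows:
    print(record["Year2"], record["Month"], record["value"])
```

Each wide row becomes two long rows, one per series, and the combined `Year2` label is what the color scale and the legend are keyed on.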
Answer 2
Score: 2
Problems of this type generally come down to reshaping the data. The data should be in long format, but here it is in wide format. See [this post](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) on how to reshape data from wide to long format.
Then change the variable `Year` or `name` to the interaction between the two. That is the color and grouping variable.
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(ggplot2)
})
clrs <- c("2003 Delay Count" = "#e44b3b", "2003 Total Count" = "#edbe70",
          "2004 Delay Count" = "#0d720d", "2004 Total Count" = "#f8f867",
          "2005 Delay Count" = "#0000cb", "2005 Total Count" = "#6d0469")
All_Flights_Combined_Month %>%
  pivot_longer(ends_with("count")) %>%
  mutate(Month = factor(Month, levels = month.abb),
         Year = interaction(Year, name, sep = " "),
         Year = sub("_c", " C", Year)) %>%
  select(-name) %>%
  ggplot(aes(Month, value, colour = Year, group = Year)) +
  geom_line(linewidth = 1.25) +
  scale_color_manual(values = clrs) +
  theme_minimal()
<sup>Created on 2023-02-19 with reprex v2.0.2</sup>
Data
x <- "Year Month Delay_count Total_count
2003 Jan 151238 552109
2003 Feb 158369 500206
2003 Mar 152156 559342
2003 Apr 125699 527303
2003 May 136551 533782
2003 Jun 163497 536496
2003 Jul 183491 558568
2003 Aug 178979 556984
2003 Sep 113916 527714
2003 Oct 131409 552370
2003 Nov 157157 528171
2003 Dec 206743 555495
2004 Jan 198818 583987
2004 Feb 183658 553876
2004 Mar 183273 601412
2004 Apr 170114 582970
2004 May 191604 594457
2004 Jun 238074 588792
2004 Jul 237670 614166
2004 Aug 215667 623107
2004 Sep 147508 585125
2004 Oct 193951 610037
2004 Nov 197560 584610
2004 Dec 254786 606731
2005 Jan 229809 594924
2005 Feb 184920 545332
2005 Mar 226883 617540
2005 Apr 169221 594492
2005 May 178327 614802
2005 Jun 236724 609195
2005 Jul 268988 627961
2005 Aug 240410 630904
2005 Sep 165541 574253
2005 Oct 186778 592712
2005 Nov 193399 566138
2005 Dec 256861 572343"
All_Flights_Combined_Month <- read.table(text = x, header = TRUE)
Answer 3
Score: 2
Something like this:
library(tidyverse)
# df is assumed to hold the question's All_Flights_Combined_Month data
df %>%
  pivot_longer(-c(Year, Month)) %>%
  mutate(Year = paste(Year, name)) %>%
  ggplot(aes(x = Month, y = value, color = factor(Year))) +
  geom_line(aes(group = Year)) +
  scale_x_discrete(limits = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) +
  scale_color_manual(values = c("purple", "yellow", "gold", "blue", "green", "red")) +
  xlab("Months") +
  ylab("Number of Flights") +
  ggtitle("Flight Count by Month") +
  theme_classic()
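Because the color vector in this answer is unnamed, the colors are matched to the labels by position rather than by name. The sketch below, in plain Python, checks which label would receive which color. It assumes the labels are ordered alphabetically, as the levels of `factor(Year)` are for these strings. The `wanted` mapping is copied from the Python code in the question.

```python
# Labels produced by paste(Year, name) in the answer's code.
labels = [
    f"{year} {name}"
    for year in (2003, 2004, 2005)
    for name in ("Delay_count", "Total_count")
]

# The unnamed color vector from the answer, in the order it was written.
colors = ["purple", "yellow", "gold", "blue", "green", "red"]

# Assumption: unnamed values are assigned to the labels in sorted order.
positional = dict(zip(sorted(labels), colors))

# Colors used by the Python plot in the question.
wanted = {
    "2003 Delay_count": "red",
    "2004 Delay_count": "green",
    "2005 Delay_count": "blue",
    "2003 Total_count": "orange",
    "2004 Total_count": "yellow",
    "2005 Total_count": "purple",
}

# Collect every label whose positional color differs from the Python plot.
mismatches = {
    label: (positional[label], wanted[label])
    for label in labels
    if positional[label] != wanted[label]
}

for label, (got, expected) in mismatches.items():
    print(f"{label}: gets {got}, Python plot uses {expected}")
```

Under that assumption, the positional assignment differs from the Python plot, so a named vector (as in the other answers) may be the safer way to pin each color to its series.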
Answer 4
Score: 1
Using just base R. First, `reshape` into wide format, then use `matplot` and customize `axis` and `mtext` a little.
dat_w <- reshape(dat, idvar='Month', timevar='Year', direction='w')
par(mar=c(5, 6, 4, 2))
matplot(dat_w[, -1], type='l', lty=1, col=2:8, axes=FALSE, ylab='', main='Flight Count By Month')
axis(side=1, at=1:12, labels=dat_w$Month, cex.axis=.8)
axis(2, axTicks(2), formatC(axTicks(2), format='f', digits=0), las=2, cex.axis=.8)
mtext('Month', side=1, line=2.5, cex=.8); mtext('Number of Flights', 2, 4, cex=.8)
legend('right', c(paste(unique(dat$Year), rep(gsub('_', ' ', names(dat)[3:4]), each=3))),
col=2:8, lty=1, title='Year', cex=.7)
box()