Plot year by year in the same plot (plotly)
Question
I am trying to plot a time series that spans several years so that every year covers the same day-month range.
For example, I need the data starting from 01/01 of 2018, 2019, etc. in the same plot, so that I can compare the data from different years.
The code I wrote plots the data year by year in subplots, but I would like a single Plotly plot.
Link to the data (download it to run the script)
from datetime import date, datetime
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
df_decremento_municipio = pd.read_csv('app/data/decremento_municipio_202305291512.csv', index_col="view_date")
template_graph = {
    "layout": {
        "modebar": {
            "remove": [
                "zoom",
                "pan",
                "select",
                "zoomIn",
                "zoomOut",
                "lasso2d",
                "autoscale",
            ]
        },
        "separators": ".",
        "showlegend": True,
    }
}
df_decremento_municipio.index = pd.to_datetime(df_decremento_municipio.index)
df_decremento_municipio["year"] = df_decremento_municipio.index.year
min_date = df_decremento_municipio.index.date.min()
max_date = df_decremento_municipio.index.date.max()
df = pd.Series(name="area_ha", dtype="float64")
for ano in df_decremento_municipio.index.year.unique()[
    df_decremento_municipio.index.year.unique() > 2016
]:
    # Drop 2015 and 2016 (very poor data)
    df_por_ano = df_decremento_municipio[df_decremento_municipio["year"] == ano]
    dff_acumulacao = df_por_ano["area_ha"].groupby([df_por_ano.index]).sum().cumsum()
    # Series.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, dff_acumulacao])
fig = go.Figure()
# First date with data in each year
start = ["2020-01-01", "2018-01-04", "2021-01-05", "2022-01-05", "2019-01-01", "2023-01-10", "2017-12-27"]
# Last date with data in each year
end = ["2020-12-31", "2018-12-27", "2021-12-16", "2022-12-31", "2019-12-22", "2023-05-03", "2017-01-26"]
years = df.index.year.unique()[df.index.year.unique() > 2016].sort_values()
for idx, (s, e) in enumerate(zip(start, end)):
    tmp = df[(df.index >= start[idx]) & (df.index <= end[idx])]
    fig.add_trace(go.Scatter(x=tmp.index,
                             y=tmp,
                             name=str(years[idx]),
                             mode='lines',
                             ))
fig.update_layout(height=600, xaxis_tickformat='%d-%m')
fig.update_xaxes(type='date')
fig.show()
Answer 1
Score: 0
Since your x-axis values include the year, the lines in your plot will not overlap: each line starts where the previous one left off. Hiding the year with xaxis_tickformat='%d-%m' does not help in this case, because it only changes how the ticks are displayed, not where the points are placed.
To compare the time series across years, remove the year from the x values. In your add_trace call, change the x values to contain only the month and day, using strftime('%m-%d'):
fig.add_trace(go.Scatter(x=tmp.index.strftime('%m-%d'),
                         y=tmp,  # tmp is already a Series in the question's code
                         name=str(years[idx]),
                         mode='lines',
                         ))
Note that you also need to remove the fig.update_xaxes(type='date') line, because the x values are now strings.
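A variation on the same idea, in case a real date axis is preferred over string categories: move every date onto one shared reference year before plotting. The sketch below uses only the standard library and is not part of the original answer. The names align_by_year and REFERENCE_YEAR are made up for illustration, and the choice of 2000 (a leap year, so 29 February has a valid slot) is an assumption.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate

# Assumption: any leap year works as the shared reference year.
REFERENCE_YEAR = 2000


def align_by_year(records):
    """Group (date, value) pairs by year and move each date to REFERENCE_YEAR.

    Returns {year: [(aligned_date, cumulative_value), ...]}, ordered by date.
    """
    per_year = defaultdict(list)
    for day, value in sorted(records):
        per_year[day.year].append((day.replace(year=REFERENCE_YEAR), value))

    aligned = {}
    for year, rows in per_year.items():
        # Running total within the year, mirroring the cumsum() in the question.
        totals = accumulate(value for _, value in rows)
        aligned[year] = [(day, total) for (day, _), total in zip(rows, totals)]
    return aligned
```

Each year's list could then be added as its own go.Scatter trace, with the aligned dates as x and the running totals as y. In that setup xaxis_tickformat='%d-%m' and type='date' can stay, since the tick format would only be hiding the artificial reference year.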
Answer 2
Score: 0
This is what I ended up doing, and I believe it is more sensible than the suggestion in the answer at https://stackoverflow.com/questions/69013744/a-simple-way-to-plot-day-and-month-only-on-the-x-axis-to-compare-years
I created a column holding the time elapsed since the start of each year (a timedelta) and put it on the x-axis.
Plotly does not work well with timedelta columns, but I think this is the best I can do.
import pandas as pd
import plotly.express as px
df_decremento_municipio = pd.read_csv('app/graficos_dev/data/decremento_municipio_202305291512.csv', index_col="view_date")
template_graph = {
    "layout": {
        "modebar": {
            "remove": [
                "zoom",
                "pan",
                "select",
                "zoomIn",
                "zoomOut",
                "lasso2d",
                "autoscale",
            ]
        },
        "separators": ".",
        "showlegend": True,
    }
}
df_decremento_municipio.index = pd.to_datetime(df_decremento_municipio.index)
df_decremento_municipio = df_decremento_municipio[["area_ha"]].groupby(df_decremento_municipio.index).sum()
t1 = pd.DataFrame()
for ano in df_decremento_municipio.index.year.unique()[
    df_decremento_municipio.index.year.unique() > 2017
]:
    # .copy() avoids pandas' SettingWithCopyWarning when columns are added below
    df = df_decremento_municipio[df_decremento_municipio.index.year == ano].copy()
    df["year"] = df.index.year
    # Time elapsed since 1 January of the same year, divided by 1,000,000
    df["timedelta"] = (df.index - (df.index.year.astype("str") + "-01-01").astype("datetime64[ns]"))/1000000
    df["cumsum"] = df["area_ha"].cumsum()
    t1 = pd.concat([t1, df[["year", "timedelta", "cumsum"]]])
fig = px.line(t1, x="timedelta", y="cumsum", color='year')
fig.update_layout(title="Desflorestamento por Tempo",
                  xaxis={"title": "Data"},
                  yaxis={"title": "Área (ha)"},
                  xaxis_tickformat='%d-%m'
                  )
fig.update_xaxes(type='date')
fig.show()
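If the date-typed axis is not essential, the elapsed time can also be expressed as a plain number of days since 1 January, which Plotly treats as an ordinary numeric axis. This is only a sketch of that alternative, not the answer's method, and days_since_year_start is a hypothetical helper name.

```python
from datetime import date


def days_since_year_start(day):
    """Return the whole days elapsed between 1 January of day's year and day."""
    return (day - date(day.year, 1, 1)).days
```

With this offset on the x-axis, 1 March falls on 59 in a common year and on 60 in a leap year, so the curves shift by one day after February in leap years. For a year-over-year comparison that is usually acceptable, but it is a trade-off to be aware of.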