Meaningful plotting for the graph with many countries
Question
My CSV file is as below. The complete file is here.
Sample rows:
Country,create_date,SalesIndex
Austrailia,2023-07-16,0.66
Macedonia,2023-07-17,0.48
UK,2023-07-18,0.2
Newzealand,2023-07-19,0.50000000000000011
India,2023-07-15,7.89
Macedonia,2023-07-19,1.5800000000000003
Indonesia,2023-07-19,45.709999999999987
India,2023-07-19,7.91
Portugal,2023-07-22,226.17999999999986
My code to generate the graph is as follows:
import pandas as pd
import matplotlib.pyplot as plt
# Load the CSV file into a pandas DataFrame
df = pd.read_csv('sales.csv')
# Convert the 'create_date' column to datetime type
df['create_date'] = pd.to_datetime(df['create_date'])
# Plot one line per country on a shared Axes
fig, ax = plt.subplots()
for Country, data in df.groupby('Country'):
    data.plot(x='create_date', y='SalesIndex', ax=ax, label=Country)
# Set the title and labels for the plot
plt.title('Total Data Billed per Country')
plt.xlabel('Date')
plt.ylabel('Total Data Billed (GB)')
# Display the legend
plt.legend()
# Show the plot
plt.show()
The graph is generated as below. However, is there a better way to generate the graph? In the plot below I cannot tell which line belongs to which country. Also, for countries with almost identical data, the lines almost overlap.
Answer 1
Score: 1
The simplest way to make the plot more readable is to give each country a distinct color and add markers.
I made a quick attempt that adds markers and a grid, and moves the legend outside the plot area. It may help you. A screenshot of my output is also attached.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('sales.csv')
df['create_date'] = pd.to_datetime(df['create_date'])
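fig, ax = plt.subplots(figsize=(14, 7))
# One color per country, sampled from the 'tab20' colormap (20 distinct colors).
# plt.get_cmap is used because plt.cm.get_cmap was removed in Matplotlib 3.9.
color_palette = plt.get_cmap('tab20', df['Country'].nunique())
for i, (country, data) in enumerate(df.groupby('Country')):
    # Draw the line, then overlay markers in the same color
    ax.plot(data['create_date'], data['SalesIndex'], label=country, color=color_palette(i))
    ax.scatter(data['create_date'], data['SalesIndex'], color=color_palette(i), s=40)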
plt.title('Total Billed per Country')
plt.xlabel('Date')
plt.ylabel('Total Data Billed (GB)')
plt.legend(loc='upper left', bbox_to_anchor=(1, 1))
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
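As a possible complement to the colors and markers above, here is a minimal sketch of two other ideas that may help with identification and overlap: writing each country's name next to the last point of its line, and using a logarithmic y-axis so that small values (around 0.2) are not flattened by large ones (above 200). The helper name `plot_labeled_lines` is hypothetical, and the inline data is just the sample rows from the question. Whether a log scale suits the full file is an assumption, since it only works when every SalesIndex value is positive.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question; for the real file use
# pd.read_csv('sales.csv', parse_dates=['create_date']) instead.
SAMPLE_CSV = """Country,create_date,SalesIndex
Austrailia,2023-07-16,0.66
Macedonia,2023-07-17,0.48
UK,2023-07-18,0.2
Newzealand,2023-07-19,0.50000000000000011
India,2023-07-15,7.89
Macedonia,2023-07-19,1.5800000000000003
Indonesia,2023-07-19,45.709999999999987
India,2023-07-19,7.91
Portugal,2023-07-22,226.17999999999986
"""


def plot_labeled_lines(df):
    """Hypothetical helper: one line per country, labeled at its last point."""
    fig, ax = plt.subplots(figsize=(14, 7))
    for country, data in df.groupby('Country'):
        data = data.sort_values('create_date')
        (line,) = ax.plot(data['create_date'], data['SalesIndex'], marker='o')
        # Write the country name beside its last point, in the line's color.
        last = data.iloc[-1]
        ax.annotate(country, xy=(last['create_date'], last['SalesIndex']),
                    xytext=(5, 0), textcoords='offset points',
                    color=line.get_color(), va='center')
    # Assumption: all values are positive, so a log scale is valid here.
    ax.set_yscale('log')
    ax.set_xlabel('Date')
    ax.set_ylabel('SalesIndex (log scale)')
    ax.grid(True, which='both', alpha=0.3)
    fig.autofmt_xdate()  # rotate the date tick labels
    return ax


df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=['create_date'])
ax = plot_labeled_lines(df)
# plt.show()  # uncomment when running interactively
```

With many countries the end-of-line labels can still collide, so this is probably most useful together with filtering to a subset of countries or splitting the plot into several panels.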