Python: add data labels to a line chart from Matplotlib and Pandas GroupBy
Question
I want to add data labels to the line charts that Matplotlib draws from a Pandas GroupBy.
import matplotlib.pyplot as plt
import pandas as pd
from io import StringIO
csvfile = StringIO(
"""
Name Year - Month Score
Mike 2022-09 192
Mike 2022-08 708
Mike 2022-07 140
Mike 2022-05 144
Mike 2022-04 60
Mike 2022-03 108
Kate 2022-07 19850
Kate 2022-06 19105
Kate 2022-05 23740
Kate 2022-04 19780
Kate 2022-03 15495
Peter 2022-08 51
Peter 2022-07 39
Peter 2022-06 49
Peter 2022-05 49
Peter 2022-04 79
Peter 2022-03 13
Lily 2022-11 2
David 2022-11 3
David 2022-10 6
David 2022-08 2""")
df = pd.read_csv(csvfile, sep='\t', engine='python')
for group_name, sub_frame in df.groupby("Name"):
    if sub_frame.shape[0] >= 2:
        sub_frame_sorted = sub_frame.sort_values('Year - Month')  # sort the data-frame by a column
        line_chart = sub_frame_sorted.plot("Year - Month", "Score")
        label = sub_frame_sorted['Score']
        line_chart.annotate(label, (sub_frame_sorted['Year - Month'], sub_frame_sorted['Score']), ha='center')
        plt.show()
The two lines that add the data labels raise an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I correct them?
Answer 1
Score: 1
The problem is inside your for loop: annotate() labels one point per call, but you are passing it whole Series.
You can replace your loop with this one:
for group_name, sub_frame in df.groupby("Name"):
    if sub_frame.shape[0] >= 2:
        sub_frame_sorted = sub_frame.sort_values('Year - Month')
        line_chart = sub_frame_sorted.plot("Year - Month", "Score")
        for x, y in zip(sub_frame_sorted["Year - Month"], sub_frame_sorted["Score"]):
            label = "{:.0f}".format(y)  # format the label as a string
            line_chart.annotate(label, (x, y), textcoords="offset points", xytext=(0, 10), ha='center')
If you also get an error related to the 'Year - Month' column, convert it with the pd.to_datetime() method.
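The conversion mentioned above could look roughly like the sketch below. It is not part of the original answer: the three-row sample frame is hypothetical, the format string "%Y-%m" assumes the strings always look like "2022-03", and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in the same shape as the question's data
df = pd.DataFrame({
    "Name": ["Peter", "Peter", "Peter"],
    "Year - Month": ["2022-05", "2022-03", "2022-04"],
    "Score": [49, 13, 79],
})

# Parse the 'YYYY-MM' strings into timestamps (first day of each month)
df["Year - Month"] = pd.to_datetime(df["Year - Month"], format="%Y-%m")

sub_frame_sorted = df.sort_values("Year - Month")
ax = sub_frame_sorted.plot("Year - Month", "Score", marker="o", legend=False)

# One annotate() call per point: x is a Timestamp, y is the score
for x, y in zip(sub_frame_sorted["Year - Month"], sub_frame_sorted["Score"]):
    ax.annotate("{:.0f}".format(y), (x, y),
                textcoords="offset points", xytext=(0, 10), ha="center")

# Collect the label texts in the order they were added
labels = [t.get_text() for t in ax.texts]
```

Sorting after the conversion orders the rows chronologically rather than alphabetically, which gives the same result for zero-padded "YYYY-MM" strings but is safer if the format ever varies.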
Please let me know if this helps. Thanks.
Answer 2
Score: 1
As the error says, the problem is in the annotate() call. sub_frame_sorted is a DataFrame, so you need a for loop to go through its rows and call annotate() once for each of them. Also, the x-axis is year-month, which is treated as a string, and you will run into issues if you use those values as coordinates. So you need to just use the positional index instead: I have used 0, 1, 2, ... through the counter i. This should work. You can add a small offset if you think the text overlaps the line.
Hope this is what you are looking for.
Updated code
for group_name, sub_frame in df.groupby("Name"):
    if sub_frame.shape[0] >= 2:
        sub_frame_sorted = sub_frame.sort_values('Year - Month')  # sort the data-frame by a column
        line_chart = sub_frame_sorted.plot("Year - Month", "Score", legend=False)
        i = 0
        for ix, vl in sub_frame_sorted.iterrows():
            line_chart.annotate(vl['Score'], (i, vl['Score']), ha='center')
            i = i + 1
        plt.show()
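The "small offset" mentioned in this answer is not shown in its code. One possible way to add it, sketched here under assumptions, is to pass textcoords="offset points" and xytext to annotate(). The sample frame and the 8-point offset are hypothetical, enumerate() stands in for the manual counter i, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in the same shape as the question's data
df = pd.DataFrame({
    "Name": ["David", "David", "David"],
    "Year - Month": ["2022-11", "2022-10", "2022-08"],
    "Score": [3, 6, 2],
})

sub_frame_sorted = df.sort_values("Year - Month")
ax = sub_frame_sorted.plot("Year - Month", "Score", legend=False)

# Label each point by its position (0, 1, 2, ...) and shift the text
# 8 points upwards so that it does not sit on the line
for i, score in enumerate(sub_frame_sorted["Score"]):
    ax.annotate(str(score), (i, score),
                textcoords="offset points", xytext=(0, 8), ha="center")

# Collect the label texts and their x positions for inspection
texts = [t.get_text() for t in ax.texts]
x_positions = [t.xy[0] for t in ax.texts]
```

Because the offset is given in points rather than data units, the gap between label and line stays the same regardless of the score range, which matters here since Kate's scores are in the tens of thousands while David's are single digits.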
Output plots