Frequency plot using dots instead of bars?
Question
I'm trying to create the chart in this question, using this answer. I'm open to any solution that works.
Visual borrowed from the original question:
The difference from that question is that I've already calculated my bins and frequency values, so I don't use numpy or matplotlib to do so.
Here's my sample data; I refer to it as df_fd in my sample code below:
     low_bin   high_bin  frequency
0  13.142857  18.857143          3
1  18.857143  24.571429          5
2  24.571429  30.285714          8
3  30.285714  36.000000          8
4  36.000000  41.714286          7
5  41.714286  47.428571          7
6  47.428571  53.142857          1
7  53.142857  58.857143          1
Based on the cited question, here's my code (df_fd is the DataFrame above):
fig, ax = plt.subplots()
ax.bar(df_fd.low_bin, df_fd.frequency, width= df_fd.high_bin-df_fd.low_bin)
X,Y = np.meshgrid(bins, df_fd['frequency'])
Y = Y.astype(np.float)
Y[Y>df_fd['frequency']] = np.nan
plt.scatter(X,Y)
The statement Y[Y>df_fd['frequency']] = np.nan is what fails, and I don't know how to get around it. I understand what it's trying to do, and my best guess is that somehow mapping the matrix index to the DataFrame index would help, but I'm not sure how to do that.
Thank you for helping me!
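For reference, here is a minimal sketch of one way the masking step could be made to work. It assumes the intent is one grid row per stack level (1 up to the largest frequency) and one column per bin, with dots placed at the bin midpoints, since `bins` is not defined in the code above. The helper name `masked_grid` and the variables `centers` and `levels` are made up for this illustration. It compares against a plain NumPy array rather than the pandas Series, and uses the builtin `float` because `np.float` was removed in NumPy 1.24.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df_fd = pd.DataFrame({
    "low_bin": [13.142857, 18.857143, 24.571429, 30.285714,
                36.000000, 41.714286, 47.428571, 53.142857],
    "high_bin": [18.857143, 24.571429, 30.285714, 36.000000,
                 41.714286, 47.428571, 53.142857, 58.857143],
    "frequency": [3, 5, 8, 8, 7, 7, 1, 1],
})


def masked_grid(frame):
    """Hypothetical helper: X/Y dot coordinates, NaN where no dot belongs."""
    freq = frame["frequency"].to_numpy()
    # Assumption: dots sit at the bin midpoints (`bins` is not shown in the question).
    centers = ((frame["low_bin"] + frame["high_bin"]) / 2).to_numpy()
    # Stack levels 1..max(frequency): one grid row per possible dot height.
    levels = np.arange(1, freq.max() + 1)
    X, Y = np.meshgrid(centers, levels)  # both have shape (levels, bins)
    Y = Y.astype(float)                  # builtin float instead of np.float
    # A plain NumPy row broadcasts across the bin axis of the grid.
    Y[Y > freq[np.newaxis, :]] = np.nan
    return X, Y


X, Y = masked_grid(df_fd)
```

Passing the result to `plt.scatter(X, Y)` should then draw only the non-NaN positions, one column of dots per bin.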
Answer 1
Score: 2
One hacky solution using a scatter plot:
# df is the DataFrame from the question (df_fd there); needs numpy imported as np
(df.assign(bin=np.mean([df['low_bin'], df['high_bin']], axis=0))
   .loc[lambda d: d.index.repeat(d['frequency'])]
   .assign(Y=lambda d: d.groupby(level=0).cumcount())
   .plot.scatter(x='bin', y='Y', s=600)
)
It works by taking the average of low_bin and high_bin as the X value, then repeating each row as many times as its "frequency" value, and numbering the repeated rows with groupby.cumcount.
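To make the repeat-and-count step easier to follow, here is a self-contained sketch of the same idea using the sample data from the question. The function names `dot_positions` and `plot_dots` are made up for this illustration. One deliberate difference from the snippet above: the count is shifted by 1 so the lowest dot sits at Y=1 instead of Y=0, which is an assumption about how the chart should look.

```python
import pandas as pd

# Sample data copied from the question (the asker calls this df_fd).
df = pd.DataFrame({
    "low_bin": [13.142857, 18.857143, 24.571429, 30.285714,
                36.000000, 41.714286, 47.428571, 53.142857],
    "high_bin": [18.857143, 24.571429, 30.285714, 36.000000,
                 41.714286, 47.428571, 53.142857, 58.857143],
    "frequency": [3, 5, 8, 8, 7, 7, 1, 1],
})


def dot_positions(frame):
    """Hypothetical helper: one row per dot, x at the bin midpoint, Y from 1."""
    return (
        frame.assign(bin=(frame["low_bin"] + frame["high_bin"]) / 2)
        # Repeat each row `frequency` times; its index label repeats with it.
        .loc[lambda d: d.index.repeat(d["frequency"])]
        # Number the copies of each original row 1, 2, ..., frequency.
        .assign(Y=lambda d: d.groupby(level=0).cumcount() + 1)
    )


def plot_dots(points, size=600):
    """Hypothetical helper: scatter the dots (pandas plotting needs matplotlib)."""
    ax = points.plot.scatter(x="bin", y="Y", s=size)
    ax.set_ylabel("frequency")
    return ax


points = dot_positions(df)
```

Calling `plot_dots(points)` should draw one column of dots per bin. The marker size `s` probably needs tuning to the figure size so that neighbouring dots touch without overlapping.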