Why do these plots with the same parameters look so different? Log scaling in matplotlib vs seaborn
Question
With seaborn.histplot:
import seaborn as sns
plot = sns.histplot(data=adata.obs, x='n_counts', bins=50, log_scale=True)
plot.set_xlim(1, 100000)
With plt.hist:
import matplotlib.pyplot as plt
adata = org_1
data = adata.obs['n_counts']
plt.hist(data, bins=50, range=(1, 100000))
plt.xscale("log")
With plt.hist, but log-transforming the data before passing it to the plotting function:
Tangent: how can I get the x axis in 10^n notation (as in the first plot)?
import numpy as np
data = np.log10(adata.obs['n_counts'])
plt.hist(data, bins=50)
plt.xlabel('log nUMI')
With plt.hist, log-transforming the data before passing it to the plotting function, and specifying the range as in plots 1 and 2:
data = np.log10(adata.obs['n_counts'])
plt.hist(data, bins=50, range = (1, 10000))
plt.xlabel('log nUMI')
Answer 1
Score: 1
(I don't have enough reputation to comment, but I would like to assist)
> With plt.hist
Here you have exactly 50 buckets, each about 2,000 units wide (see how the first bucket ends at 2*10^3). Because you have plotted them on a log scale, the buckets on the left appear unnaturally wide, which is an artifact of the log axis.
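To check the bucket arithmetic, here is a minimal sketch that reproduces the edges requested by bins=50 and range=(1, 100000). It uses np.histogram_bin_edges on a small made-up array; the values are placeholders, not the asker's data.

```python
import numpy as np

# Placeholder values standing in for adata.obs['n_counts'] (assumption).
counts = np.array([3, 40, 500, 6000, 70000])

# Same binning request as plt.hist(data, bins=50, range=(1, 100000)).
edges = np.histogram_bin_edges(counts, bins=50, range=(1, 100000))
widths = np.diff(edges)

print(len(edges) - 1)              # number of bins
print(edges[:3])                   # first few edges, in data units
print(widths.min(), widths.max())  # every bin has the same linear width
```

On a log axis the first bin runs from 1 to roughly 2,000, which is more than three decades, so it takes up most of the visible width.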
> data = np.log10(adata.obs['n_counts'])
As a general rule in most (if not all) plotting utilities that I have used, taking the log of the data yourself does not lend itself well to getting the variable tick spacing you want. If you want the variable tick spacing, don't take the log before plotting; let the plot do it.
> plt.hist(data, bins=50, range = (1, 10000))
Once you take the log, the range must be expressed in log units as well: you need to go from log10(1) = 0 to log10(10000) = 4. There are no data points whose value after the log is 10000 (that would imply you had numbers with about 10000 digits to begin with).
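The mismatch between data units and log units can be seen in a small sketch. The sample is synthetic (a lognormal draw used as a stand-in for n_counts, which is an assumption), and np.histogram replaces plt.hist so the counts can be inspected directly.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for adata.obs['n_counts'] (assumption), clipped to [1, 100000].
counts = np.clip(rng.lognormal(mean=7.0, sigma=1.0, size=5000), 1, 100000)
logged = np.log10(counts)  # values now lie between 0 and 5

# Range given in data units: each bin is about 200 wide in log10 units,
# so every logged value >= 1 lands in the first bin and values < 1 fall outside.
wrong, _ = np.histogram(logged, bins=50, range=(1, 10000))

# Range given in log10 units: log10(1) = 0 and log10(100000) = 5.
right, right_edges = np.histogram(logged, bins=50, range=(0, 5))

print(np.count_nonzero(wrong))  # number of occupied bins with the data-unit range
print(np.count_nonzero(right))  # number of occupied bins with the log-unit range
```

With the range in log units the 50 bins are each 0.1 wide, so the distribution is spread over many bins instead of collapsing into one bar.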
Answer 2
Score: 1
Your plots do not have the same parameters.
When you pass plt.hist some data and ask for 50 bins, it has no way of knowing that you are later going to change the axis scale, so it computes 50 linearly-spaced breaks.
Because you passed log_scale=True to sns.histplot, it knows the scale at the time that it computes the bin breaks, and it can make them evenly spaced in log intervals.
(You could also set the axis scale to log before calling sns.histplot and without passing log_scale=True, but plt.hist does not work this way).
When you log the data first, there is no way for the function to know that the numbers represent log values. So you do get bins that appear evenly spaced (because everything matplotlib does happens on a linear scale now) and represent a lognormal distribution well, but those bins no longer correspond to the range covered by the original data, and you would need to manually change any tick labels to represent the original magnitudes.
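One way to get a seaborn-like picture from plt.hist is to pass it bin edges that are already evenly spaced in log units and then switch the axis to a log scale. The sketch below does that with np.logspace. The data are synthetic and the helper name log_binned_hist is hypothetical; neither comes from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np


def log_binned_hist(values, n_bins=50, lo=1, hi=100000):
    """Hypothetical helper: histogram with bins evenly spaced in log10 units."""
    # n_bins bins need n_bins + 1 edges, running from lo to hi.
    edges = np.logspace(np.log10(lo), np.log10(hi), n_bins + 1)
    fig, ax = plt.subplots()
    heights, _, _ = ax.hist(values, bins=edges)
    ax.set_xscale("log")  # data stay in original units, so ticks are powers of ten
    ax.set_xlabel("nUMI")
    return fig, ax, edges, heights


rng = np.random.default_rng(0)
# Synthetic stand-in for adata.obs['n_counts'] (assumption).
counts = rng.lognormal(mean=7.0, sigma=1.0, size=5000)
fig, ax, edges, heights = log_binned_hist(counts)
```

Because the data are never transformed, the log axis labels its ticks as powers of ten on its own, which also addresses the tangent in the question. If the data have already been log-transformed, a matplotlib.ticker.FuncFormatter can relabel the ticks instead, but that has to be set up by hand.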