Why do these plots with the same parameters look so different? Logging in matplotlib vs seaborn

Question

With seaborn.histplot:

import seaborn as sns

plot = sns.histplot(data=adata.obs, x='n_counts', bins=50, log_scale=True)
plot.set_xlim(1, 100000)

[Image: plot 1]

With plt.hist:

import matplotlib.pyplot as plt

adata = org_1
data = adata.obs['n_counts']
plt.hist(data, bins=50, range=(1, 100000))
plt.xscale("log")

[Image: plot 2]

With plt.hist, but taking the log of the data before passing it to the plotting function:
Tangent: how can I get the x axis in 10^n notation (as in the first plot)?

import numpy as np

data = np.log10(adata.obs['n_counts'])
plt.hist(data, bins=50)
plt.xlabel('log nUMI')

[Image: plot 3]

With plt.hist, again taking the log of the data before passing it to the plotting function, but specifying the range to be as in plots 1 and 2:

data = np.log10(adata.obs['n_counts'])
plt.hist(data, bins=50, range=(1, 10000))
plt.xlabel('log nUMI')

[Image: plot 4]

Answer 1

Score: 1

(I don't have enough reputation to comment, but I would like to assist)

> With plt.hist

Here you have exactly 50 buckets, each about 2,000 units wide (note that the first bucket ends at about 2*10^3). Because you plotted them on a log scale, the buckets on the left appear unnaturally wide, which is an artifact of the log axis.
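To make the bucket arithmetic concrete, here is a minimal sketch that rebuilds the edges for bins=50 and range=(1, 100000) with np.linspace and measures how wide each bucket is once drawn on a log axis. The variable names are illustrative only and no real data is involved.

```python
import numpy as np

# 50 equal-width bins over range=(1, 100000) means 51 equally spaced edges.
edges = np.linspace(1, 100000, 51)

linear_width = edges[1] - edges[0]      # width of every bin in data units
decades = np.diff(np.log10(edges))      # width of each bin as drawn on a log axis

print(f"each bin spans about {linear_width:.0f} units")
print(f"first bin covers {decades[0]:.2f} decades on the log axis")
print(f"last bin covers {decades[-1]:.4f} decades on the log axis")
```

The first bucket alone takes up more than three of the five decades between 1 and 100000, while the last bucket is a sliver, which is why the left side of the plot looks so coarse.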

> data = np.log10(adata.obs['n_counts'])

As a general rule in most (if not all) plotting utilities I have used, taking the log of the data yourself does not lend itself well to getting the variable tick spacing you want. If you want the variable tick spacing, don't take the log before plotting; let the plot do it.
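One way to "let the plot do it" with plt.hist is to pass log-spaced bin edges and then switch the axis to a log scale. The sketch below assumes synthetic lognormal data as a stand-in for adata.obs['n_counts'] (the asker's data is not available), and the name log_bins is just for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for adata.obs['n_counts'] (an assumption, not the asker's data).
rng = np.random.default_rng(0)
counts = rng.lognormal(mean=7.0, sigma=1.0, size=5000)

# 50 bins whose edges are evenly spaced in log10 between 1 and 100000.
log_bins = np.logspace(0, 5, 51)

fig, ax = plt.subplots()
n, edges, _ = ax.hist(counts, bins=log_bins)
ax.set_xscale("log")    # the axis takes the log; major ticks are labelled 10^n by default
ax.set_xlim(1, 100000)
ax.set_xlabel("nUMI")
```

Because the data stay in their original units, the default log-axis tick labels already use 10^n notation, which also covers the tangent in the question.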

> plt.hist(data, bins=50, range = (1, 10000))

Once you take the log, the range must be rescaled to match. You need to go from log10(1) = 0 to log10(10000) = 4. No data point has a value of 10000 after taking the log (that would imply you had numbers with about 10000 digits to begin with).
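The effect of the mismatched range can be checked without plotting. This is a rough sketch using np.histogram on synthetic lognormal data (an assumption standing in for adata.obs['n_counts']); it compares the raw-unit range with the same range expressed in log10 units.

```python
import numpy as np

# Synthetic stand-in for adata.obs['n_counts'] (an assumption, not the asker's data).
rng = np.random.default_rng(0)
counts = rng.lognormal(mean=7.0, sigma=1.0, size=5000)
logged = np.log10(counts)

# Range given in raw units: each bin is about 200 wide, far wider than the logged data.
bad_counts, bad_edges = np.histogram(logged, bins=50, range=(1, 10000))

# Range given in log10 units: log10(1) = 0 and log10(10000) = 4.
good_range = (np.log10(1), np.log10(10000))
good_counts, good_edges = np.histogram(logged, bins=50, range=good_range)

print("non-empty bins with raw-unit range:", np.count_nonzero(bad_counts))
print("non-empty bins with log10 range:  ", np.count_nonzero(good_counts))
```

With the raw-unit range, essentially all of the logged values land in the first bin, which matches the single tall bar seen in the last plot of the question.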

Answer 2

Score: 1

Your plots do not have the same parameters.

When you pass plt.hist some data and ask for 50 bins, it has no way of knowing that you are later going to change the axis scale, so it computes 50 linearly spaced bins.

Because you passed log_scale=True to sns.histplot, it knows the scale at the time that it computes the bin breaks, and it can make them evenly spaced in log intervals.

(You could also set the axis scale to log before calling sns.histplot and without passing log_scale=True, but plt.hist does not work this way).

When you take the log of the data first, there is no way for the function to know that the numbers represent log values. So you do get bins that appear evenly spaced (because everything matplotlib does now happens on a linear scale) and that represent a lognormal distribution well, but those bins no longer correspond to the range covered by the original data, and you would need to manually change any tick labels to represent the original magnitudes.
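If you do keep the pre-logged data, the tick labels can be relabelled by hand. The following is a sketch of one possible approach using matplotlib's FuncFormatter and MultipleLocator; the helper as_power_of_ten and the synthetic data are assumptions for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter, MultipleLocator

# Synthetic stand-in for adata.obs['n_counts'] (an assumption, not the asker's data).
rng = np.random.default_rng(0)
logged = np.log10(rng.lognormal(mean=7.0, sigma=1.0, size=5000))

def as_power_of_ten(value, pos):
    """Hypothetical helper: label a log10 tick value v as 10^v using mathtext."""
    return rf"$10^{{{value:g}}}$"

fig, ax = plt.subplots()
ax.hist(logged, bins=50, range=(0, 5))               # range is in log10 units
ax.xaxis.set_major_locator(MultipleLocator(1))       # one major tick per decade
ax.xaxis.set_major_formatter(FuncFormatter(as_power_of_ten))
ax.set_xlabel("nUMI")
```

This only changes the labels; the axis itself is still linear in log10 units, so minor ticks will not show the uneven spacing that a true log axis has.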

Posted by huangapple on February 8, 2023, 11:23:48
Please keep this link when reposting: https://go.coder-hub.com/75381105.html