2023-05-22 20:53:32

How to create a 2D density plot color coded with the fraction of points that are in two dataframes

Question

I have two 2D dataframes, df1 and df2, defined in the (x, y) plane. They cover a similar region of that plane.

I would like to create a plot color-coded by the number of points in df2 divided by the number of points in df1.

I tried the following, but the result looks pixelated. Do you have any recommendations?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
# In a Jupyter notebook, also run: %matplotlib inline
df1 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
df2 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
upper_lim_A = 0.8
lower_lim_A = 0.3
upper_lim_B = 0.7
lower_lim_B = 0.2
fontsize = 8
N_bins = 10
hist_threshold = 1  # Number of points in a bin to be considered populated
# Creating the histograms (rows index B, columns index A)
hist_all, *edges_all = np.histogram2d(df1['B'], df1['A'], bins=N_bins, range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A)))
hist_vars, *edges_vars = np.histogram2d(df2['B'], df2['A'], bins=N_bins, range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A)))
fig, ax1 = plt.subplots(figsize=(10.5, 2.5*3))
ax1.tick_params(direction='out', length=6, width=2, colors='k', grid_alpha=1, labelsize=fontsize)
# Finding the ratio df2/df1; empty df1 bins give NaN or inf and are handled below
with np.errstate(divide='ignore', invalid='ignore'):
    ratio = hist_vars / hist_all
# Replace NaN with 0
ratio[np.isnan(ratio)] = 0
# Copy the ratio array
ratio_copy = np.copy(ratio)
# Set the ratio to NaN if hist_all < hist_threshold
ratio[hist_all < hist_threshold*0.9] = np.nan
color_style = 'inferno'
vmax = 1.1  # defined but not currently passed to imshow
im = ax1.imshow(ratio, alpha=1, origin='lower', cmap=color_style, vmin=0, aspect='auto', extent=[lower_lim_A, upper_lim_A, lower_lim_B, upper_lim_B])
ax0 = fig.add_subplot(111)
ax0.set_frame_on(False)
ax0.xaxis.set_ticks([])
ax0.yaxis.set_ticks([])
# create a divider for the colorbar
divider = make_axes_locatable(ax0)
cax = divider.append_axes("top", size="5%", pad=0.1)
# add the colorbar to the new axis
cbar = fig.colorbar(im, cax=cax, orientation='horizontal')
cbar.ax.tick_params(labelsize=fontsize)
cbar.set_label(r'$\mathrm{N}_{\mathrm{df2}}/\mathrm{N}_{\mathrm{df1}}$', fontsize=25)
cax.xaxis.set_ticks_position('top')
cax.xaxis.set_label_position('top')
ax1.invert_yaxis()

Answer 1

Score: 1

I removed much of the code that only dealt with displaying the image. There is already a parameter for the number of bins, which you can increase to make the result smoother.

Also, please note that I changed the way the ratio is calculated. I guess hist2 / (hist1 + hist2), i.e. the share of df2 points among all points in a bin, is a little more meaningful?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
df2 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
upper_lim_A = 0.8
lower_lim_A = 0.3
upper_lim_B = 0.7
lower_lim_B = 0.2
# increase this for more bins
N_bins = 40
hist1, *edges_all = np.histogram2d(
    df1['B'], df1['A'],
    bins=N_bins,
    range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A))
)
hist2, *edges_vars = np.histogram2d(
    df2['B'], df2['A'],
    bins=N_bins,
    range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A))
)
# ratio is now df2's share of the total count (not df2/df1); the small epsilon avoids division by zero
ratio = hist2 / (hist1 + hist2 + 1e-6)
plt.imshow(ratio)
plt.colorbar()
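
With the epsilon in the denominator, bins that contain no points at all come out as 0, which looks the same as a bin populated only by df1. The following is a minimal sketch, not part of the original answer, that instead marks such bins as NaN, in the spirit of the hist_threshold in the question. The helper name share_of_total and its min_count parameter are hypothetical.

```python
import numpy as np

def share_of_total(hist1, hist2, min_count=1):
    """Return hist2 / (hist1 + hist2), with NaN where the total count is below min_count."""
    hist1 = np.asarray(hist1, dtype=float)
    hist2 = np.asarray(hist2, dtype=float)
    total = hist1 + hist2
    # Start from an all-NaN array so unpopulated bins stay NaN.
    ratio = np.full(total.shape, np.nan)
    # Divide only where the bin holds at least min_count points.
    np.divide(hist2, total, out=ratio, where=total >= min_count)
    return ratio

# Tiny hand-made histograms: the top-left bin is empty in both.
demo = share_of_total([[0, 2], [3, 1]], [[0, 2], [1, 3]])
```

Passing the result to plt.imshow should leave the NaN bins unpainted, so unpopulated regions stay visually distinct from bins with a low ratio.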


An alternative to increasing the number of bins is to blur the ratio image. This uses OpenCV because it is very convenient, but any image processing library that can blur images will do (even SciPy).

import cv2
blurred_ratio = cv2.blur(ratio, ksize=(5, 5))  # box (mean) blur; choose any kernel size, or use cv2.GaussianBlur for a Gaussian kernel
plt.imshow(blurred_ratio)
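
Since the answer mentions SciPy as an option, here is a hedged sketch of the same smoothing step using scipy.ndimage.gaussian_filter, which applies a Gaussian kernel. The random input arrays, the seed, and sigma=1.5 are assumptions chosen for illustration; the bin count and ranges mirror the answer's code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Stand-in data: column 0 plays the role of 'A', column 1 the role of 'B'.
rng = np.random.default_rng(0)
pts1 = rng.random((10000, 2))
pts2 = rng.random((10000, 2))

# Same binning as the answer: B range first, then A range.
bounds = ((0.2, 0.7), (0.3, 0.8))
hist1, _, _ = np.histogram2d(pts1[:, 1], pts1[:, 0], bins=40, range=bounds)
hist2, _, _ = np.histogram2d(pts2[:, 1], pts2[:, 0], bins=40, range=bounds)

# Share of points from the second set; epsilon avoids division by zero.
ratio = hist2 / (hist1 + hist2 + 1e-6)

# sigma is in units of bins; larger values smooth more (assumed value).
smoothed_ratio = gaussian_filter(ratio, sigma=1.5)
```

The smoothed array has the same shape as ratio and can be shown with plt.imshow(smoothed_ratio) in place of the OpenCV result.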
