How to create a 2D density plot color-coded by the ratio of points from two dataframes
Question
I have two 2D dataframes, df1 and df2, defined in the (x, y) plane and covering a similar region of it.
I would like to create a plot color-coded by the number of points in df2 divided by the number of points in df1.
I tried the code below, but the result looks pixelated. Do you have any recommendations?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
df2 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
upper_lim_A = 0.8
lower_lim_A = 0.3
upper_lim_B = 0.7
lower_lim_B = 0.2
fontsize = 8
N_bins = 10
hist_threshold = 1 #Number of points in a bin to be considered populated
%matplotlib inline
#Creating the histograms
hist_all, *edges_all = np.histogram2d(df1['B'], df1['A'], bins=N_bins, range = ((lower_lim_B, upper_lim_B), (lower_lim_A,upper_lim_A)))
hist_vars, *edges_vars = np.histogram2d(df2['B'], df2['A'], bins=N_bins, range = ((lower_lim_B, upper_lim_B), (lower_lim_A,upper_lim_A)))
fig, ax1 = plt.subplots( figsize=(10.5,2.5*3))
ax1.tick_params(direction='out', length=6, width=2, colors='k', grid_alpha=1, labelsize=fontsize)
#Finding the ratio df2/df1
ratio = hist_vars/hist_all
#Replace Nan with 0
ratio[np.isnan(ratio)] = 0
#Deep copy the ratio array
ratio_copy = np.copy(ratio)
#Set the ratio to nan if hist_all < hist_threshold
ratio[hist_all < hist_threshold*0.9] = np.nan
color_style = 'inferno'
vmax = 1.1
im = ax1.imshow(ratio, alpha=1, origin='lower', cmap=color_style, vmin=0, vmax=vmax, aspect='auto', extent=[lower_lim_A, upper_lim_A, lower_lim_B, upper_lim_B])
from mpl_toolkits.axes_grid1 import make_axes_locatable
ax0 = fig.add_subplot(111)
ax0.set_frame_on(False)
ax0.xaxis.set_ticks([])
ax0.yaxis.set_ticks([])
# create a divider for the colorbar
divider = make_axes_locatable(ax0)
cax = divider.append_axes("top", size="5%", pad=0.1)
# add the colorbar to the new axis
cbar = fig.colorbar(im, cax=cax, orientation='horizontal')
cbar.ax.tick_params(labelsize=fontsize)
cbar.set_label(r'$\mathrm{N}_{\mathrm{df2}}/\mathrm{N}_{\mathrm{df1}}$', fontsize=25)
cax.xaxis.set_ticks_position('top')
cax.xaxis.set_label_position('top')
ax1.invert_yaxis()
Answer 1
Score: 1
I removed much of the code that was only needed for displaying the image. The number of bins is already a parameter (N_bins), and you can increase it to make the result smoother.
Also, note that I changed the way the ratio is calculated: I think second / (first + second), i.e. the share of all points in a bin that come from df2, is a little more meaningful.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
df2 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
upper_lim_A = 0.8
lower_lim_A = 0.3
upper_lim_B = 0.7
lower_lim_B = 0.2
# increase this for more bins
N_bins = 40
hist1, *edges_all = np.histogram2d(
df1['B'], df1['A'],
bins=N_bins,
range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A))
)
hist2, *edges_vars = np.histogram2d(
df2['B'], df2['A'],
bins=N_bins,
range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A))
)
# ratio is now df2's share of the total count (not hist2/hist1); the small epsilon avoids division by zero in empty bins
ratio = hist2/(hist1 + hist2 + 10**-6)
plt.imshow(ratio)
plt.colorbar()
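With the epsilon above, empty bins simply get the value 0. As a sketch, assuming you would rather leave empty or sparsely populated bins blank (as the question's hist_threshold did), both ratio definitions can be computed with `np.divide(..., out=..., where=...)`, which only divides where the condition holds and keeps the NaN fill elsewhere; `imshow` draws NaN cells in the colormap's "bad" color, which is transparent by default. The helper name `bin_ratios` and its `min_count` argument are hypothetical.

```python
import numpy as np

def bin_ratios(hist1, hist2, min_count=1):
    """Hypothetical helper: return (share_of_total, df2_over_df1) for two
    2D histograms built with identical bins and ranges.

    Bins whose denominator is below `min_count` are set to NaN instead of
    being divided.
    """
    total = hist1 + hist2
    # Fraction of all points in a bin that come from df2; always in [0, 1].
    share = np.divide(hist2, total, out=np.full(total.shape, np.nan),
                      where=total >= min_count)
    # N_df2 / N_df1 as in the question; unbounded above.
    ratio = np.divide(hist2, hist1, out=np.full(hist1.shape, np.nan),
                      where=hist1 >= min_count)
    return share, ratio

# Small example with 2 x 2 bins.
hist1 = np.array([[2.0, 0.0], [1.0, 4.0]])
hist2 = np.array([[2.0, 3.0], [0.0, 4.0]])
share, ratio = bin_ratios(hist1, hist2)
# share -> [[0.5, 1.0], [0.0, 0.5]]
# ratio -> [[1.0, nan], [0.0, 1.0]]  (top-right bin has no df1 points)
```

The share is bounded in [0, 1], so the color scale does not depend on a few bins with very few df1 points; where df2/df1 itself is needed, it can be recovered per bin as share / (1 - share).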
As an alternative to a larger number of bins, you can blur the ratio image. This uses OpenCV because it is very convenient, but any image-processing library that can blur images (or even SciPy) will work.
import cv2
blurred_ratio = cv2.blur(ratio, ksize=(5, 5))  # box (mean) filter, choose any kernel size; for a Gaussian blur use cv2.GaussianBlur(ratio, (5, 5), 0)
plt.imshow(blurred_ratio)
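Some arithmetic shows why a finer grid alone has limits here: each dataframe has 10,000 uniform points in the unit square, so about 2,500 of them fall inside the 0.5 × 0.5 plotting window. With 10 × 10 bins that is roughly 25 points per bin; with 40 × 40 bins it is only about 1.6, so the per-bin ratio becomes noisy. One option, which is a variation on the answer rather than part of it, is to smooth both histograms first and divide afterwards, so each bin's ratio also draws on its neighboring bins' counts. The sketch below is NumPy-only; `scipy.ndimage.gaussian_filter(hist, sigma)` does the smoothing step in one call (with reflective rather than zero-padded edges by default). The function names and the `sigma`/`min_count` defaults are assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, truncate=3.0):
    """Normalized 1D Gaussian kernel spanning +/- truncate * sigma bins."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def _convolve_same(values, kernel):
    """1D convolution cropped to the input length (zero padding at the edges).

    Works even when the kernel is longer than `values`.
    """
    radius = (len(kernel) - 1) // 2
    return np.convolve(values, kernel, mode='full')[radius:radius + len(values)]

def smooth2d(hist, sigma=1.0):
    """Separable Gaussian smoothing of a 2D count array: columns, then rows."""
    kernel = gaussian_kernel(sigma)
    out = np.apply_along_axis(_convolve_same, 0, hist, kernel)
    return np.apply_along_axis(_convolve_same, 1, out, kernel)

def smoothed_share(hist1, hist2, sigma=1.0, min_count=1.0):
    """Share of df2 points per bin, computed from smoothed counts.

    Bins whose smoothed total is below `min_count` are returned as NaN.
    """
    s1 = smooth2d(np.asarray(hist1, dtype=float), sigma)
    s2 = smooth2d(np.asarray(hist2, dtype=float), sigma)
    total = s1 + s2
    return np.divide(s2, total, out=np.full(total.shape, np.nan),
                     where=total >= min_count)
```

With the answer's variables this could be used as, for example, `plt.imshow(smoothed_share(hist1, hist2, sigma=1.5), origin='lower', extent=[lower_lim_A, upper_lim_A, lower_lim_B, upper_lim_B])`. Because the numerator and the denominator are smoothed with the same kernel, the loss of counts near the edges from zero padding partly cancels in the ratio.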