How to create a 2D density plot color coded with the fraction of points that are in two dataframes

Question

I have two 2D dataframes, df1 and df2, whose points are defined in the (x, y) plane and lie in a similar region of it.

I would like to create a plot that is color coded by the number of points in df2 divided by the number of points in df1.

I tried the code below, but the result looks pixelated. Do you have any recommendations?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame(np.random.random((10000,2)), columns=list('AB'))
df2 = pd.DataFrame(np.random.random((10000,2)), columns=list('AB'))
df1


upper_lim_A = 0.8
lower_lim_A = 0.3

upper_lim_B = 0.7
lower_lim_B = 0.2

fontsize = 8
N_bins = 10
hist_threshold = 1 #Number of points in a bin to be considered populated


%matplotlib inline
#Creating the histograms
hist_all, *edges_all = np.histogram2d(df1['B'], df1['A'], bins=N_bins, range = ((lower_lim_B, upper_lim_B), (lower_lim_A,upper_lim_A)))
hist_vars, *edges_vars = np.histogram2d(df2['B'], df2['A'], bins=N_bins, range = ((lower_lim_B, upper_lim_B), (lower_lim_A,upper_lim_A)))

fig, ax1 = plt.subplots( figsize=(10.5,2.5*3))
ax1.tick_params(direction='out', length=6, width=2, colors='k', grid_alpha=1, labelsize=fontsize)

#Finding the ratio df2/df1
ratio = hist_vars/hist_all
#Replace Nan with 0
ratio[np.isnan(ratio)] = 0

#Deep copy the ratio array
ratio_copy = np.copy(ratio)
#Set the ratio to nan if hist_all < hist_threshold
ratio[hist_all < hist_threshold*0.9] = np.nan

color_style = 'inferno'
vmax = 1.1

im = ax1.imshow(ratio, alpha=1,origin='lower',cmap=color_style,vmin=0, vmax=vmax, aspect='auto', extent=[lower_lim_A,upper_lim_A,lower_lim_B, upper_lim_B])

from mpl_toolkits.axes_grid1 import make_axes_locatable

ax0 = fig.add_subplot(111)
ax0.set_frame_on(False)
ax0.xaxis.set_ticks([])
ax0.yaxis.set_ticks([])

# create a divider for the colorbar
divider = make_axes_locatable(ax0)
cax = divider.append_axes("top", size="5%", pad=0.1)

# add the colorbar to the new axis
cbar = fig.colorbar(im, cax=cax, orientation='horizontal')
cbar.ax.tick_params(labelsize=fontsize)
cbar.set_label(r'$\mathrm{N}_{\mathrm{df2}}/\mathrm{N}_{\mathrm{df1}}$', fontsize=25)
cax.xaxis.set_ticks_position('top')
cax.xaxis.set_label_position('top')
ax1.invert_yaxis()




Answer 1

Score: 1

I removed most of the code that was only related to displaying the image. There is already a parameter for the number of bins, N_bins, which you can increase to make the result look smoother.
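As a rough illustration of the trade-off behind the bin count (this sketch is not part of the original answer): more bins give a finer grid, but each bin then holds fewer points, so the per-bin ratio becomes noisier. The helper name `mean_points_per_bin` is hypothetical, and the ranges simply reuse the limits from the question.

```python
import numpy as np

def mean_points_per_bin(x, y, n_bins, x_range, y_range):
    """Average number of points per bin of an n_bins x n_bins 2D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=n_bins, range=(x_range, y_range))
    return counts.mean()

rng = np.random.default_rng(0)
points = rng.random((10000, 2))  # uniform points in the unit square

# Same data, same window, increasingly fine grids.
for n_bins in (10, 40, 100):
    avg = mean_points_per_bin(points[:, 0], points[:, 1], n_bins, (0.2, 0.7), (0.3, 0.8))
    print(n_bins, round(avg, 2))
```

When the average count per bin drops towards one or below, many bins are empty and the ratio is dominated by noise, so a very large bin count is only useful with correspondingly more data.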

Also, please note that I changed the way the ratio is calculated. I think the share of the total, hist2 / (hist1 + hist2), is a little more meaningful than hist2 / hist1.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df1 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
df2 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))

upper_lim_A = 0.8
lower_lim_A = 0.3

upper_lim_B = 0.7
lower_lim_B = 0.2

# increase this for more bins
N_bins = 40

hist1, *edges_all = np.histogram2d(
    df1['B'], df1['A'],
    bins=N_bins,
    range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A))
)

hist2, *edges_vars = np.histogram2d(
    df2['B'], df2['A'],
    bins=N_bins,
    range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A))
)

# ratio is now share of total (not share of hist 1) + small epsilon to avoid div by 0
ratio = hist2/(hist1 + hist2 + 10**-6)

plt.imshow(ratio)
plt.colorbar()
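The epsilon in the denominator avoids a division by zero, but it also makes completely empty bins come out as 0, which looks the same as a bin that contains only df1 points. A minimal sketch of one alternative, assuming you would rather leave unpopulated bins blank (NaN): the function name `share_of_total` and the `min_count` parameter are hypothetical and not part of the answer above.

```python
import numpy as np

def share_of_total(hist1, hist2, min_count=1):
    """Return hist2 / (hist1 + hist2); bins with fewer than min_count points become NaN."""
    hist1 = np.asarray(hist1, dtype=float)
    hist2 = np.asarray(hist2, dtype=float)
    total = hist1 + hist2
    # Only divide where the bin is populated; every other bin keeps its NaN.
    populated = (total >= min_count) & (total > 0)
    share = np.full(total.shape, np.nan)
    np.divide(hist2, total, out=share, where=populated)
    return share

# Tiny hand-made histograms: the top-right bin is empty in both.
demo_hist1 = np.array([[2, 0], [1, 3]])
demo_hist2 = np.array([[2, 0], [3, 0]])
demo_share = share_of_total(demo_hist1, demo_hist2)
print(demo_share)
```

With the arrays from the answer this would be called as share_of_total(hist1, hist2). plt.imshow leaves NaN cells unpainted by default, so empty regions stay visibly distinct from bins where the share is genuinely zero.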


Blurring the ratio image is an alternative to using a larger number of bins. This example uses OpenCV because it is convenient, but any image processing library that can blur images (or even SciPy) would work.

import cv2
blurred_ratio = cv2.blur(ratio, ksize=(5,5))  # box (mean) blur; choose any kernel size, or use cv2.GaussianBlur for a Gaussian kernel
plt.imshow(blurred_ratio)
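Since SciPy is mentioned as an option, here is a minimal sketch using scipy.ndimage.gaussian_filter. It makes one choice the answer does not: it smooths the two count arrays first and divides afterwards, instead of blurring the finished ratio, so sparsely populated bins borrow counts from their neighbours. The function name `smoothed_share` and the default sigma (given in units of bins) are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_share(hist1, hist2, sigma=1.5, eps=1e-6):
    """Gaussian-smooth both count arrays, then return hist2 / (hist1 + hist2)."""
    # mode="nearest" repeats the edge values, so the border bins are not pulled towards zero.
    s1 = gaussian_filter(np.asarray(hist1, dtype=float), sigma=sigma, mode="nearest")
    s2 = gaussian_filter(np.asarray(hist2, dtype=float), sigma=sigma, mode="nearest")
    # Same epsilon trick as in the answer to avoid dividing by zero.
    return s2 / (s1 + s2 + eps)

# Stand-in count arrays; in the answer these would be hist1 and hist2.
rng = np.random.default_rng(0)
demo_hist1 = rng.poisson(5, size=(40, 40))
demo_hist2 = rng.poisson(5, size=(40, 40))
demo_smoothed = smoothed_share(demo_hist1, demo_hist2)
print(demo_smoothed.shape)
```

A larger sigma gives a smoother picture at the cost of washing out small-scale structure; the result can be passed to plt.imshow in the same way as blurred_ratio.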


huangapple
  • Posted on May 22, 2023, 20:53:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76306405.html