How to create a 2D density plot color-coded by the fraction of points in two DataFrames

Question

I have two 2D DataFrames, df1 and df2, whose points are defined in the (x, y) plane and occupy a similar region of that plane.

I would like to create a plot color-coded by the number of points in df2 divided by the number of points in df1.

I tried something, but the result looks pixelated. Do you have any recommendations?

    # In a Jupyter notebook, run "%matplotlib inline" first
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from mpl_toolkits.axes_grid1 import make_axes_locatable

    df1 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
    df2 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))

    upper_lim_A = 0.8
    lower_lim_A = 0.3
    upper_lim_B = 0.7
    lower_lim_B = 0.2
    fontsize = 8
    N_bins = 10
    hist_threshold = 1  # Number of points in a bin to be considered populated

    # Creating the histograms (rows follow B, columns follow A)
    hist_all, *edges_all = np.histogram2d(df1['B'], df1['A'], bins=N_bins, range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A)))
    hist_vars, *edges_vars = np.histogram2d(df2['B'], df2['A'], bins=N_bins, range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A)))

    fig, ax1 = plt.subplots(figsize=(10.5, 2.5 * 3))
    ax1.tick_params(direction='out', length=6, width=2, colors='k', grid_alpha=1, labelsize=fontsize)

    # Finding the ratio df2/df1
    ratio = hist_vars / hist_all
    # Replace NaN with 0
    ratio[np.isnan(ratio)] = 0
    # Deep copy the ratio array
    ratio_copy = np.copy(ratio)
    # Set the ratio to NaN if hist_all < hist_threshold
    ratio[hist_all < hist_threshold * 0.9] = np.nan

    color_style = 'inferno'
    vmax = 1.1
    im = ax1.imshow(ratio, alpha=1, origin='lower', cmap=color_style, vmin=0, aspect='auto', extent=[lower_lim_A, upper_lim_A, lower_lim_B, upper_lim_B])

    ax0 = fig.add_subplot(111)
    ax0.set_frame_on(False)
    ax0.xaxis.set_ticks([])
    ax0.yaxis.set_ticks([])

    # Create a divider for the colorbar
    divider = make_axes_locatable(ax0)
    cax = divider.append_axes("top", size="5%", pad=0.1)

    # Add the colorbar to the new axis
    cbar = fig.colorbar(im, cax=cax, orientation='horizontal')
    cbar.ax.tick_params(labelsize=fontsize)
    cbar.set_label(r'$\mathrm{N}_{\mathrm{df2}}/\mathrm{N}_{\mathrm{df1}}$', fontsize=25)
    cax.xaxis.set_ticks_position('top')
    cax.xaxis.set_label_position('top')
    ax1.invert_yaxis()


Answer 1

Score: 1

I removed much of the code that was only related to displaying the image. There is already a parameter for the number of bins, N_bins, which you can increase to make the result smoother.

Also, please note that I changed the way the ratio is calculated: I use df2 / (df1 + df2), the share of df2 points in each bin, which I think is a little more meaningful than df2 / df1.
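To make the difference between the two definitions concrete, here is a small sketch with made-up bin counts (the arrays n1 and n2 are illustrative and not taken from the question's data). The plain ratio is unbounded and undefined where df1 has no points, whereas the share of the total always lies between 0 and 1.

```python
import numpy as np

# Made-up counts per bin for df1 (n1) and df2 (n2); illustrative only.
n1 = np.array([10.0, 5.0, 0.0, 1.0])
n2 = np.array([10.0, 15.0, 3.0, 0.0])

# Plain ratio df2/df1: unbounded, and left as NaN where n1 == 0.
plain = np.divide(n2, n1, out=np.full_like(n1, np.nan), where=n1 > 0)

# Share of df2 in the bin total: always between 0 and 1.
total = n1 + n2
share = np.divide(n2, total, out=np.full_like(total, np.nan), where=total > 0)

print(plain)  # values: 1, 3, NaN, 0
print(share)  # values: 0.5, 0.75, 1, 0
```

Note that a bin holding only df2 points is undefined under the plain ratio but simply gets a share of 1.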

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    df1 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))
    df2 = pd.DataFrame(np.random.random((10000, 2)), columns=list('AB'))

    upper_lim_A = 0.8
    lower_lim_A = 0.3
    upper_lim_B = 0.7
    lower_lim_B = 0.2

    # increase this for more bins
    N_bins = 40

    hist1, *edges_all = np.histogram2d(
        df1['B'], df1['A'],
        bins=N_bins,
        range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A))
    )
    hist2, *edges_vars = np.histogram2d(
        df2['B'], df2['A'],
        bins=N_bins,
        range=((lower_lim_B, upper_lim_B), (lower_lim_A, upper_lim_A))
    )

    # ratio is now df2's share of the total (not a share of hist1);
    # the small epsilon avoids division by zero in empty bins
    ratio = hist2 / (hist1 + hist2 + 1e-6)

    plt.imshow(ratio)
    plt.colorbar()
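The imshow call above labels the axes with bin indices rather than A and B values. The following is a minimal sketch of one way to draw the same share in data coordinates, assuming the bin edges returned by np.histogram2d are passed to pcolormesh. Masking empty bins instead of adding an epsilon is my own choice here, not part of the answer, and the seeded generator is only there to make the sketch reproducible.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((10000, 2)), columns=list("AB"))
df2 = pd.DataFrame(rng.random((10000, 2)), columns=list("AB"))

N_bins = 40
limits = ((0.2, 0.7), (0.3, 0.8))  # (B range, A range), as in the question

# The first histogram axis follows B (rows), the second follows A (columns).
hist1, b_edges, a_edges = np.histogram2d(df1["B"], df1["A"], bins=N_bins, range=limits)
hist2, _, _ = np.histogram2d(df2["B"], df2["A"], bins=N_bins, range=limits)

# Share of df2 per bin; bins with no points at all are masked, not set to 0.
total = hist1 + hist2
share = np.ma.masked_invalid(
    np.divide(hist2, total, out=np.full_like(total, np.nan), where=total > 0)
)

fig, ax = plt.subplots()
# x edges are the A edges, y edges are the B edges, so no transpose is needed.
mesh = ax.pcolormesh(a_edges, b_edges, share, cmap="inferno", vmin=0, vmax=1)
ax.set_xlabel("A")
ax.set_ylabel("B")
fig.colorbar(mesh, ax=ax, label="N_df2 / (N_df1 + N_df2)")
# Call plt.show() or fig.savefig(...) to display or save the figure.
```

Masked bins are left blank by pcolormesh, which keeps empty regions visually distinct from bins where df2 genuinely has a share of 0.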


This is an alternative to a larger number of bins. It uses OpenCV because it is very convenient, but you can use any image-processing library that lets you blur images (or even SciPy).

    import cv2

    # cv2.blur is a box (mean) filter; cv2.GaussianBlur gives a Gaussian blur
    blurred_ratio = cv2.blur(ratio, ksize=(5, 5))  # choose any kernel size
    plt.imshow(blurred_ratio)
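For the SciPy route mentioned above, a minimal sketch could use scipy.ndimage.gaussian_filter. Two things here are my own assumptions rather than part of the answer: the two count arrays are smoothed separately before dividing (instead of blurring the finished ratio), and sigma = 2 bins is an arbitrary starting value. The random arrays pts1 and pts2 stand in for the A and B columns of df1 and df2.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
limits = ((0.2, 0.7), (0.3, 0.8))  # (B range, A range), as in the question
pts1 = rng.random((10000, 2))  # stand-in for df1; column 0 is A, column 1 is B
pts2 = rng.random((10000, 2))  # stand-in for df2

hist1, _, _ = np.histogram2d(pts1[:, 1], pts1[:, 0], bins=40, range=limits)
hist2, _, _ = np.histogram2d(pts2[:, 1], pts2[:, 0], bins=40, range=limits)

sigma = 2.0  # smoothing width in bins; an arbitrary choice to tune by eye
smooth1 = gaussian_filter(hist1, sigma=sigma, mode="nearest")
smooth2 = gaussian_filter(hist2, sigma=sigma, mode="nearest")

# Share of df2 computed from the smoothed counts; NaN where both are zero.
total = smooth1 + smooth2
smoothed_share = np.divide(
    smooth2, total, out=np.full_like(total, np.nan), where=total > 0
)
```

Smoothing the counts first keeps sparsely populated bins from dominating the result, at the cost of blurring any genuinely sharp features. The array smoothed_share can be passed to plt.imshow in the same way as blurred_ratio.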


huangapple
  • Posted on May 22, 2023, 20:53:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76306405.html