PDFBox: PDFRenderer.renderImage().getWidth() and PDImageXObject.getImage().getWidth() return numbers in different scales?

Question

I use PDFBox to convert a PDF into images, and I find that the widths returned by PDFRenderer and PDImageXObject seem to use different scales.
How do I get the widths in the same scale?

This is how I get the width of the page:

PDFRenderer pdRender = new PDFRenderer(pdDoc);
BufferedImage singlePage = pdRender.renderImage(pgIdx-1);
singlePage.getWidth();  // pageWidth = 623

And this is how I get the width of the image block:

PDImageXObject image = (PDImageXObject) o;
image.getImage().getWidth();  // imageWidth = 484

The "pageWidth" is the actual size as shown in the image metadata, but the "imageWidth" is larger than the real size. The actual ratio is shown in the following image (the whole page vs. the red box).

Answer 1

Score: 2

Your way to determine the page size

PDFRenderer pdRender = new PDFRenderer(pdDoc);
BufferedImage singlePage = pdRender.renderImage(pgIdx-1);
singlePage.getWidth();  // pageWidth = 623

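determines the page width in pixels after rendering the page as a bitmap with default settings, in particular at a resolution you did not choose yourself (as far as I know, the single-argument renderImage in PDFBox 2.x renders at scale 1, i.e. 72 DPI).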

Your way to determine the image dimension

PDImageXObject image = (PDImageXObject) o;
image.getImage().getWidth();  // imageWidth = 484

determines the actual dimensions of the bitmap resource, without considering how it is used on the page, if at all.

Thus, those numbers are entirely unrelated.


If you want to compare sizes on a PDF page, the natural choice of units is the default user space units of a PDF page. By default they equal 1/72 inch.
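To make the relation between rendered pixels and user space units concrete, here is a minimal sketch using plain arithmetic only. The class and method names (UserSpaceUnits, pixelsToUnits, unitsToPixels) are made up for illustration and are not part of PDFBox. It assumes the page uses the default unit of 1/72 inch and that you know the DPI at which the page was rendered.

```java
// Hypothetical helper, not part of PDFBox: converts between rendered pixels
// and default user space units (1/72 inch), assuming a known render DPI.
final class UserSpaceUnits {
    static final double UNITS_PER_INCH = 72.0;

    // Pixels of a rendered bitmap -> user space units.
    static double pixelsToUnits(double pixels, double dpi) {
        return pixels * UNITS_PER_INCH / dpi;
    }

    // User space units -> pixels of a bitmap rendered at the given DPI.
    static double unitsToPixels(double units, double dpi) {
        return units * dpi / UNITS_PER_INCH;
    }

    public static void main(String[] args) {
        // Width in user space units of a page rendered 623 px wide at 72 DPI.
        System.out.println(pixelsToUnits(623, 72));
        // Pixel width of the same page if it were rendered at 144 DPI.
        System.out.println(unitsToPixels(623, 144));
    }
}
```

If you would rather pick the resolution yourself than rely on the default, PDFRenderer also offers renderImageWithDPI(pageIndex, dpi) in PDFBox 2.x; the DPI you pass there is the value to plug into the conversion above.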

You can retrieve the page size of a PDPage page in user space units like this:

PDRectangle cropBox = page.getCropBox();
float width = cropBox.getWidth();
float height = cropBox.getHeight();

The dimensions of a bitmap on a PDF page are a bit harder to determine, because a bitmap is subject to an arbitrary affine transformation: the current transformation matrix (CTM) in effect at the time it is drawn. Thus, you have to determine that CTM. To do so, you have to parse the page content up to the point at which the bitmap is drawn and, right then, read the CTM from the current graphics state.
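As a rough illustration of what the CTM tells you: PDF always paints an image into the unit square, so the displayed width and height are the lengths of the vectors (1, 0) and (0, 1) after the CTM has been applied. The sketch below uses java.awt.geom.AffineTransform as a stand-in for the CTM. The class DisplayedImageSize and the matrix values are invented for the example and do not come from the PDF in the question.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Illustrative only: derives the displayed size of an image from a CTM.
final class DisplayedImageSize {
    // PDF writes a matrix as [a b c d e f]; AffineTransform's six-argument
    // constructor (m00, m10, m01, m11, m02, m12) takes the same order.
    static AffineTransform ctm(double a, double b, double c, double d, double e, double f) {
        return new AffineTransform(a, b, c, d, e, f);
    }

    // Length of the image's bottom edge: the vector (1, 0) mapped by the CTM.
    static double displayedWidth(AffineTransform ctm) {
        Point2D v = ctm.deltaTransform(new Point2D.Double(1, 0), null);
        return Math.hypot(v.getX(), v.getY());
    }

    // Length of the image's left edge: the vector (0, 1) mapped by the CTM.
    static double displayedHeight(AffineTransform ctm) {
        Point2D v = ctm.deltaTransform(new Point2D.Double(0, 1), null);
        return Math.hypot(v.getX(), v.getY());
    }

    public static void main(String[] args) {
        // Made-up CTM: scale to 242 x 150 units, then move to (72, 500).
        AffineTransform upright = ctm(242, 0, 0, 150, 72, 500);
        System.out.println(displayedWidth(upright) + " x " + displayedHeight(upright));

        // The same image rotated by 90 degrees keeps its edge lengths.
        AffineTransform rotated = ctm(0, 242, -150, 0, 300, 100);
        System.out.println(displayedWidth(rotated) + " x " + displayedHeight(rotated));
    }
}
```

Inside a PDFBox PDFStreamEngine subclass, the matrix to feed in would come from getGraphicsState().getCurrentTransformationMatrix() at the moment the image is drawn; the translation part (e, f) gives the position and does not affect the size.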

The PDFBox example PrintImageLocations demonstrates this; its output line "displayed size = XXX, YYY in user space units" is the one you're looking for.
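Once you have both sizes in user space units, putting the image and the rendered page on one scale is simple proportion. The following is only a sketch under assumptions: the class ImageOnRenderedPage and its methods are hypothetical, the displayed width of 242 units is an invented value (the question does not state it), and page rotation and a crop box that differs from the visible area are ignored.

```java
// Hypothetical helper: relates an image's displayed size to a rendered page.
final class ImageOnRenderedPage {
    // Width the image occupies in the rendered page bitmap, in pixels.
    static double widthInRenderedPixels(double displayedWidthUnits,
                                        double pageWidthUnits,
                                        int renderedPageWidthPx) {
        return displayedWidthUnits / pageWidthUnits * renderedPageWidthPx;
    }

    // Effective resolution of the bitmap resource as placed on the page,
    // assuming the default user space unit of 1/72 inch.
    static double effectiveDpi(int bitmapWidthPx, double displayedWidthUnits) {
        return bitmapWidthPx * 72.0 / displayedWidthUnits;
    }

    public static void main(String[] args) {
        double displayedWidth = 242; // invented value, in user space units
        double pageWidth = 623;      // crop box width, in user space units
        int renderedWidth = 623;     // pixel width of the rendered page

        System.out.println(widthInRenderedPixels(displayedWidth, pageWidth, renderedWidth));
        System.out.println(effectiveDpi(484, displayedWidth));
    }
}
```

With numbers like these, a 484 px bitmap would cover well under half of the rendered page width, which is the kind of mismatch between bitmap size and displayed size described in the question.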
