September 28, 2020, 05:45:01

Extract Checkbox value out of PDF 1.7 using PDFBox

Question

I recently started using PDFBox to extract text from a PDF. Along with the text, I also need to extract the checkbox values shown in the image. I have tried different methods to find the checkbox elements and extract their values.

After inspecting the PDF content through this tool, I found that a checkbox is not an image but vector graphics drawn by the content stream instructions below.

ET
Q
q
BT
/F2 6 Tf
481.3 653.29 Td
(  ) Tj
ET
Q
q
1 1 1 rg
484.3 653.29 9 9 re
f
Q
q
0.87059 0.87059 0.87059 rg
485.05 661.54 m
492.55 661.54 l
493.3 662.29 l
484.3 662.29 l
485.05 661.54 l
f
Q
q
0.87059 0.87059 0.87059 rg
492.55 661.54 m
492.55 654.04 l
493.3 653.29 l
493.3 662.29 l
492.55 661.54 l
f
Q
q
0.87059 0.87059 0.87059 rg
492.55 654.04 m
485.05 654.04 l
484.3 653.29 l
493.3 653.29 l
492.55 654.04 l
f
Q
q
0.87059 0.87059 0.87059 rg
485.05 654.04 m
485.05 661.54 l
484.3 662.29 l
484.3 653.29 l
485.05 654.04 l
f
Q
q
BT
/F2 6 Tf
495.55 653.29 Td
(Yes) Tj
ET
Q
q
BT
/F2 6 Tf
504.88 653.29 Td
(  ) Tj
ET
Q
q
1 1 1 rg
507.88 653.29 9 9 re
f
Q
q
0.87059 0.87059 0.87059 rg
508.63 661.54 m
516.13 661.54 l
516.88 662.29 l
507.88 662.29 l
508.63 661.54 l
f
Q
q
0.87059 0.87059 0.87059 rg
516.13 661.54 m
516.13 654.04 l
516.88 653.29 l
516.88 662.29 l
516.13 661.54 l
f
Q
q
0.87059 0.87059 0.87059 rg
516.13 654.04 m
508.63 654.04 l
507.88 653.29 l
516.88 653.29 l
516.13 654.04 l
f
Q
q
0.87059 0.87059 0.87059 rg
508.63 654.04 m
508.63 661.54 l
507.88 662.29 l
507.88 653.29 l
508.63 654.04 l
f
Q
q
BT
/F2 6 Tf
519.13 653.29 Td
(No) Tj
ET
Q
q
BT
/F2 6 Tf
36.75 642.95 Td

I am not sure how to extract this from the PDF. I have seen the different parsers PDFBox provides, but it looks like I need more information about how the PDF is constructed. Any pointers would be much appreciated.

Answer 1

Score: 3

In a comment you confirm that

> all check boxes and check marks are drawn identically

in your input documents.

Therefore, to extract the check boxes and their check state from your documents, you can search the page content for exactly the instruction sequences that draw the boxes and marks in the example document.

How Boxes And Check Marks Are Drawn

As you already found out, each box is drawn by filling one path per edge (top, right, bottom, left). For the "yes" box of question 1 this looks like:

485.05 661.54 m
492.55 661.54 l
493.3 662.29 l
484.3 662.29 l
485.05 661.54 l
f
...
492.55 661.54 m
492.55 654.04 l
493.3 653.29 l
493.3 662.29 l
492.55 661.54 l
f
...
492.55 654.04 m
485.05 654.04 l
484.3 653.29 l
493.3 653.29 l
492.55 654.04 l
f
...
485.05 654.04 m
485.05 661.54 l
484.3 662.29 l
484.3 653.29 l
485.05 654.04 l
f

Inspecting all the boxes in the document you can see that their drawing instructions follow this pattern:

A B m
(A+7.5) B l
(A+8.25) (B+0.75) l
(A-0.75) (B+0.75) l
A B l
f
...
C B m
C (B-7.5) l
(C+0.75) (B-8.25) l
(C+0.75) (B+0.75) l
C B l
f
...
C D m
(C-7.5) D l
(C-8.25) (D-0.75) l
(C+0.75) (D-0.75) l
C D l
f
...
A D m
A (D+7.5) l
(A-0.75) (D+8.25) l
(A-0.75) (D-0.75) l
A D l
f

Here A and C are the left and right x coordinates of the box and B and D are the top and bottom y coordinates thereof.
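Because only A, B, C and D change from box to box, each edge can be recognized by the relative steps between consecutive path points alone. The following is a minimal sketch of that idea for the top edge, using only the JDK; the class name `BoxEdgePattern`, the method `matches` and the flat `double[]` layout are illustrative assumptions, not part of PDFBox.

```java
public class BoxEdgePattern {
    // Relative steps (dx, dy) between consecutive points of the top-edge path, read off the pattern
    // "A B m, (A+7.5) B l, (A+8.25) (B+0.75) l, (A-0.75) (B+0.75) l, A B l".
    static final double[] TOP_EDGE_STEPS = {7.5, 0, 0.75, 0.75, -9, 0, 0.75, -0.75};

    // Returns true if the points (x0, y0, x1, y1, ...) follow the given steps within the tolerance.
    static boolean matches(double[] points, double[] steps, double tolerance) {
        if (points.length != steps.length + 2) {
            return false;
        }
        for (int i = 0; i < steps.length; i++) {
            // points[i + 2] - points[i] is the x (even i) or y (odd i) step to the next point
            double actual = points[i + 2] - points[i];
            if (Math.abs(actual - steps[i]) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Top edge of the "yes" box for question 1, taken from the content stream above.
        double[] topEdge = {485.05, 661.54, 492.55, 661.54, 493.3, 662.29, 484.3, 662.29, 485.05, 661.54};
        System.out.println(matches(topEdge, TOP_EDGE_STEPS, 0.001));
    }
}
```

Comparing steps rather than absolute coordinates is what makes the same pattern work for every box on the page; the tolerance absorbs rounding in the coordinates.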

Similarly, each check mark is drawn by filling two paths (left and right half). For the mark in the "yes" box of question 1 this looks like:

0.70711 -0.70711 0.70711 0.70711 -323.79 536.88 cm
...
489.55 661.54 m
489.55 657.79 l
490.3 657.04 l
490.3 661.54 l
489.55 661.54 l
f
...
489.55 657.79 m
488.05 657.79 l
488.05 657.04 l
490.3 657.04 l
489.55 657.79 l
f

Inspecting all the check marks in the document you can see that their drawing instructions follow this pattern:

0.70711 -0.70711 0.70711 0.70711 X Y cm 
...
A B m
A (B-3.75) l
(A+0.75) (B-4.5) l
(A+0.75) B l
A B l
f 
...
A C m
(A-1.5) C l
(A-1.5) (C-0.75) l
(A+0.75) (C-0.75) l
A C l
f

The first line transforms the coordinate system by rotating it by 45° around some point; this allows the check mark to be drawn using mostly horizontal and vertical lines.

In this rotated coordinate system (A,B) are the coordinates of the top left corner of the longer check mark arm and (A,C) are those of the topmost point of the line where the two arms of the check mark join.
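A path-matching class sees coordinates after the `cm` matrix has been applied, so the steps of the check mark pattern have to be rotated before they can be compared. The sketch below does this with the JDK's `AffineTransform`, whose six-argument constructor takes the matrix entries in the same order as the `cm` operands; the class name `CheckMarkRotation` and the method `toPageStep` are illustrative assumptions.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class CheckMarkRotation {
    // The six operands of the "cm" instruction from the example above.
    static final AffineTransform CM =
            new AffineTransform(0.70711, -0.70711, 0.70711, 0.70711, -323.79, 536.88);

    // Maps a relative step from the rotated system to the page system.
    // deltaTransform ignores the translation part, which is what we want for steps.
    static Point2D toPageStep(double dx, double dy) {
        return CM.deltaTransform(new Point2D.Double(dx, dy), null);
    }

    public static void main(String[] args) {
        // First step of the longer arm in the rotated system: "A B m" to "A (B-3.75) l".
        Point2D longArm = toPageStep(0, -3.75);
        System.out.println(longArm.getX() + " " + longArm.getY());
    }
}
```

The step (0, -3.75) comes out as roughly (-2.6517, -2.6517), which is where the `-2.65165f` values in the `checkRight` entry of the finder class come from.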

How to Search for Those Instruction Sequences

A related task has been implemented in the PdfBoxFinder class in [this answer][1], a class that collects lines drawn as thin, long rectangles forming a grid.

Thus, we can use the same foundation in our case: the PDFBox PDFGraphicsStreamEngine class. We merely have to look at different kinds of paths (built by move-to and line-to instructions, not by rectangle instructions) and of course process the paths differently (instead of recognizing a grid, we must recognize our specific check boxes and check marks).

Such a check box finder class can be implemented like this:

import java.awt.geom.Point2D;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;

public class PdfCheckBoxFinder extends PDFGraphicsStreamEngine {
    public class CheckBox {
        public Point2D getLowerLeft()   {   return lowerLeft;   }
        public Point2D getUpperRight()  {   return upperRight;  }
        public boolean isChecked()      {   return checked;     }

        CheckBox(Point2D lowerLeft, Point2D upperRight, boolean checked) {
            this.lowerLeft = lowerLeft;
            this.upperRight = upperRight;
            this.checked = checked;
        }

        final Point2D lowerLeft;
        final Point2D upperRight;
        final boolean checked;
    }

    public PdfCheckBoxFinder(PDPage page) {
        super(page);
        for (int i = 0; i < pathAnchorsByType.length; i++)
            pathAnchorsByType[i] = new ArrayList<Point2D>();
    }

    // A box is reported where all four edge paths share the same anchor (the inner lower left corner);
    // it counts as checked if both halves of a check mark are anchored inside it.
    public List<CheckBox> getBoxes() {
        if (checkBoxes.isEmpty()) {
            for (Point2D anchor : pathAnchorsByType[PathType.boxBottom.index]) {
                if (containsApproximatly(pathAnchorsByType[PathType.boxLeft.index], anchor) &&
                        containsApproximatly(pathAnchorsByType[PathType.boxRight.index], anchor) &&
                        containsApproximatly(pathAnchorsByType[PathType.boxTop.index], anchor)) {
                    Point2D upperRight = new Point2D.Float(7.5f + (float)anchor.getX(), 7.5f + (float)anchor.getY());
                    boolean checked = containsInRectangle(pathAnchorsByType[PathType.checkLeft.index], anchor, upperRight) &&
                            containsInRectangle(pathAnchorsByType[PathType.checkRight.index], anchor, upperRight);
                    checkBoxes.add(new CheckBox(anchor, upperRight, checked));
                }
            }
        }
        return Collections.unmodifiableList(checkBoxes);
    }

    boolean containsApproximatly(List<Point2D> points, Point2D anchor) {
        for (Point2D point : points) {
            if (approximatelyEquals(point.getX(), anchor.getX()) && approximatelyEquals(point.getY(), anchor.getY()))
                return true;
        }
        return false;
    }

    boolean containsInRectangle(List<Point2D> points, Point2D lowerLeft, Point2D upperRight) {
        for (Point2D point : points) {
            if (lowerLeft.getX() < point.getX() && point.getX() < upperRight.getX() &&
                    lowerLeft.getY() < point.getY() && point.getY() < upperRight.getY())
                return true;
        }
        return false;
    }

    //
    // PDFGraphicsStreamEngine overrides
    //
    @Override
    public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException {
        moveTo((float) p0.getX(), (float) p0.getY());
        path.add(new Rectangle(p0, p1, p2, p3));
    }

    @Override
    public void moveTo(float x, float y) throws IOException {
        currentPoint = new Point2D.Float(x, y);
        currentStartPoint = currentPoint;
    }

    @Override
    public void lineTo(float x, float y) throws IOException {
        Point2D point = new Point2D.Float(x, y);
        path.add(new Line(currentPoint, point));
        currentPoint = point;
    }

    @Override
    public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException {
        Point2D point1 = new Point2D.Float(x1, y1);
        Point2D point2 = new Point2D.Float(x2, y2);
        Point2D point3 = new Point2D.Float(x3, y3);
        path.add(new Curve(currentPoint, point1, point2, point3));
        currentPoint = point3;
    }

    @Override
    public Point2D getCurrentPoint() throws IOException {
        return currentPoint;
    }

    @Override
    public void closePath() throws IOException {
        path.add(new Line(currentPoint, currentStartPoint));
        currentPoint = currentStartPoint;
    }

    @Override
    public void endPath() throws IOException {
        clearPath();
    }

    @Override
    public void strokePath() throws IOException {
        clearPath();
    }

    // Only filled paths are inspected; boxes and marks in the example are drawn with "f".
    @Override
    public void fillPath(int windingRule) throws IOException {
        processPath();
    }

    @Override
    public void fillAndStrokePath(int windingRule) throws IOException {
        clearPath();
    }

    @Override public void drawImage(PDImage pdImage) throws IOException { }
    @Override public void clip(int windingRule) throws IOException { }
    @Override public void shadingFill(COSName shadingName) throws IOException { }

    //
    // internal representation of a path
    //
    interface PathElement {
    }

    class Rectangle implements PathElement {
        final Point2D p0, p1, p2, p3;

        Rectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) {
            this.p0 = p0;
            this.p1 = p1;
            this.p2 = p2;
            this.p3 = p3;
        }
    }

    class Line implements PathElement {
        final Point2D p0, p1;

        Line(Point2D p0, Point2D p1) {
            this.p0 = p0;
            this.p1 = p1;
        }
    }

    class Curve implements PathElement {
        final Point2D p0, p1, p2, p3;

        Curve(Point2D p0, Point2D p1, Point2D p2, Point2D p3) {
            this.p0 = p0;
            this.p1 = p1;
            this.p2 = p2;
            this.p3 = p3;
        }
    }

    Point2D currentPoint = null;
    Point2D currentStartPoint = null;

    void clearPath() {
        path.clear();
        currentPoint = null;
        currentStartPoint = null;
    }

    // Records the anchor of the current path for every path type it matches, then resets the path.
    void processPath() {
        for (PathType pathType : PathType.values()) {
            if (pathType.matches(path)) {
                pathAnchorsByType[pathType.index].add(pathType.getAnchor(path));
            }
        }
        clearPath();
    }

    // diffs: the (dx, dy) step of each line in the path; offsetToAnchor: offset from the path start to the anchor.
    enum PathType {
        boxTop(new float[] {7.5f, 0f, .75f, .75f, -9f, 0f, .75f, -.75f}, new float[] {0f, -7.5f}, 0),
        boxRight(new float[] {0f, -7.5f, .75f, -.75f, 0f, 9f, -.75f, -.75f}, new float[] {-7.5f, -7.5f}, 1),
        boxBottom(new float[] {-7.5f, 0f, -.75f, -.75f, 9f, 0f, -.75f, .75f}, new float[] {-7.5f, 0f}, 2),
        boxLeft(new float[] {0f, 7.5f, -.75f, .75f, 0f, -9f, .75f, .75f}, new float[] {0f, 0f}, 3),
        checkRight(new float[] {-2.65165f, -2.65165f, 0f, -1.06066f, 3.18198f, 3.18198f, -.53033f, .53033f}, new float[] {-2.65165f, -2.65165f/*-5.1072f, -4.4559f*/}, 4),
        checkLeft(new float[] {-1.06066f, 1.06066f, -.53033f, -.53033f, 1.59099f, -1.59099f, 0f, 1.06066f}, new float[] {0f, 0f/*-2.4556f, -1.8042f*/}, 5)
        ;

        PathType(float[] diffs, float[] offsetToAnchor, int index) {
            this.diffs = diffs;
            this.offsetToAnchor = offsetToAnchor;
            this.index = index;
        }

        boolean matches(List<PathElement> path) {
            if (path != null && path.size() * 2 == diffs.length) {
                for (int i = 0; i < path.size(); i++) {
                    PathElement element = path.get(i);
                    if (!(element instanceof Line))
                        return false;
                    Line line = (Line) element;
                    if (!approximatelyEquals(line.p1.getX() - line.p0.getX(), diffs[i*2]))
                        return false;
                    if (!approximatelyEquals(line.p1.getY() - line.p0.getY(), diffs[i*2+1]))
                        return false;
                }
                return true;
            }
            return false;
        }

        Point2D getAnchor(List<PathElement> path) {
            if (path != null && path.size() > 0) {
                PathElement element = path.get(0);
                if (element instanceof Line) {
                    Line line = (Line) element;
                    Point2D p = line.p0;
                    return new Point2D.Float((float)p.getX() + offsetToAnchor[0], (float)p.getY() + offsetToAnchor[1]);
                }
            }
            return null;
        }

        final float[] diffs;
        final float[] offsetToAnchor;
        final int index;
    }

    static boolean approximatelyEquals(double f, double g) {
        return Math.abs(f - g) < 0.001;
    }

    //
    // members
    //
    final List<PathElement> path = new ArrayList<>();
    final List<Point2D>[] pathAnchorsByType = new List[PathType.values().length];
    final List<CheckBox> checkBoxes = new ArrayList<>();
}

([PdfCheckBoxFinder][2])

You can use the PdfCheckBoxFinder like this to find the check boxes of a document and their checked states:

PDDocument document = ...
for (PDPage page : document.getPages())
{
    PdfCheckBoxFinder finder = new PdfCheckBoxFinder(page);
    finder.processPage(page);
    for (CheckBox checkBox : finder.getBoxes()) {
        Point2D ll = checkBox.getLowerLeft();
        Point2D ur = checkBox.getUpperRight();
        String checked = checkBox.isChecked() ? "checked" : "not checked";
        System.out.printf(Locale.ROOT, "* (%4.3f, %4.3f) - (%4.3f, %4.3f) - %s\n", ll.getX(), ll.getY(), ur.getX(), ur.getY(), checked);
    }
}

([ExtractCheckBoxes][3] test testExtractFromUpdatedForm)

For your example PDF one gets

* (485.050, 654.040) - (492.550, 661.540) - checked
* (508.630, 654.040) - (516.130, 661.540) - not checked
* (485.050, 641.760) - (492.550, 649.260) - checked
* (508.630, 641.760) - (516.130, 649.260) - not checked
* (485.050, 629.490) - (492.550, 636.990) - not checked
* (508.630, 629.490) - (516.130, 636.990) - checked
* (485.050, 617.220) - (492.550, 624.720) - checked
* (508.630, 617.220) - (516.130, 624.720) - not checked
* (485.050, 593.700) - (492.550, 601.200) - checked
* (508.630, 593.700) - (516.130, 601.200) - not checked
* (485.050, 581.420) - (492.550, 588.920) - checked
* (508.630, 581.420) - (516.130, 588.920) - not checked
* (485.050, 569.150) - (492.550, 576.650) - checked
* (508.630, 569.150) - (516.130, 576.650) - not checked
* (91.330, 553.500) - (98.830, 561.000) - not checked
* (125.570, 553.500) - (133.070, 561.000) - not checked
* (200.150, 553.500) - (207.650, 561.000) - not checked
* (286.220, 553.500) - (293.720, 561.000) - not checked
* (77.190, 331.430) - (84.690, 338.930) - not checked

(The coordinates are in the natural coordinate system given by the crop box of the PDF page in question. To relate to coordinates from the PDFTextStripper a transformation into the proprietary coordinate system of the text stripper may be necessary.)
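As a rough illustration of such a transformation, the sketch below flips a point from PDF user space (origin at the bottom left, y pointing up) into a system with the origin at the top left of the crop box and y pointing down. Whether this matches the text stripper's coordinates for a given document is an assumption to verify, in particular for rotated pages; the class name `CoordinateFlip`, the method `toTopLeftOrigin` and all parameters are hypothetical.

```java
public class CoordinateFlip {
    // Converts (x, y) from PDF user space to a top-left-origin system relative to the crop box.
    // cropLowerLeftX/Y and cropHeight are the crop box's lower left corner and height (assumed inputs).
    static double[] toTopLeftOrigin(double x, double y,
                                    double cropLowerLeftX, double cropLowerLeftY, double cropHeight) {
        double shiftedX = x - cropLowerLeftX;
        double shiftedY = y - cropLowerLeftY;
        return new double[] {shiftedX, cropHeight - shiftedY};
    }

    public static void main(String[] args) {
        // Lower left corner of the first check box, on a hypothetical 612 x 792 page with crop box at (0, 0).
        double[] flipped = toTopLeftOrigin(485.05, 654.04, 0, 0, 792);
        System.out.println(flipped[0] + ", " + flipped[1]);
    }
}
```

With the box corners expressed this way, they can be compared against text positions to decide which label ("Yes" or "No") a box belongs to.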

Beware, though: as said at the start, the code above only works for check boxes and check marks built exactly as in your example PDF. You confirmed that this is the case, but you may well be surprised.

If you actually encounter a (very!) few variations thereof, you can add PathType entries matching all of them and enhance getBoxes accordingly to recognize all those variations.

If you happen to come across more than only a few variations, you should go for OCR.

How to Combine the Check Boxes With Text Extraction

In a comment you proposed

> is there a possibility if I can remove the graphics and replate it with some text for an example C or 'N' then I can do text extraction of the newly generated pdf

Indeed, one can simply add textual marks for checked and unchecked check boxes to the page and then apply text extraction to get the text including the marks. I would propose, though, to use Zapf Dingbats characters like ✔ and ✗. This can be done like this:

PDDocument document = ...;
PDType1Font font = PDType1Font.ZAPF_DINGBATS;
for (PDPage page : document.getPages())
{
    PdfCheckBoxFinder finder = new PdfCheckBoxFinder(page);
    finder.processPage(page);
    for (CheckBox checkBox : finder.getBoxes()) {
        Point2D ll = checkBox.getLowerLeft();
        Point2D ur = checkBox.getUpperRight();
        String checkBoxString = checkBox.isChecked() ? "\u2714" : "\u2717";
        try (   PDPageContentStream canvas = new PDPageContentStream(document, page, AppendMode.APPEND, false, true)) {
            canvas.beginText();
            canvas.setNonStrokingColor(1, 0, 0);
            canvas.setFont(font, (float)(ur.getY()-ll.getY()));
            canvas.newLineAtOffset((float)ll.getX(), (float)ll.getY());
            canvas.showText(checkBoxString);
            canvas.endText();
        }
    }
}
PDFTextStripper stripper = new PDFTextStripper();
stripper.setSortByPosition(true);
String text = stripper.getText(document);

([ExtractCheckBoxes][4] test testExtractInlinedInTextFromUpdatedForm)

For your example PDF one gets

1. Have you met or discussed with principal life to be assured?   ✔ Yes  ✗ No
2. Is the principal life to be assured an existing bank customer?   ✔ Yes  ✗ No
3. Are you related to the proposed Life to be Assured? If yes, please state your relationship with applicant   ✗ Yes  ✔ No
4. Are you satisfied with the financial standing of the proposed Life to be Assured?   ✔ Yes  ✗ No
   What is the estimated annual income of the Life to be Assured? 600000
...

[1]: https://stackoverflow.com/a/51560024/1729265 "Extracting text from pdf (java using pdfbox library) from a table's rows with different heights"
[2]: https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/main/java/mkl/testarea/pdfbox2/extract/PdfCheckBoxFinder.java
[3]: https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/extract/ExtractCheckBoxes.java#L30
[4]: https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/extract/ExtractCheckBoxes.java#L76
