How to check if a PDF has been modified before applying a signature - pdfbox

Question

I have a simple web application that allows the user to download a PDF containing some dynamic information.

The user should then sign the document and re-upload it using my application.

Now, I need to check whether the user has changed the PDF content before signing it.

Is there a way to check this? I've tried checking the byteRange, but it seems that the content of the signed PDF is totally different:

Original file size: 2280
Signed file size: 31485
Byte range: [0, 11433, 29635, 1850]

Thanks in advance.
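
For reference, the byte range quoted in the question can be read as two (offset, length) pairs: the signature covers bytes 0 to 11432 and bytes 29635 to 31484, and the gap in between holds the embedded signature container. The following minimal sketch only does that arithmetic; the class and method names are made up for illustration and are not part of PDFBox.

```java
/** Interprets a signature byte range of the form [offset1, length1, offset2, length2]. */
final class ByteRangeInfo {

    /** Number of bytes between the two signed ranges (the space reserved for the signature). */
    static long gapLength(long[] byteRange) {
        return byteRange[2] - (byteRange[0] + byteRange[1]);
    }

    /** True if the signed ranges start at byte 0 and end exactly at the end of the file. */
    static boolean coversWholeFile(long[] byteRange, long fileSize) {
        return byteRange[0] == 0 && byteRange[2] + byteRange[3] == fileSize;
    }

    public static void main(String[] args) {
        long[] byteRange = {0, 11433, 29635, 1850}; // values from the question
        long fileSize = 31485;                      // signed file size from the question
        System.out.println("gap length: " + gapLength(byteRange));
        System.out.println("covers whole file: " + coversWholeFile(byteRange, fileSize));
    }
}
```

With the numbers from the question, the second range ends at 29635 + 1850 = 31485, which is the signed file size, so the signature covers the whole signed file. The first range alone (11433 bytes) is already much larger than the 2280-byte original, so the byte range by itself says nothing about whether the original content is unchanged.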

Answer 1

Score: 2

I assume you sign the PDF with an integrated, embedded signature, not a detached signature file. You don't explicitly say so and locus2k appears to assume otherwise, but for a detached signature your question IMO would not make sense.

> Now, I need to check whether the user has changed the PDF content before signing it.

This is very difficult because PDF signing services apply a number of different changes to the original PDF before signing, in particular if it doesn't have a prior signature. For example, they may

  • linearize the file (which implies sorting the objects in the PDF file in a specific order),
  • fix minor errors in it,
  • optimize some structures,
  • create appearances for form fields that lack them,
  • ...

Thus, every difference you find has to be examined: it may be part of the signing process rather than a prior manipulation by the user.

Of course you can check specific aspects, e.g.

  • extract text from the original file and the signed copy and compare,
  • render the original file and the signed copy and compare (allowing for differences only in a predefined signing area),
  • ...

but there may be seemingly minor changes that you overlook this way and that nonetheless change the appearance of the document considerably.
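
The rendering comparison could be sketched roughly as follows. This is a minimal sketch under assumptions: both pages have already been rendered to `BufferedImage`s at the same resolution (for instance with PDFBox's `PDFRenderer`), the signing area is known as a pixel rectangle, and `RenderCompare` and `differsOutside` are hypothetical names.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Pixel-by-pixel comparison of two rendered pages, ignoring one rectangular area. */
final class RenderCompare {

    /** Returns true if the two images differ anywhere outside the ignored area. */
    static boolean differsOutside(BufferedImage a, BufferedImage b, Rectangle ignored) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return true; // a different page size counts as a difference
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (ignored.contains(x, y)) {
                    continue; // inside the predefined signing area: differences allowed
                }
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Exact pixel equality may turn out to be too strict if the signing software re-encodes fonts or images, so a tolerance may be needed in practice.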

There are some means to make the job easier. For example, you can first sign your original PDF with an author signature in which you declare which changes to the document you allow. This should at least make it difficult for the user to make disallowed changes with unmodified standard software before signing. Furthermore, it restricts changes by the signing software to incremental updates, preventing complete PDF overhauls.

In your code you would then check for the presence and validity of your author signature. If there is no issue with it, you "merely" have to inspect the incremental updates.
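
An incremental update appends data to the end of the file and leaves the earlier bytes untouched. So, assuming you kept an exact copy of the author-signed file you handed out, one necessary (but by no means sufficient) check is that the upload starts with exactly those bytes. The sketch below shows only that check; `IncrementalUpdateCheck` and its methods are hypothetical names, and validating the signatures themselves is left to a PDF library.

```java
import java.util.Arrays;

/** Checks that an uploaded file is the author-signed file plus appended data. */
final class IncrementalUpdateCheck {

    /** True if 'uploaded' begins with exactly the bytes of 'certified'. */
    static boolean startsWithCertified(byte[] certified, byte[] uploaded) {
        if (uploaded.length < certified.length) {
            return false;
        }
        for (int i = 0; i < certified.length; i++) {
            if (certified[i] != uploaded[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the bytes appended after the certified file, i.e. the incremental update(s). */
    static byte[] appendedBytes(byte[] certified, byte[] uploaded) {
        if (!startsWithCertified(certified, uploaded)) {
            throw new IllegalArgumentException("upload does not start with the certified file");
        }
        return Arrays.copyOfRange(uploaded, certified.length, uploaded.length);
    }
}
```

If the prefix check fails, the file was rewritten rather than updated incrementally. If it passes, the appended bytes are what still has to be inspected.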

Beware, though: even checking whether these incremental updates contain unwanted changes is difficult. The PDF Insecurity website describes a number of attacks which, until their publication, could fool the validation routines for allowed/disallowed changes in widely used PDF validators, Adobe Acrobat among them.

Thus, your task is definitely non-trivial, even if reduced to incremental update analysis.

Answer 2

Score: 1

There is one main mechanism to be able to do this (MKL partly mentions this):

CERTIFICATION SIGNATURES

There are two different kinds of signatures:

(1) certification signature (also called 'document signature' or 'author signature')

(2) approval signature (also called 'user signature')

Basically, you as the document author sign the document with a certification signature. This signature is a bit different from the other ones. (E.g. it is typically invisible, i.e. has the coordinates 0 0 0 0, and it has to be the first signature in the document...) Applying the certification signature has the following advantages:

  • The author can specify what the user is and is not allowed to do.
  • The user can also verify upon receipt that the document has not been changed (intentionally or unintentionally).
  • All changes must be made as incremental updates; otherwise the user will break the certification signature.

So if you apply a certification signature and the user adds their changes as an incremental update, you can then check the incremental update to see what was changed. As indicated by MKL, however, this is not always trivial, and in my opinion it depends on your use case.

Do you want to know whether a user

(a) changed your dynamic content (filled in form fields, added some comments, ...), so that you can extract those changes and process them for further use? Or do you want to know whether a user

(b) manipulated the PDF, changed some text, added images or the like, so that you can detect fraudulent changes?

Both are possible but of varying complexity. It is easy to extract the changed form data or annotation content. Other changes are a bit trickier to extract and detect. But this might also depend on the tool you use. Some might offer more support for this than others...
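
As a rough first step for looking at the incremental updates: each revision of a PDF ends with an `%%EOF` marker, so appended updates usually show up as additional markers. The sketch below merely locates them in the raw bytes. It is a heuristic only (a linearized file has more than one marker in its first revision, and the byte sequence can also occur inside stream data), and `RevisionScanner` is a hypothetical name. Interpreting what an update actually changes requires a PDF library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Locates "%%EOF" markers, which usually terminate each revision of a PDF file. */
final class RevisionScanner {

    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the offset just past each "%%EOF" marker found in the file bytes. */
    static List<Integer> revisionEnds(byte[] pdf) {
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i + EOF_MARKER.length <= pdf.length; i++) {
            boolean match = true;
            for (int j = 0; j < EOF_MARKER.length; j++) {
                if (pdf[i + j] != EOF_MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                ends.add(i + EOF_MARKER.length);
                i += EOF_MARKER.length - 1; // continue scanning after this marker
            }
        }
        return ends;
    }
}
```

More than one marker in an uploaded file hints that something was appended after the first revision; the bytes between two consecutive offsets are then the candidate update to examine.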

Answer 3

Score: 0

The easiest way is to store the hash value of the file before it is sent to the user, then hash the file again when the user submits it. If the hashes are different, the file was modified.

I'd recommend using Apache's commons-codec to do this, something like this:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.codec.digest.DigestUtils;

public String getSha1Hash(Path file) throws IOException {
   try (InputStream is = Files.newInputStream(file)) {
      return DigestUtils.sha1Hex(is);
   }
}

Then, in the function that sends the PDF to the user, you can do something like:

Path pdf = pdfPath; // the path to the PDF file
String outgoingHash = getSha1Hash(pdf);
store(pdf, outgoingHash); // store the PDF filename and its hash in a db or some other way

When the user submits the PDF you would then do:

Path pdf = pdfPath; // path to the incoming file. This might be a stream, so adjust for that
String incomingHash = getSha1Hash(pdf);
String originalHash = getFileHash(pdf); // look up the hash stored when the file was sent

if (incomingHash.equals(originalHash)) {
   // handle same hash value (file wasn't modified)
} else {
   // handle changed file
}
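
If adding commons-codec is not an option, the same idea can be sketched with the JDK alone. The version below uses SHA-256 instead of SHA-1 as a deliberate substitution; `Sha256Hasher` is a hypothetical name, and this is an illustration under those assumptions rather than a drop-in replacement for the code above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Computes hex-encoded SHA-256 digests using only the JDK. */
final class Sha256Hasher {

    /** Hashes everything that can be read from the stream. */
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest digest = newDigest();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        return toHex(digest.digest());
    }

    /** Hashes a byte array that is already in memory. */
    static String sha256Hex(byte[] data) {
        return toHex(newDigest().digest(data));
    }

    private static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to provide SHA-256
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Note that a file with an embedded signature necessarily differs byte for byte from the file that was sent out (the question reports 2280 versus 31485 bytes), so a whole-file hash comparison only fits the case of a detached signature file.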

huangapple
  • Posted on August 26, 2020, 20:10:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63597364.html