Question

I'm somewhat of a beginner and have never dealt with cloud-based solutions before.

My program uses the PDFBox library to extract data from PDFs and rename each file based on that data. It all runs locally at the moment, but it will eventually need to be deployed as an Azure Function. The PDFs will be stored in an Azure Blob container - the Azure Blob Storage trigger for Azure Functions is an important reason for this choice.

Of course I can download the blob locally and read it, but the program should run solely in the cloud. I've tried reading the blobs directly using Java, but this resulted in gibberish data that wasn't compatible with PDFBox. My plan for now is to temporarily store the files elsewhere in the cloud (e.g. OneDrive, Azure File Storage) and try opening them from there. However, this seems like it could quickly turn into an overly messy solution. My questions:

(1) Is there any way to open a blob as a File, rather than as a CloudBlockBlob, so that this additional step isn't needed?

(2) If not, what would be a recommended temporary storage in this case?

(3) Are there any alternative ways to approach this issue?

Answer 1

Score: 2

Since you are planning to use an Azure Function, you can use the blob trigger binding to receive the blob's bytes directly. You can then pass those bytes to PDFBox's PDDocument.load method, PDDocument.load(content), to build the document object. You won't need any temporary storage to load the file.

import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.BindingName;
import com.microsoft.azure.functions.annotation.BlobTrigger;
import com.microsoft.azure.functions.annotation.FunctionName;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.IOException;

@FunctionName("blobprocessor")
public void run(
  // dataType = "binary" makes the runtime hand over the raw blob bytes
  @BlobTrigger(name = "file",
               dataType = "binary",
               path = "myblob/{name}",
               connection = "MyStorageAccountAppSetting") byte[] content,
  // {name} from the path above is bound to this parameter
  @BindingName("name") String filename,
  final ExecutionContext context
) throws IOException {
  context.getLogger().info("Name: " + filename + " Size: " + content.length + " bytes");
  // PDFBox 2.x API; PDFBox 3.x uses org.apache.pdfbox.Loader.loadPDF(content) instead
  try (PDDocument doc = PDDocument.load(content)) {
    // do your processing here; the document is closed automatically
  }
}
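If the bytes still look like gibberish, it can help to check them before handing them to PDFBox. The sketch below is not part of the answer above; it uses only the JDK, and the class and method names (PdfBytes, looksLikePdf, survivesTextRoundTrip) are made up for illustration. It checks for the "%PDF-" header that a PDF file starts with, and shows one common way binary data gets corrupted: decoding it as text and encoding it back. Whether that is what happened in the question is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox or the Azure SDK.
final class PdfBytes {
    // A PDF file begins with the ASCII marker "%PDF-" followed by a version.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfBytes() {}

    /** Strict check: true only if the buffer starts with "%PDF-". */
    static boolean looksLikePdf(byte[] content) {
        if (content == null || content.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (content[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Decodes the bytes as UTF-8 text and encodes them back.
     * Returns false when the round trip changes the data, which is
     * what typically happens to binary content treated as a String.
     */
    static boolean survivesTextRoundTrip(byte[] content) {
        String asText = new String(content, StandardCharsets.UTF_8);
        byte[] back = asText.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(content, back);
    }
}
```

Inside the function, a call such as PdfBytes.looksLikePdf(content) before loading would let you log and skip blobs that are not PDFs. The check is deliberately strict; some PDF readers tolerate leading bytes before the header, so treat a false result as a hint rather than proof.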
