Azure Functions Python: BlobTrigger starts when stream is empty
Question
I am trying to implement a pipeline in which each step writes a file to a blob container.
Steps:
- A PDF triggers a function that extracts the text and saves it as a txt file in the container.
- The extracted text triggers a function that polishes the result and writes a third file to the container.
The second step sometimes (most of the time) fails because the stream of the txt file generated by the first step looks empty (it isn't).
I don't have any budget, so I can't use Service Bus or Event Hubs in Azure. I think the triggers start via polling.
Does anyone have an idea how I can solve this problem, or where I should start looking?
Answer 1
Score: 1
After reproducing from my end, this was working fine when I followed the process below. I created two BlobTrigger functions: one converts the PDF to a txt file, and the other reads the txt file, does the required manipulations, and saves the result to another container.
> pdf triggers a function that extracts the text and saves it as txt in the container
Function1.cs - Reads the PDF file and saves it as a text file in a container:
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace FunctionApp5
{
    public class Function1
    {
        [FunctionName("Function1")]
        public void Run([BlobTrigger("pdfcontainer/{name}", Connection = "constr")] Stream myBlob, string name,
            [Blob("textcontainer/1.txt", FileAccess.Write, Connection = "constr")] Stream outputBlob, ILogger log)
        {
            log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {myBlob.Length} Bytes");
            PdfReader reader = new PdfReader(myBlob);
            int intPageNum = reader.NumberOfPages;
            // Keep one writer open for the whole loop: disposing the writer
            // inside the loop would close the output stream after the first page.
            using (var writer = new StreamWriter(outputBlob))
            {
                for (int i = 1; i <= intPageNum; i++)
                {
                    string text = PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy());
                    writer.Write(text);
                }
            }
        }
    }
}
Results:
> the extracted text triggers a function that polishes the result and writes a third file in the container
Function2.cs - Reads the uploaded text file, does some manipulations, and saves the result to another container:
using System.IO;
using System.Text;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

namespace FunctionApp5
{
    public class Function2
    {
        [FunctionName("Function2")]
        public void Run([BlobTrigger("textcontainer/{name}", Connection = "constr")] Stream myBlob, string name,
            [Blob("finalcontainer/1.txt", FileAccess.Write, Connection = "constr")] Stream outputBlob, ILogger log)
        {
            log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {myBlob.Length} Bytes");
            string finalValue;
            using (var reader = new StreamReader(myBlob, Encoding.UTF8))
            {
                // Read the whole file, not just the first line.
                finalValue = reader.ReadToEnd();
            }
            using (var writer = new StreamWriter(outputBlob))
            {
                writer.Write(finalValue.ToUpper());
            }
        }
    }
}
Results:
Answer 2
Score: 0
The problem seems to be the time the Azure Function needs to execute.
As soon as the function is invoked, the output binding creates the file, but the data is only written at the end of the execution (which makes sense).
On the other side, if the output file of this function triggers another Azure Function, this second pipeline stage may or may not get an empty file, depending on the execution status of the first one. (I hope I could explain myself a bit.)
My solution was to not use the output blob binding. If I use the ContainerClient binding, I can upload the file all at once when the elaboration of my data is done.
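A minimal C# sketch of that approach, assuming the Azure.Storage.Blobs-based bindings (Microsoft.Azure.WebJobs.Extensions.Storage.Blobs 5.x, which allow binding a container to a BlobContainerClient); the container name `finalcontainer` and blob name `polished-{name}` are illustrative, not from the original post:

```csharp
using System.IO;
using System.Text;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

namespace FunctionApp5
{
    public class Function2
    {
        [FunctionName("Function2")]
        public void Run(
            [BlobTrigger("textcontainer/{name}", Connection = "constr")] Stream myBlob, string name,
            // Bind to the container itself instead of an output stream,
            // so nothing is created until we explicitly upload.
            [Blob("finalcontainer", Connection = "constr")] BlobContainerClient container,
            ILogger log)
        {
            string text;
            using (var reader = new StreamReader(myBlob, Encoding.UTF8))
            {
                text = reader.ReadToEnd();
            }

            string result = text.ToUpper(); // the "polishing" step

            // Upload the finished result in one shot: the blob only appears
            // (and only fires any downstream trigger) once it is fully written.
            using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(result)))
            {
                container.GetBlobClient($"polished-{name}").Upload(ms, overwrite: true);
            }
        }
    }
}
```

The key difference from the output-stream binding is that the blob is created and written atomically at the end, so a downstream BlobTrigger never observes a half-written file.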