Azure Functions Python: BlobTrigger starts when stream is empty
Question
I am trying to implement a pipeline in which each step writes a file to a blob container.
Steps:
- A PDF triggers a function that extracts the text and saves it as a txt file in the container.
- The extracted text triggers a function that polishes the result and writes a third file to the container.
The second step sometimes (most of the time) fails because the stream of the txt file generated by the first step looks empty (it isn't).
I don't have any budget, so I can't use Service Bus or Event Hubs in Azure. I think the triggers start via polling.
Does anyone have an idea how I can solve this problem, or where I should start looking?
Answer 1
Score: 1
After reproducing from my end, this was working fine when I followed the process below. I created two BlobTrigger functions: one converts the PDF to a txt file, and the other reads the txt file, does the required manipulations, and saves the result to another container.
> pdf triggers a function that extracts the text and saves it as txt in the container
Function1.cs - Reads the PDF file and saves it as a text file in a container:
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace FunctionApp5
{
    public class Function1
    {
        [FunctionName("Function1")]
        public void Run([BlobTrigger("pdfcontainer/{name}", Connection = "constr")] Stream myBlob, string name,
            [Blob("textcontainer/1.txt", FileAccess.Write, Connection = "constr")] Stream outputBlob, ILogger log)
        {
            log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {myBlob.Length} Bytes");
            PdfReader reader = new PdfReader(myBlob);
            int intPageNum = reader.NumberOfPages;
            // Keep one writer open for the whole loop: disposing the writer
            // inside the loop would close the output stream after the first page.
            using (var writer = new StreamWriter(outputBlob))
            {
                for (int i = 1; i <= intPageNum; i++)
                {
                    string text = PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy());
                    writer.Write(text);
                }
            }
        }
    }
}
Results:
> the extracted text triggers a function that polishes the result and writes a third file in the container
Function2.cs - Reads the uploaded text file, does some manipulations, and saves the result to another container:
using System.IO;
using System.Text;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

namespace FunctionApp5
{
    public class Function2
    {
        [FunctionName("Function2")]
        public void Run([BlobTrigger("textcontainer/{name}", Connection = "constr")] Stream myBlob, string name,
            [Blob("finalcontainer/1.txt", FileAccess.Write, Connection = "constr")] Stream outputBlob, ILogger log)
        {
            log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {myBlob.Length} Bytes");
            string finalValue;
            using (var reader = new StreamReader(myBlob, Encoding.UTF8))
            {
                // Read the whole file, not just the first line.
                finalValue = reader.ReadToEnd();
            }
            using (var writer = new StreamWriter(outputBlob))
            {
                writer.Write(finalValue.ToUpper());
            }
        }
    }
}
Results:
Answer 2
Score: 0
The problem seems to be the time the Azure Function needs to execute.
As soon as the function is invoked, the output binding creates the file, but the data is only written at the end of the execution (which makes sense).
On the other side, if the output file of this function triggers another Azure Function, this second pipeline stage may or may not get an empty file, depending on the execution status of the first one. (I hope I could explain myself a bit.)
My solution was to not use the output blob binding. If I use the ContainerClient binding, I can upload the file all at once when the elaboration of my data is done.
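A minimal C# sketch of that approach, assuming the Azure.Storage.Blobs-based bindings (Microsoft.Azure.WebJobs.Extensions.Storage.Blobs 5.x, which allow binding a container to a BlobContainerClient); the container name `finalcontainer` and blob name `polished-{name}` are illustrative, not from the original post:

```csharp
using System.IO;
using System.Text;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

namespace FunctionApp5
{
    public class Function2
    {
        [FunctionName("Function2")]
        public void Run(
            [BlobTrigger("textcontainer/{name}", Connection = "constr")] Stream myBlob, string name,
            // Bind to the container itself instead of an output stream,
            // so nothing is created until we explicitly upload.
            [Blob("finalcontainer", Connection = "constr")] BlobContainerClient container,
            ILogger log)
        {
            string text;
            using (var reader = new StreamReader(myBlob, Encoding.UTF8))
            {
                text = reader.ReadToEnd();
            }

            string result = text.ToUpper(); // the "polishing" step

            // Upload the finished result in one shot: the blob only appears
            // (and only fires any downstream trigger) once it is fully written.
            using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(result)))
            {
                container.GetBlobClient($"polished-{name}").Upload(ms, overwrite: true);
            }
        }
    }
}
```

The key difference from the output-stream binding is that the blob is created and written atomically at the end, so a downstream BlobTrigger never observes a half-written file.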