英文:
Azure Function triggered by Queue times out regularly
问题
I have a linux Azure Function that is triggered by a Queue. Most messages that the Function attempts to process end up in the 'poison' queue and the only logs I can see indicate that it timed out.
2023-06-12 20:00:14.132 Executing 'Functions.Drone' (Reason='New queue message detected on 'drone-input-queue'.', Id=6e66d069-0b09-4966-92dd-d63a8f9aa3fc) Information
2023-06-12 20:00:14.133 Trigger Details: MessageId: 3c54e04a-3d96-4ab9-ab69-3101f77a16f1, DequeueCount: 1, InsertionTime: 2023-06-12T20:00:10.000+00:00 Information
2023-06-12 20:10:14.133 Timeout value of 00:10:00 exceeded by function 'Functions.Drone' (Id: '6e66d069-0b09-4966-92dd-d63a8f9aa3fc'). Initiating cancellation. Error
2023-06-12 20:10:14.133 Executed '{functionName}' ({status}, Id={invocationId}, Duration={executionDuration}ms) Error
2023-06-12 20:10:14.133 Executed 'Functions.Drone' (Failed, Id=6e66d069-0b09-4966-92dd-d63a8f9aa3fc, Duration=600000ms) Error
2023-06-12 20:10:14.133 Error
Potentially relevant information:
- The Function App is using a Consumption plan. I know there can be cold starts, but I see this happen, like, 90% of the time, even when messages are coming in every few minutes.
- The Function App and Queue are set up using an ARM template. The Function is later deployed from an archive.
- The Function is using Linux as the OS type and is running Java code.
- I do occasionally see proper execution, so 'AzureWebJobsStorage', 'STORAGE_QUEUE_CONNECTION_STRING', and 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING' should all be set correctly.
- I've double and triple checked the host.json and function.json, but here they are, for reference:
host.json
{
"version": "2.0",
"functionTimeout": "00:10:00",
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[2.*, 3.0.0)"
},
"extensions": {
"queues": {
"maxPollingInterval": "00:00:02",
"visibilityTimeout" : "00:00:30",
"batchSize": 16,
"maxDequeueCount": 3,
"newBatchThreshold": 8
}
}
}
function.json
{
"scriptFile" : "../my.package.drone.jar",
"entryPoint" : "my.package.Drone.run",
"bindings": [
{
"name": "message",
"type": "queueTrigger",
"direction": "in",
"queueName": "catapult-drone-input-queue"
},
{
"name": "output",
"type": "queue",
"direction": "out",
"queueName": "catapult-drone-output-queue"
}
]
}
I was initially deploying the Function with an outdated version of the CLI. While updating that fixed some issued, it did not fix this one. Here is the version info.
# az version
{
"azure-cli": "2.39.0",
"azure-cli-core": "2.39.0",
"azure-cli-telemetry": "1.0.6",
"extensions": {}
}
# func
...
Azure Functions Core Tools
Core Tools Version: 3.0.4899 Commit hash: N/A (64-bit)
Function Runtime Version: 3.17.0.0
...
I suspect it might have something to do with the scaling configuration, but am at a loss for how to configure that. Any advice would be appreciated.
Here is a snippet of the entrypoint code:
@FunctionName("Drone")
public void run(@QueueTrigger(name = "message", queueName = "drone-input-queue") String message,
@QueueOutput(name = "output", queueName = "drone-output-queue") OutputBinding<String> output,
final ExecutionContext context) {
String id = UUID.randomUUID().toString();
logger.setContext(context); //custom tool for logging to context logger and/or blob
logger.info("[" + id + "] Start");
encryption = initializeInFlightEncryptionClient(logger);
logger.info("[" + id + "] encryption initialized");
jsonSigner = initializeJsonSigningUtility(logger);
logger.info("[" + id + "] jsonSigner initialized");
try {
// ... logic to handle message ...
}
catch(Exception e) {
logger.severe("ERROR: " + e.getMessage(), e);
}
finally {
logger.closeLogFile();
}
}
I forgot to mention that I have fairly extensive logging in this part of the code, so I know that, at the very least, it isn't getting past 'logger.setContext', which only sets a private variable to context.getLogger(). That's why I didn't bother sharing code earlier.
Also, I have seen messages that take only 30 seconds to process cause this timeout. If I send like 10 of them at once, the first may get processed correctly and quickly, but the next 9 tend to all have this timeout. That's why I suspect something wonky with the configuration.
P.S. This is the ONLY Function in the Function App.
英文:
I have a linux Azure Function that is triggered by a Queue. Most messages that the Function attempts to process end up in the 'poison' queue and the only logs I can see indicate that it timed out.
2023-06-12 20:00:14.132 Executing 'Functions.Drone' (Reason='New queue message detected on 'drone-input-queue'.', Id=6e66d069-0b09-4966-92dd-d63a8f9aa3fc) Information
2023-06-12 20:00:14.133 Trigger Details: MessageId: 3c54e04a-3d96-4ab9-ab69-3101f77a16f1, DequeueCount: 1, InsertionTime: 2023-06-12T20:00:10.000+00:00 Information
2023-06-12 20:10:14.133 Timeout value of 00:10:00 exceeded by function 'Functions.Drone' (Id: '6e66d069-0b09-4966-92dd-d63a8f9aa3fc'). Initiating cancellation. Error
2023-06-12 20:10:14.133 Executed '{functionName}' ({status}, Id={invocationId}, Duration={executionDuration}ms) Error
2023-06-12 20:10:14.133 Executed 'Functions.Drone' (Failed, Id=6e66d069-0b09-4966-92dd-d63a8f9aa3fc, Duration=600000ms) Error
2023-06-12 20:10:14.133 Error
Potentially relevant information:
- The Function App is using a Consumption plan. I know there can be cold starts, but I see this happen, like, 90% of the time, even when messages are coming in every few minutes.
- The Function App and Queue are set up using an ARM template. The Function is later deployed from an archive.
- The Function is using Linux as the OS type and is running Java code.
- I do occasionally see proper execution, so 'AzureWebJobsStorage', 'STORAGE_QUEUE_CONNECTION_STRING', and 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING' should all be set correctly.
- I've double and triple checked the host.json and function.json, but here they are, for reference:
host.json
{
"version": "2.0",
"functionTimeout": "00:10:00",
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[2.*, 3.0.0)"
},
"extensions": {
"queues": {
"maxPollingInterval": "00:00:02",
"visibilityTimeout" : "00:00:30",
"batchSize": 16,
"maxDequeueCount": 3,
"newBatchThreshold": 8
}
}
}
function.json
{
"scriptFile" : "../my.package.drone.jar",
"entryPoint" : "my.package.Drone.run",
"bindings": [
{
"name": "message",
"type": "queueTrigger",
"direction": "in",
"queueName": "catapult-drone-input-queue"
},
{
"name": "output",
"type": "queue",
"direction": "out",
"queueName": "catapult-drone-output-queue"
}
]
}
- I was initially deploying the Function with an outdated version of the CLI. While updating that fixed some issued, it did not fix this one. Here is the version info.
# az version
{
"azure-cli": "2.39.0",
"azure-cli-core": "2.39.0",
"azure-cli-telemetry": "1.0.6",
"extensions": {}
}
# func
...
Azure Functions Core Tools
Core Tools Version: 3.0.4899 Commit hash: N/A (64-bit)
Function Runtime Version: 3.17.0.0
...
I suspect it might have something to do with the scaling configuration, but am at a loss for how to configure that. Any advice would be appreciated.
Here is a snippet of the entrypoint code:
@FunctionName("Drone")
public void run(@QueueTrigger(name = "message", queueName = "drone-input-queue") String message,
@QueueOutput(name = "output", queueName = "drone-output-queue") OutputBinding<String> output,
final ExecutionContext context) {
String id = UUID.randomUUID().toString();
logger.setContext(context); //custom tool for logging to context logger and/or blob
logger.info("["+id+"] Start");
encryption = initializeInFlightEncryptionClient(logger);
logger.info("["+id+"] encryption initialized");
jsonSigner = initializeJsonSigningUtility(logger);
logger.info("["+id+"] jsonSigner initialized");
try {
// ... logic to handle message ...
}
catch(Exception e) {
logger.severe("ERROR: "+e.getMessage(), e);
}
finally {
logger.closeLogFile();
}
}
I forgot to mention that I have fairly extensive logging in this part of the code, so I know that, at the very least, it isn't getting past 'logger.setContext', which only sets a private variable to context.getLogger(). That's why I didn't bother sharing code earlier.
Also, I have seen messsages that take only 30 seconds to process cause this timeout. If I send like 10 of them at once, the first may get processed correctly and quickly, but the next 9 tend to all have this timeout. That's why I suspect something wonky with the configuration.
P.S. This is the ONLY Function in the Function App.
答案1
得分: 0
你的函数执行时间超过了10分钟。在本地运行时,你可以将超时时间设置为更大的值。
如果执行时间太长,你有以下选择:
- 应用计划(无限制)
- 将其拆分成较小的部分
- 使用可靠函数
英文:
Your function is talking longer than 10 minutes to complete. Ho wlong does it talk to complete locally? When run locally, you can set the timeout to a larger value.
If it just takes too long, your options are:
- App plan (no limits)
- Break it up into smaller pieces
- Use durable functions
答案2
得分: 0
经过大量研究,并根据评论和其他答案提供的一些线索,我发现将FUNCTIONS_WORKER_PROCESS_COUNT = 2
设置在应用设置中,几乎完全消除了这个问题 - 到了一个不再是问题的程度。无论这是否是一个好的解决方案,我不知道,但这是我的目前结果。
英文:
After much research and following some of the leads provided by the comments and other answers, I found setting FUNCTIONS_WORKER_PROCESS_COUNT = 2
in the App Settings resulted in the issue almost completely disappearing - to the point it wasn't an issue anymore. Whether this is a good solution or not, I don't know, but that is my results thus far.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论