Azure Function triggered by Queue times out regularly

Question

I have a Linux Azure Function that is triggered by a Queue. Most messages that the Function attempts to process end up in the 'poison' queue, and the only logs I can see indicate that it timed out.

    2023-06-12 20:00:14.132 Executing 'Functions.Drone' (Reason='New queue message detected on 'drone-input-queue'.', Id=6e66d069-0b09-4966-92dd-d63a8f9aa3fc) Information
    2023-06-12 20:00:14.133 Trigger Details: MessageId: 3c54e04a-3d96-4ab9-ab69-3101f77a16f1, DequeueCount: 1, InsertionTime: 2023-06-12T20:00:10.000+00:00 Information
    2023-06-12 20:10:14.133 Timeout value of 00:10:00 exceeded by function 'Functions.Drone' (Id: '6e66d069-0b09-4966-92dd-d63a8f9aa3fc'). Initiating cancellation. Error
    2023-06-12 20:10:14.133 Executed '{functionName}' ({status}, Id={invocationId}, Duration={executionDuration}ms) Error
    2023-06-12 20:10:14.133 Executed 'Functions.Drone' (Failed, Id=6e66d069-0b09-4966-92dd-d63a8f9aa3fc, Duration=600000ms) Error
    2023-06-12 20:10:14.133 Error

Potentially relevant information:

  • The Function App is using a Consumption plan. I know there can be cold starts, but I see this happen, like, 90% of the time, even when messages are coming in every few minutes.
  • The Function App and Queue are set up using an ARM template. The Function is later deployed from an archive.
  • The Function is using Linux as the OS type and is running Java code.
  • I do occasionally see proper execution, so 'AzureWebJobsStorage', 'STORAGE_QUEUE_CONNECTION_STRING', and 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING' should all be set correctly.
  • I've double and triple checked the host.json and function.json, but here they are, for reference:

host.json

    {
      "version": "2.0",
      "functionTimeout": "00:10:00",
      "extensionBundle": {
        "id": "Microsoft.Azure.Functions.ExtensionBundle",
        "version": "[2.*, 3.0.0)"
      },
      "extensions": {
        "queues": {
          "maxPollingInterval": "00:00:02",
          "visibilityTimeout": "00:00:30",
          "batchSize": 16,
          "maxDequeueCount": 3,
          "newBatchThreshold": 8
        }
      }
    }

function.json

    {
      "scriptFile": "../my.package.drone.jar",
      "entryPoint": "my.package.Drone.run",
      "bindings": [
        {
          "name": "message",
          "type": "queueTrigger",
          "direction": "in",
          "queueName": "catapult-drone-input-queue"
        },
        {
          "name": "output",
          "type": "queue",
          "direction": "out",
          "queueName": "catapult-drone-output-queue"
        }
      ]
    }
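
As a rough reading of the queue settings in host.json: the Azure Functions documentation describes the per-instance ceiling on concurrently processed messages as batchSize plus newBatchThreshold, and maxDequeueCount as the number of attempts before a message is moved to the poison queue. The Java sketch below only does that arithmetic with the values from the question. The class and method names are hypothetical, and the worst-case estimate assumes every attempt runs into the full functionTimeout; it ignores polling delays and host restarts.

```java
import java.time.Duration;

/** Hypothetical helper: rough arithmetic on the queue settings shown in host.json. */
final class QueueSettingsMath {

    /** Upper bound on messages one host instance processes concurrently for a function. */
    static int maxConcurrentPerInstance(int batchSize, int newBatchThreshold) {
        return batchSize + newBatchThreshold;
    }

    /**
     * Rough worst case before a message reaches the poison queue, ASSUMING every
     * attempt hits functionTimeout and each retry waits for visibilityTimeout.
     */
    static Duration worstCaseTimeToPoison(Duration functionTimeout,
                                          Duration visibilityTimeout,
                                          int maxDequeueCount) {
        Duration attempts = functionTimeout.multipliedBy(maxDequeueCount);
        Duration waits = visibilityTimeout.multipliedBy(Math.max(0, maxDequeueCount - 1));
        return attempts.plus(waits);
    }

    public static void main(String[] args) {
        // Values copied from the host.json in the question.
        System.out.println(maxConcurrentPerInstance(16, 8));
        System.out.println(worstCaseTimeToPoison(
                Duration.ofMinutes(10), Duration.ofSeconds(30), 3));
    }
}
```

With these settings a single instance may be handed up to 24 messages at once, and under the stated assumptions a message that always times out needs roughly 31 minutes to reach the poison queue.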

I was initially deploying the Function with an outdated version of the CLI. While updating that fixed some issues, it did not fix this one. Here is the version info.

    # az version
    {
      "azure-cli": "2.39.0",
      "azure-cli-core": "2.39.0",
      "azure-cli-telemetry": "1.0.6",
      "extensions": {}
    }

    # func
    ...
    Azure Functions Core Tools
    Core Tools Version: 3.0.4899 Commit hash: N/A (64-bit)
    Function Runtime Version: 3.17.0.0
    ...

I suspect it might have something to do with the scaling configuration, but am at a loss for how to configure that. Any advice would be appreciated.

Here is a snippet of the entrypoint code:

    @FunctionName("Drone")
    public void run(@QueueTrigger(name = "message", queueName = "drone-input-queue") String message,
                    @QueueOutput(name = "output", queueName = "drone-output-queue") OutputBinding<String> output,
                    final ExecutionContext context) {
        String id = UUID.randomUUID().toString();
        logger.setContext(context); // custom tool for logging to context logger and/or blob
        logger.info("[" + id + "] Start");
        encryption = initializeInFlightEncryptionClient(logger);
        logger.info("[" + id + "] encryption initialized");
        jsonSigner = initializeJsonSigningUtility(logger);
        logger.info("[" + id + "] jsonSigner initialized");
        try {
            // ... logic to handle message ...
        }
        catch (Exception e) {
            logger.severe("ERROR: " + e.getMessage(), e);
        }
        finally {
            logger.closeLogFile();
        }
    }

I forgot to mention that I have fairly extensive logging in this part of the code, so I know that, at the very least, it isn't getting past 'logger.setContext', which only sets a private variable to context.getLogger(). That's why I didn't bother sharing code earlier.
Also, I have seen messages that take only 30 seconds to process cause this timeout. If I send like 10 of them at once, the first may get processed correctly and quickly, but the next 9 tend to all have this timeout. That's why I suspect something wonky with the configuration.
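
If the handler body is reached at all, one way to find out which step stalls is to run each step under its own deadline, much shorter than the 10-minute functionTimeout, so that the log names the step instead of only reporting the host-level timeout. The following is a minimal sketch using only java.util.concurrent. StepDeadline and runWithBudget are hypothetical names, not part of the Azure Functions Java library, and the 100 ms budget in main is purely for demonstration.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs one step under its own deadline so a stall is reported early. */
final class StepDeadline {

    /** Runs the step and throws IllegalStateException naming the step if the budget is exceeded. */
    static <T> T runWithBudget(String stepName, Duration budget, Callable<T> step) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(step);
        try {
            return future.get(budget.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stalled step
            throw new IllegalStateException("Step '" + stepName + "' exceeded " + budget, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            throw new IllegalStateException("Interrupted while waiting for step '" + stepName + "'", e);
        } catch (ExecutionException e) {
            // The step itself failed; keep its exception as the cause.
            throw new IllegalStateException("Step '" + stepName + "' failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithBudget("fast", Duration.ofSeconds(1), () -> "done"));
        try {
            runWithBudget("slow", Duration.ofMillis(100), () -> {
                Thread.sleep(5_000); // simulated stall
                return "never";
            });
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the entrypoint above, a call such as initializeInFlightEncryptionClient(logger) could be wrapped in runWithBudget with a budget of, say, a minute. If no step ever reports a timeout and the "Start" line never appears, that would point away from the function code and toward the host or worker configuration.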

P.S. This is the ONLY Function in the Function App.

Answer 1

Score: 0

Your function is taking longer than 10 minutes to complete. How long does it take to complete locally? When run locally, you can set the timeout to a larger value.

If it just takes too long, your options are:

  • Move to an App Service (Dedicated) plan, where the timeout has no fixed upper limit
  • Break it up into smaller pieces
  • Use durable functions
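
For the "break it up into smaller pieces" option, one common pattern is to split a large unit of work into chunks and enqueue each chunk as its own message, so that every invocation stays well under the timeout. The sketch below shows only the splitting step in plain Java. WorkSplitter and the chunk size are hypothetical, and how the chunks get enqueued (for example through the queue output binding) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a list of work items into fixed-size chunks. */
final class WorkSplitter {

    /** Returns consecutive chunks of at most chunkSize items, preserving order. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> work = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Each inner list would become the payload of one queue message.
        System.out.println(chunk(work, 3));
    }
}
```

Each chunk is then small enough to be processed independently, and a failed chunk can be retried without redoing the others.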

Answer 2

Score: 0

After much research, and following some of the leads provided by the comments and other answers, I found that setting FUNCTIONS_WORKER_PROCESS_COUNT = 2 in the App Settings made the issue almost completely disappear, to the point that it wasn't a problem anymore. Whether this is a good solution or not, I don't know, but those are my results thus far.
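
FUNCTIONS_WORKER_PROCESS_COUNT is an app setting that controls how many language worker processes each host instance starts; the documented default is 1. Since app settings are normally exposed to function code as environment variables, a small check like the one below could be logged at startup to confirm the value the worker actually sees. This is only a sketch: WorkerProcessCount is a hypothetical helper, and falling back to 1 for a missing or unparsable value is an assumption made here for reporting purposes, not a statement about how the host itself parses the setting.

```java
/** Hypothetical helper: reports the worker process count visible to this JVM. */
final class WorkerProcessCount {

    static final String SETTING = "FUNCTIONS_WORKER_PROCESS_COUNT";

    /** Parses the raw setting; ASSUMPTION: report 1 when it is unset or not a positive integer. */
    static int parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return 1;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value >= 1 ? value : 1;
        } catch (NumberFormatException e) {
            return 1;
        }
    }

    public static void main(String[] args) {
        // App settings are read here as plain environment variables.
        System.out.println(SETTING + " = " + parse(System.getenv(SETTING)));
    }
}
```

Logging this value once per worker start makes it easier to tell whether a later change in behaviour coincides with the setting actually taking effect.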

huangapple
  • Posted on June 13, 2023, 04:41:11
  • Please keep this link when reposting: https://go.coder-hub.com/76460188.html