Azure Function由队列触发,经常超时。

huangapple go评论64阅读模式
英文:

Azure Function triggered by Queue times out regularly

问题

I have a linux Azure Function that is triggered by a Queue. Most messages that the Function attempts to process end up in the 'poison' queue and the only logs I can see indicate that it timed out.

2023-06-12 20:00:14.132  Executing 'Functions.Drone' (Reason='New queue message detected on 'drone-input-queue'.', Id=6e66d069-0b09-4966-92dd-d63a8f9aa3fc)  Information
2023-06-12 20:00:14.133  Trigger Details: MessageId: 3c54e04a-3d96-4ab9-ab69-3101f77a16f1, DequeueCount: 1, InsertionTime: 2023-06-12T20:00:10.000+00:00  Information
2023-06-12 20:10:14.133  Timeout value of 00:10:00 exceeded by function 'Functions.Drone' (Id: '6e66d069-0b09-4966-92dd-d63a8f9aa3fc'). Initiating cancellation.  Error
2023-06-12 20:10:14.133  Executed '{functionName}' ({status}, Id={invocationId}, Duration={executionDuration}ms)  Error
2023-06-12 20:10:14.133  Executed 'Functions.Drone' (Failed, Id=6e66d069-0b09-4966-92dd-d63a8f9aa3fc, Duration=600000ms)  Error
2023-06-12 20:10:14.133  Error

Potentially relevant information:

  • The Function App is using a Consumption plan. I know there can be cold starts, but I see this happen, like, 90% of the time, even when messages are coming in every few minutes.
  • The Function App and Queue are set up using an ARM template. The Function is later deployed from an archive.
  • The Function is using Linux as the OS type and is running Java code.
  • I do occasionally see proper execution, so 'AzureWebJobsStorage', 'STORAGE_QUEUE_CONNECTION_STRING', and 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING' should all be set correctly.
  • I've double and triple checked the host.json and function.json, but here they are, for reference:
    host.json
{
  "version": "2.0",
  "functionTimeout": "00:10:00",
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[2.*, 3.0.0)"
  },
  "extensions": {
    "queues": {
      "maxPollingInterval": "00:00:02",
      "visibilityTimeout" : "00:00:30",
      "batchSize": 16,
      "maxDequeueCount": 3,
      "newBatchThreshold": 8
    }
  }
}

function.json

{
  "scriptFile" : "../my.package.drone.jar",
  "entryPoint" : "my.package.Drone.run",
  "bindings": [
    {
      "name": "message",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "catapult-drone-input-queue"
    },
    {
      "name": "output",
      "type": "queue",
      "direction": "out",
      "queueName": "catapult-drone-output-queue"
    }
  ]
}

I was initially deploying the Function with an outdated version of the CLI. While updating that fixed some issued, it did not fix this one. Here is the version info.

# az version
{
  "azure-cli": "2.39.0",
  "azure-cli-core": "2.39.0",
  "azure-cli-telemetry": "1.0.6",
  "extensions": {}
}
# func
...
Azure Functions Core Tools
Core Tools Version:       3.0.4899 Commit hash: N/A  (64-bit)
Function Runtime Version: 3.17.0.0
...

I suspect it might have something to do with the scaling configuration, but am at a loss for how to configure that. Any advice would be appreciated.

Here is a snippet of the entrypoint code:

@FunctionName("Drone")
public void run(@QueueTrigger(name = "message", queueName = "drone-input-queue") String message, 
                @QueueOutput(name = "output", queueName = "drone-output-queue") OutputBinding<String> output,
                final ExecutionContext context) {
    String id = UUID.randomUUID().toString();
    logger.setContext(context); //custom tool for logging to context logger and/or blob
    logger.info("[" + id + "] Start");
    encryption = initializeInFlightEncryptionClient(logger);
    logger.info("[" + id + "] encryption initialized");
    jsonSigner = initializeJsonSigningUtility(logger);
    logger.info("[" + id + "] jsonSigner initialized");
    try {
        // ... logic to handle message ...
    }
    catch(Exception e) {
        logger.severe("ERROR: " + e.getMessage(), e);
    }
    finally {
        logger.closeLogFile();
    }
}

I forgot to mention that I have fairly extensive logging in this part of the code, so I know that, at the very least, it isn't getting past 'logger.setContext', which only sets a private variable to context.getLogger(). That's why I didn't bother sharing code earlier.
Also, I have seen messages that take only 30 seconds to process cause this timeout. If I send like 10 of them at once, the first may get processed correctly and quickly, but the next 9 tend to all have this timeout. That's why I suspect something wonky with the configuration.

P.S. This is the ONLY Function in the Function App.

英文:

I have a linux Azure Function that is triggered by a Queue. Most messages that the Function attempts to process end up in the 'poison' queue and the only logs I can see indicate that it timed out.

2023-06-12 20:00:14.132  Executing &#39;Functions.Drone&#39; (Reason=&#39;New queue message detected on &#39;drone-input-queue&#39;.&#39;, Id=6e66d069-0b09-4966-92dd-d63a8f9aa3fc)  Information
2023-06-12 20:00:14.133  Trigger Details: MessageId: 3c54e04a-3d96-4ab9-ab69-3101f77a16f1, DequeueCount: 1, InsertionTime: 2023-06-12T20:00:10.000+00:00  Information
2023-06-12 20:10:14.133  Timeout value of 00:10:00 exceeded by function &#39;Functions.Drone&#39; (Id: &#39;6e66d069-0b09-4966-92dd-d63a8f9aa3fc&#39;). Initiating cancellation.  Error
2023-06-12 20:10:14.133  Executed &#39;{functionName}&#39; ({status}, Id={invocationId}, Duration={executionDuration}ms)  Error
2023-06-12 20:10:14.133  Executed &#39;Functions.Drone&#39; (Failed, Id=6e66d069-0b09-4966-92dd-d63a8f9aa3fc, Duration=600000ms)  Error
2023-06-12 20:10:14.133  Error

Potentially relevant information:

  • The Function App is using a Consumption plan. I know there can be cold starts, but I see this happen, like, 90% of the time, even when messages are coming in every few minutes.
  • The Function App and Queue are set up using an ARM template. The Function is later deployed from an archive.
  • The Function is using Linux as the OS type and is running Java code.
  • I do occasionally see proper execution, so 'AzureWebJobsStorage', 'STORAGE_QUEUE_CONNECTION_STRING', and 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING' should all be set correctly.
  • I've double and triple checked the host.json and function.json, but here they are, for reference:
    host.json
{
  &quot;version&quot;: &quot;2.0&quot;,
  &quot;functionTimeout&quot;: &quot;00:10:00&quot;,
  &quot;extensionBundle&quot;: {
    &quot;id&quot;: &quot;Microsoft.Azure.Functions.ExtensionBundle&quot;,
    &quot;version&quot;: &quot;[2.*, 3.0.0)&quot;
  },
  &quot;extensions&quot;: {
    &quot;queues&quot;: {
      &quot;maxPollingInterval&quot;: &quot;00:00:02&quot;,
      &quot;visibilityTimeout&quot; : &quot;00:00:30&quot;,
      &quot;batchSize&quot;: 16,
      &quot;maxDequeueCount&quot;: 3,
      &quot;newBatchThreshold&quot;: 8
    }
  }
}

function.json

{
  &quot;scriptFile&quot; : &quot;../my.package.drone.jar&quot;,
  &quot;entryPoint&quot; : &quot;my.package.Drone.run&quot;,
  &quot;bindings&quot;: [
    {
      &quot;name&quot;: &quot;message&quot;,
      &quot;type&quot;: &quot;queueTrigger&quot;,
      &quot;direction&quot;: &quot;in&quot;,
      &quot;queueName&quot;: &quot;catapult-drone-input-queue&quot;
    },
    {
      &quot;name&quot;: &quot;output&quot;,
      &quot;type&quot;: &quot;queue&quot;,
      &quot;direction&quot;: &quot;out&quot;,
      &quot;queueName&quot;: &quot;catapult-drone-output-queue&quot;
    }
  ]
}
  • I was initially deploying the Function with an outdated version of the CLI. While updating that fixed some issued, it did not fix this one. Here is the version info.
# az version
{
  &quot;azure-cli&quot;: &quot;2.39.0&quot;,
  &quot;azure-cli-core&quot;: &quot;2.39.0&quot;,
  &quot;azure-cli-telemetry&quot;: &quot;1.0.6&quot;,
  &quot;extensions&quot;: {}
}
# func
...
Azure Functions Core Tools
Core Tools Version:       3.0.4899 Commit hash: N/A  (64-bit)
Function Runtime Version: 3.17.0.0
...

I suspect it might have something to do with the scaling configuration, but am at a loss for how to configure that. Any advice would be appreciated.


Here is a snippet of the entrypoint code:

@FunctionName(&quot;Drone&quot;)
public void run(@QueueTrigger(name = &quot;message&quot;, queueName = &quot;drone-input-queue&quot;) String message, 
                @QueueOutput(name = &quot;output&quot;, queueName = &quot;drone-output-queue&quot;) OutputBinding&lt;String&gt; output,
                final ExecutionContext context) {
    String id = UUID.randomUUID().toString();
    logger.setContext(context); //custom tool for logging to context logger and/or blob
    logger.info(&quot;[&quot;+id+&quot;] Start&quot;);
    encryption = initializeInFlightEncryptionClient(logger);
    logger.info(&quot;[&quot;+id+&quot;] encryption initialized&quot;);
    jsonSigner = initializeJsonSigningUtility(logger);
    logger.info(&quot;[&quot;+id+&quot;] jsonSigner initialized&quot;);
    try {
        // ... logic to handle message ...
    }
    catch(Exception e) {
        logger.severe(&quot;ERROR: &quot;+e.getMessage(), e);
    }
    finally {
        logger.closeLogFile();
    }
}

I forgot to mention that I have fairly extensive logging in this part of the code, so I know that, at the very least, it isn't getting past 'logger.setContext', which only sets a private variable to context.getLogger(). That's why I didn't bother sharing code earlier.
Also, I have seen messsages that take only 30 seconds to process cause this timeout. If I send like 10 of them at once, the first may get processed correctly and quickly, but the next 9 tend to all have this timeout. That's why I suspect something wonky with the configuration.

P.S. This is the ONLY Function in the Function App.

答案1

得分: 0

你的函数执行时间超过了10分钟。在本地运行时,你可以将超时时间设置为更大的值。

如果执行时间太长,你有以下选择:

  • 应用计划(无限制)
  • 将其拆分成较小的部分
  • 使用可靠函数
英文:

Your function is talking longer than 10 minutes to complete. Ho wlong does it talk to complete locally? When run locally, you can set the timeout to a larger value.

If it just takes too long, your options are:

  • App plan (no limits)
  • Break it up into smaller pieces
  • Use durable functions

答案2

得分: 0

经过大量研究,并根据评论和其他答案提供的一些线索,我发现将FUNCTIONS_WORKER_PROCESS_COUNT = 2设置在应用设置中,几乎完全消除了这个问题 - 到了一个不再是问题的程度。无论这是否是一个好的解决方案,我不知道,但这是我的目前结果。

英文:

After much research and following some of the leads provided by the comments and other answers, I found setting FUNCTIONS_WORKER_PROCESS_COUNT = 2 in the App Settings resulted in the issue almost completely disappearing - to the point it wasn't an issue anymore. Whether this is a good solution or not, I don't know, but that is my results thus far.

huangapple
  • 本文由 发表于 2023年6月13日 04:41:11
  • 转载请务必保留本文链接:https://go.coder-hub.com/76460188.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定