How to get an alarm when there are no logs for a time period in AWS CloudWatch?
Question
I have a Java application that runs in AWS Elastic Container Service. The application polls a queue periodically. Sometimes the queue does not respond and the application hangs forever.
I have wrapped the methods in try-catch blocks that log exceptions. Even so, nothing further appears in CloudWatch once the hang occurs: no exceptions, no errors.
Is there a way to detect this situation (no logs arriving in CloudWatch), similar to filtering on an error log pattern?
That way I could restart the service. Any trick or solution would be appreciated.
public void handleProcess() throws Exception {
    try {
        while (true) {
            Response response = QueueUitils.pollQueue(); // poll the queue
            QueueUitils.processMessage(response);
            TimeUnit.SECONDS.sleep(WAIT_TIME); // WAIT_TIME = 20
        }
    } catch (Exception e) {
        // Log and rethrow; a poll that blocks forever never reaches this handler
        LOGGER.error("Data Queue operation failed: " + e.getMessage());
        throw e;
    }
}
Answer 1
Score: 9
You can do this with CloudWatch Alarms. I set up a test Lambda function for this, which runs every minute and logs to CloudWatch.
- Go to CloudWatch and click Alarms in the left-hand menu.
- Click the orange Create Alarm button.
- Click Select Metric.
- Choose Logs, then Log Group Metrics, and pick the IncomingLogEvents metric for the relevant log group (the one your application logs to). In my case it's /aws/lambda/test-log-silence.
- Click Select Metric.
- Now specify how to measure the metric. I chose the average number of log entries over 5 minutes, so if there are no log entries for 5 minutes, that value would be zero.
- Scroll down and set the condition to "Lower Than or Equal To" zero. This triggers the alarm when there are no log entries for 5 minutes (or whatever period you choose).
- Click Next and specify an SNS topic to push the notification to. An SNS topic can notify you via email, SMS, AWS Lambda, and other endpoints.
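The console steps above correspond roughly to a single CloudWatch PutMetricAlarm request. As a rough sketch, the helper below builds that request as a plain object. The function name `buildLogSilenceAlarm`, the interface name, and the example ARN are hypothetical and only for illustration; the field names and enum values follow the PutMetricAlarm API. `TreatMissingData` is set to `breaching` on the assumption that a silent log group may report no datapoints at all rather than zeros.

```typescript
// Subset of the CloudWatch PutMetricAlarm request fields used in this sketch.
interface LogSilenceAlarmInput {
  AlarmName: string;
  Namespace: string;
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  Statistic: "Average" | "Sum";
  Period: number; // length of one evaluation period, in seconds
  EvaluationPeriods: number;
  DatapointsToAlarm: number;
  Threshold: number;
  ComparisonOperator: "LessThanOrEqualToThreshold" | "LessThanThreshold";
  TreatMissingData: "breaching" | "notBreaching" | "ignore" | "missing";
  AlarmActions: string[]; // e.g. SNS topic ARNs to notify
}

// Hypothetical helper: describes an alarm that fires when a log group
// receives no log events for `silenceMinutes` minutes.
function buildLogSilenceAlarm(
  alarmName: string,
  logGroupName: string,
  silenceMinutes: number,
  snsTopicArn: string
): LogSilenceAlarmInput {
  return {
    AlarmName: alarmName,
    Namespace: "AWS/Logs",
    MetricName: "IncomingLogEvents",
    Dimensions: [{ Name: "LogGroupName", Value: logGroupName }],
    Statistic: "Average",
    Period: silenceMinutes * 60,
    EvaluationPeriods: 1,
    DatapointsToAlarm: 1,
    Threshold: 0,
    ComparisonOperator: "LessThanOrEqualToThreshold",
    // Assumption: no datapoints during silence, so count "missing" as a breach.
    TreatMissingData: "breaching",
    AlarmActions: [snsTopicArn],
  };
}

// Example with a placeholder account ID and topic name.
const alarmInput = buildLogSilenceAlarm(
  "test-log-silence-alarm",
  "/aws/lambda/test-log-silence",
  5,
  "arn:aws:sns:us-east-1:123456789012:example-topic"
);
console.log(JSON.stringify(alarmInput, null, 2));
```

An object of this shape could then be passed to the AWS SDK or used as a reference when filling in the console form; the values mirror the 5-minute, "lower than or equal to zero" settings described above.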
Answer 2
Score: 4
With reference to brads3290's answer, if you are using AWS CDK:
import * as cdk from '@aws-cdk/core';
import * as cloudwatch from '@aws-cdk/aws-cloudwatch';
// ...
// Average number of incoming log events per 5-minute period for the log group
const metric = new cloudwatch.Metric({
  namespace: 'AWS/Logs',
  metricName: 'IncomingLogEvents',
  dimensions: { LogGroupName: '/aws/lambda/test-log-silence' },
  statistic: "Average",
  period: cdk.Duration.minutes(5),
});
// Alarm as soon as a single period has no log events
const alarm = new cloudwatch.Alarm(this, 'Alarm', {
  metric,
  threshold: 0,
  comparisonOperator: cloudwatch.ComparisonOperator.LESS_THAN_OR_EQUAL_TO_THRESHOLD,
  evaluationPeriods: 1,
  datapointsToAlarm: 1,
  // A period with no datapoints counts as breaching the threshold
  treatMissingData: cloudwatch.TreatMissingData.BREACHING,
});
Setting treatMissingData to BREACHING should also solve the problem of missing data being ignored.
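To make the effect of the missing-data setting more concrete, here is a deliberately simplified model of one alarm evaluation. It is not CloudWatch's actual implementation: it assumes a single evaluation period, one datapoint to alarm, and a "less than or equal to threshold" comparison, and the names `evaluateOnce`, `AlarmState`, and `MissingDataPolicy` are made up for this sketch. A missing datapoint is represented as `undefined`.

```typescript
type AlarmState = "OK" | "ALARM" | "INSUFFICIENT_DATA";
type MissingDataPolicy = "breaching" | "notBreaching" | "ignore" | "missing";

// Simplified single-period evaluation (illustrative only).
function evaluateOnce(
  datapoint: number | undefined, // undefined = no data reported this period
  threshold: number,
  policy: MissingDataPolicy,
  previous: AlarmState
): AlarmState {
  if (datapoint !== undefined) {
    // A real datapoint is compared against the threshold directly.
    return datapoint <= threshold ? "ALARM" : "OK";
  }
  switch (policy) {
    case "breaching":
      return "ALARM"; // silence counts as a breach
    case "notBreaching":
      return "OK"; // silence counts as healthy
    case "ignore":
      return previous; // keep whatever state the alarm was in
    case "missing":
      return "INSUFFICIENT_DATA"; // not enough data to decide
  }
}

// A silent period under two different policies, starting from OK.
console.log(evaluateOnce(undefined, 0, "breaching", "OK"));
console.log(evaluateOnce(undefined, 0, "missing", "OK"));
```

Under this model a silent period only moves the alarm to ALARM when missing data is treated as breaching; with the `missing` policy the alarm would sit in INSUFFICIENT_DATA instead, so any notification would need to be attached to that state.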
Answer 3
Score: 0
In my case, I needed to use dimensionsMap: {} instead of just dimensions: {}:
const metric = new cloudwatch.Metric({
  namespace: 'AWS/Logs',
  metricName: 'IncomingLogEvents',
  dimensionsMap: {
    "LogGroupName": "logGroupNamehere.."
  },
  statistic: "Sum",
  period: cdk.Duration.days(1),
});
And the Alarm looks like:
new cloudwatch.Alarm(this, 'no-incoming-logs-alarm', {
  metric,
  alarmName: `incoming-logs-alarm-${props?.stage}`,
  threshold: 1,
  comparisonOperator: cloudwatch.ComparisonOperator.LESS_THAN_THRESHOLD,
  evaluationPeriods: 1,
  datapointsToAlarm: 1,
  treatMissingData: cloudwatch.TreatMissingData.MISSING,
  alarmDescription: 'Some meaningful description',
});