Ansible - starting a task after cpu load is below 2.0
Question
I am trying to create a playbook that performs a simple debug task once the CPU load is below 2.0.
I have this so far in cpu-load.yml:
---
- name: Check CPU load and wait
  hosts: localhost
  gather_facts: yes
  tasks:
    - name: Check cpu load
      shell: uptime | awk -F 'load average:' '{print $2}' | awk -F ', ' '{print $1}'
      register: cpu_load
    - name: Wait until cpu load is below 2.0
      wait_for:
        timeout: 300
        delay: 10
        shell: Do something here
        msg: "cpu load is below 2.0"
    - name: Continue jobs
      debug:
        msg: "CPU load is below 2.0. Continue!!!"
Now I am not sure how to make the task wait for the CPU load to go below 2.0. Can you guys help?
Answer 1
Score: 3
You need to put an until loop around your "check cpu load" task:
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Check cpu load
      shell: uptime | awk -F 'load average:' '{print $2}' | awk -F ', ' '{print $1}'
      register: cpu_load
      until: cpu_load.stdout|float < 2.0
      retries: 300
      delay: 1
    - name: Some other task
      debug:
        msg: hello world
This will wait up to five minutes (300 retries with a 1-second delay) for the load average to drop below 2.0. If the load is still at or above 2.0 after the last retry, the task fails and the play stops for that host.
There are probably better ways to get the current 1-minute CPU load; reading from /proc/loadavg may be easiest:
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Check cpu load
      command: cat /proc/loadavg
      register: cpu_load
      until: cpu_load.stdout.split()|first|float < 2.0
      retries: 300
      delay: 1
    - name: Some other task
      debug:
        msg: hello world
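One small refinement, offered as a sketch rather than as part of the answer above: the command module reports "changed" on every run, even though reading /proc/loadavg modifies nothing. Adding changed_when: false keeps the play output clean. The max_load variable is a hypothetical name introduced here so the threshold lives in one place.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical variable (not in the original answer): load threshold.
    max_load: 2.0
  tasks:
    - name: Check cpu load
      ansible.builtin.command: cat /proc/loadavg
      register: cpu_load
      # Reading /proc/loadavg changes nothing, so never report "changed".
      changed_when: false
      # First field of /proc/loadavg is the 1-minute load average.
      until: (cpu_load.stdout.split() | first | float) < (max_load | float)
      retries: 300
      delay: 1

    - name: Some other task
      ansible.builtin.debug:
        msg: hello world
```

The threshold can then be overridden at run time, for example with -e max_load=4.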
Answer 2
Score: 2
Building on @larsks's answer, and going further regarding the sentence
> There are probably better ways to get the current 1-minute CPU load

There is actually a fact containing this information, at least on Linux, e.g.
$ ansible localhost -m setup -a gather_subset='!all,!min,loadavg'
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_loadavg": {
            "15m": 0.669921875,
            "1m": 0.48974609375,
            "5m": 0.4501953125
        },
        "gather_subset": [
            "!all",
            "!min",
            "loadavg"
        ],
        "module_setup": true
    },
    "changed": false
}
Notes:
- gather_subset='!all,!min,loadavg' ensures that strictly only the needed facts are gathered (and refreshed in the playbook below). For more information, see the option documentation.
- The loadavg subset was not documented at the time of writing, but it is listed among the allowed options in the module's error message in my Ansible 2.14.6 / 2.15.0 versions (see below).

Knowing this, and applying the same recipe as in @larsks's answer, the check can be implemented as:
---
- name: Demo play to check load and continue
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Check load before next tasks
      ansible.builtin.setup:
        gather_subset:
          - '!all'
          - '!min'
          - 'loadavg'
      retries: 30
      delay: 5
      until: ansible_loadavg['1m'] < 2.00
    - name: Now do something on a cooled down system
      debug:
        msg: I'm doing something without pressure
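As a variation on the play above (a sketch, not taken from the answer), the condition can read the load from the task's own registered result instead of the injected ansible_loadavg variable. This relies only on the result structure shown in the ad-hoc output earlier, where ansible_loadavg sits under ansible_facts. The names load_facts and load_threshold are hypothetical, introduced for illustration.

```yaml
---
- name: Demo play to check load via a registered result
  hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical variable: 1-minute load threshold.
    load_threshold: 2.0
  tasks:
    - name: Check load before next tasks
      ansible.builtin.setup:
        gather_subset:
          - '!all'
          - '!min'
          - 'loadavg'
      # Hypothetical register name; holds the setup module's result.
      register: load_facts
      retries: 30
      delay: 5
      # Re-evaluated after every attempt against the freshly gathered facts.
      until: load_facts.ansible_facts.ansible_loadavg['1m'] < (load_threshold | float)

    - name: Now do something on a cooled down system
      ansible.builtin.debug:
        msg: I'm doing something without pressure
```

With 30 retries and a 5-second delay, this waits roughly two and a half minutes before the task fails.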
Regarding the undocumented subset for the setup module, here is a quick and dirty way to list all accepted subsets in your version of the module: pass a deliberately invalid subset name and read the error message.
$ ansible localhost -m setup -a gather_subset='toto'
localhost | FAILED! => {
"changed": false,
"msg": "Bad subset 'toto' given to Ansible. gather_subset options allowed: all, all_ipv4_addresses, all_ipv6_addresses, apparmor, architecture, caps, chroot, cmdline, date_time, default_ipv4, default_ipv6, devices, distribution, distribution_major_version, distribution_release, distribution_version, dns, effective_group_ids, effective_user_id, env, facter, fibre_channel_wwn, fips, hardware, interfaces, is_chroot, iscsi, kernel, kernel_version, loadavg, local, lsb, machine, machine_id, mounts, network, nvme, ohai, os_family, pkg_mgr, platform, processor, processor_cores, processor_count, python, python_version, real_user_id, selinux, service_mgr, ssh_host_key_dsa_public, ssh_host_key_ecdsa_public, ssh_host_key_ed25519_public, ssh_host_key_rsa_public, ssh_host_pub_keys, ssh_pub_keys, system, system_capabilities, system_capabilities_enforced, user, user_dir, user_gecos, user_gid, user_id, user_shell, user_uid, virtual, virtualization_role, virtualization_tech_guest, virtualization_tech_host, virtualization_type"
}