Retry task on a Windows node if unreachable
Question
Is there a way to retry a task if the Windows node is temporarily unreachable?
For example, I tried
- name: Hello
  ansible.windows.win_powershell:
    script: | 
      Write-Host "hello"
  register: _status
  until: _status is not unreachable
  retries: 3
  delay: 200
But after 30 seconds, I got:
fatal: [mylocalwin]: UNREACHABLE! => changed=false 
  msg: 'certificate: HTTPSConnectionPool(host=''xxx.xxx.xxx.xxx'', port=5986): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f4160b63eb0>, ''Connection to xxx.xxx.xxx.xxx timed out. (connect timeout=30)''))'
  unreachable: true
I would like to retry three times before failing.
Answer 1
Score: 1
Here is my solution, based on https://github.com/ansible/ansible/issues/25532#issuecomment-428386816
Modify the following two pywinrm files:
/lib/python3.10/site-packages/winrm/protocol.py
class Protocol(object):
    def __init__(
            ...
            reconnection_retries=0,
            reconnection_backoff_factor=2.0
        ):
        ...
        
        self.transport = Transport(
            ...      
            reconnection_retries=reconnection_retries,
            reconnection_backoff_factor=reconnection_backoff_factor
        )
/lib/python3.10/site-packages/winrm/transport.py
class Transport(object):
    def __init__(
        ...
        reconnection_retries=0,
        reconnection_backoff_factor=2.0):
        
        ...
        self.reconnection_retries = reconnection_retries
        self.reconnection_backoff_factor = reconnection_backoff_factor
        ...
        
    def build_session(self):
        ...
        
        # Merge proxy environment variables
        settings = session.merge_environment_settings(url=self.endpoint,
                      proxies=proxies, stream=None, verify=None, cert=None)
        # ADD: retry on connection errors and on the listed HTTP statuses,
        # sleeping between attempts according to the backoff factor.
        # read=0 means read errors (raised after the request was sent) are not retried.
        retries = requests.packages.urllib3.util.retry.Retry(total=self.reconnection_retries,
                                                             connect=self.reconnection_retries,
                                                             status=self.reconnection_retries,
                                                             read=0,
                                                             backoff_factor=self.reconnection_backoff_factor,
                                                             status_forcelist=(413, 425, 429, 503))
        # ADD: use the retrying adapter for both HTTP and HTTPS endpoints
        session.mount('http://', requests.adapters.HTTPAdapter(max_retries=retries))
        session.mount('https://', requests.adapters.HTTPAdapter(max_retries=retries))
        ...      
With these changes, the retry behavior for an unreachable node can be controlled through the following variables:
- name: Test
  hosts: mylocalwin
  gather_facts: false
  vars:
    ansible_winrm_reconnection_backoff_factor: 2.0
    ansible_winrm_reconnection_retries: 4
  tasks:
    - name: Hello
      ansible.windows.win_powershell:
        script: | 
          Write-Host "hello"
I checked the solution with tcpdump and can confirm that the groups of TCP SYN packets are re-sent reconnection_retries times.
Here is a short recap of the performance:
TYPE                  ERROR DETECTION (sec)    NUM OF TCP SYN SENT
RETRY_0_BACKOFF_2     30                       5
RETRY_1_BACKOFF_2     60                       10
RETRY_2_BACKOFF_2     94                       15
RETRY_3_BACKOFF_2     133                      20
RETRY_4_BACKOFF_2     179                      25
RETRY_5_BACKOFF_2     240                      30
NO_RETRY_MECHANISM    30                       5
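As a rough cross-check of the timings above, here is a minimal sketch that estimates the error detection time. It rests on assumptions rather than anything taken from the patch: every attempt is taken to run into the full 30-second connect timeout, and the sleep between attempts is taken to follow the urllib3 1.x backoff rule (no sleep before the first retry, then backoff_factor * 2 ** (n - 1) after the n-th consecutive error, capped at 120 seconds). Other urllib3 versions may behave differently. The helper names backoff_sleep and detection_time are hypothetical.

```python
def backoff_sleep(consecutive_errors, backoff_factor, backoff_max=120.0):
    """Sleep before the next attempt (assumed urllib3 1.x rule)."""
    if consecutive_errors <= 1:
        # Assumption: the first retry is sent without any delay.
        return 0.0
    return min(backoff_max, backoff_factor * 2 ** (consecutive_errors - 1))


def detection_time(retries, backoff_factor=2.0, connect_timeout=30.0):
    """Estimated seconds until the node is reported unreachable."""
    total = connect_timeout  # the initial attempt times out
    for n in range(1, retries + 1):
        # Sleep after the n-th consecutive error, then one more timed-out attempt.
        total += backoff_sleep(n, backoff_factor) + connect_timeout
    return total


if __name__ == "__main__":
    for retries in range(6):
        print(f"RETRY_{retries}_BACKOFF_2: {detection_time(retries):.0f} s")
```

Under these assumptions the estimate is 30, 60, 94, 132, 178 and 240 seconds for 0 to 5 retries, which is within about a second of the measured values in the table.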