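Question

The problem is that whenever the script hits a URL that returns an HTTP error code, it exits and the while loop stops processing. Is there a way to keep the script running after an HTTP error?

Yes: add exception handling so the script carries on after an HTTP error. When a request comes back with an error status, urllib.request.urlopen raises urllib.error.HTTPError. Handle that exception by recording the status code instead of letting it propagate, and the loop moves on to the next URL. Every call to urlopen has to sit inside the try block for this to work. The modified code: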

import time
import urllib.request
import urllib.error

url_list = []
url_data = []
total_duration = 0.5
wait_time = 5
http200_count = 0
http300_count = 0
http400_count = 0
http500_count = 0
counter = 1

with open("url_list.csv", "r") as url_file:
    headers = next(url_file).strip().replace(" ", "").split(",")
    
    for row in url_file:
        url_list = row.strip().replace(" ", "").split(",")
        url_dict = dict(zip(headers, url_list), http200=http200_count, http300=http300_count, http400=http400_count, http500=http500_count)
        url_data.append(url_dict)

# No explicit close() is needed: the with block has already closed the file.

interval = time.time() + 60 * total_duration
print(interval)

while True:
    print(f"Counter: {counter}")

    for url_entry in url_data:
        print("Website:", url_entry["name"])
        print("URL:", url_entry["url"])

        try:
            with urllib.request.urlopen(url_entry["url"]) as response:
                request_status = response.status
        except urllib.error.HTTPError as error:
            # The server answered with an error status: record it and carry on.
            print(error.code, error.reason)
            request_status = error.code
        except urllib.error.URLError as error:
            # No HTTP response at all (DNS failure, refused connection, ...).
            print(error.reason)
            request_status = 0  # falls outside every counted range
        else:
            print(request_status)
            print("All is good!")

        if 200 <= request_status <= 299:
            url_entry["http200"] += 1
        elif 300 <= request_status <= 399:
            url_entry["http300"] += 1
        elif 400 <= request_status <= 499:
            url_entry["http400"] += 1
        elif 500 <= request_status <= 599:
            url_entry["http500"] += 1
        print(url_entry["name"], "-", url_entry["http200"], "-", url_entry["http300"], "-", url_entry["http400"], "-", url_entry["http500"])
        print("")

    time.sleep(wait_time)
    counter += 1

    print(time.time(), "-", interval)
    if time.time() >= interval:
        print(f"There are {counter - 1} probes in the past", 60 * total_duration, "secs.")
        # Report every site, not only the last one visited by the loop above.
        for url_entry in url_data:
            print("HTTP Status code:", url_entry["name"], "-", url_entry["http200"], "-", url_entry["http300"], "-", url_entry["http400"], "-", url_entry["http500"])
        interval = time.time() + 60 * total_duration
        counter = 1

With this exception handling in place, the script moves on to check the next URL even when it hits an HTTP error.
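
One way to keep the loop body small is to move the request into a helper that always returns a number. The sketch below is an illustration, not part of the original script: `probe_status`, `opener` and `timeout` are hypothetical names, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def probe_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status for url, or 0 if no HTTP response arrived.

    probe_status, opener and timeout are illustrative names that do not
    appear in the original script.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Error responses raise HTTPError; the status code is in error.code.
        return error.code
    except urllib.error.URLError:
        # DNS failure, refused connection and similar: no status available.
        return 0
```

Inside the for loop, `request_status = probe_status(url_entry["url"])` could then stand in for the whole try/except block, with 0 again meaning that no HTTP response was received.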

English:

I have a Python script that accepts a CSV file containing a list of URLs. The CSV file looks like this:

name, url
google, https://httpstat.us/200
yahoo, https://httpstat.us/401
bcs, https://httpstat.us/521

The Python script starts here.

import time
import urllib.request
import urllib.error

url_list = []
url_data = []
total_duration = 0.5
wait_time = 5
http200_count = 0
http300_count = 0
http400_count = 0
http500_count = 0
counter = 1

with open("url_list.csv", "r") as url_file:
    headers = next(url_file).strip().replace(" ","").split(",")
    for row in url_file:
        url_list = row.strip().replace(" ","").split(",")
        url_dict = dict(zip(headers, url_list), http200 = http200_count, http300 = http300_count, http400 = http400_count, http500 = http500_count)
        url_data.append(url_dict)
url_file.close()

interval = time.time() + 60 * total_duration
print(interval)

while 1:
    while 1:
        print(f"Counter: {counter}")
        for url_entry in url_data:
            print(f"Website:", url_entry["name"])
            print(f"URL:", url_entry["url"])
            print(urllib.request.urlopen(url_entry["url"]).status)
            try:
                with urllib.request.urlopen(url_entry["url"]) as response:
                    print(response.status)
                    request_status = response.status
                    # return response.read(), response
            except urllib.error.HTTPError as error:
                print(error.status, error.reason)
                continue
            except urllib.error.URLError as error:
                print(error.reason)
                continue
            else:
                print("All is good!")
            if request_status >= 200 and request_status <=299:
                url_entry["http200"] = url_entry["http200"] + 1
            elif request_status >= 300 and request_status <=399:
                url_entry["http300"] = url_entry["http300"] + 1
            elif request_status >= 400 and request_status <=499:
                url_entry["http400"] = url_entry["http400"] + 1
            elif request_status >= 500 and request_status <=599:
                url_entry["http500"] = url_entry["http500"] + 1
            print(f"", url_entry["name"], "-", url_entry["http200"], "-", url_entry["http300"], "-", url_entry["http400"], "-", url_entry["http500"])
            print("")
        time.sleep(wait_time)
        counter = counter + 1
        print(time.time(),"-",interval)
        if time.time() >= interval:
            print(f"There are {counter-1} probes in the past", 60 * total_duration, "secs.")
            print(f"HTTP Status code:", url_entry["name"], "-", url_entry["http200"], "-", url_entry["http300"], "-", url_entry["http400"], "-", url_entry["http500"])
            interval = time.time() + 60 * total_duration
            counter = 1

The problem is whenever it hits a URL with HTTP error codes it will exit and the while loop stops processing. Is there any way to continue the script even after hitting HTTP errors?

➜  DevOps_Practice python urlProbe.py
1687386788.8879528
Counter: 1
Website: google
URL: https://httpstat.us/200
200
200
All is good!
 google - 1 - 0 - 0 - 0

Website: yahoo
URL: https://httpstat.us/401
Traceback (most recent call last):
  File "/Users/desmondlim/Documents/DevOps/Projects/DevOps_Challenge_SPH/urlProbe.py", line 38, in <module>
    print(urllib.request.urlopen(url_entry["url"]).status)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized

Thanks.

Answer 1

Score: 0

The problem is with this line:

print(urllib.request.urlopen(url_entry["url"]).status)

You're attempting to open the URL and print its status before entering the try-except block, and this will fail without being caught by the exceptions if the URL leads to an HTTP error. This is why your script stops.

To fix this, you should remove this line. Your try-except block already attempts to open the URL and print its status correctly.
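
The effect can be reproduced without network access. In the sketch below, `fake_urlopen` is a hypothetical stand-in that always raises the same `HTTPError` the 401 URL triggers; `probe_with_stray_call` calls it before its `try` block, as the script in the question does, while `probe_inside_try_only` calls it only inside the block. All three function names are made up for this illustration.

```python
import email.message
import urllib.error


def fake_urlopen(url):
    """Hypothetical stand-in for urllib.request.urlopen that always fails with 401."""
    raise urllib.error.HTTPError(url, 401, "Unauthorized", email.message.Message(), None)


def probe_with_stray_call(url):
    fake_urlopen(url)  # outside the try block: the exception propagates to the caller
    try:
        fake_urlopen(url)
    except urllib.error.HTTPError as error:
        return error.code


def probe_inside_try_only(url):
    try:
        fake_urlopen(url)
    except urllib.error.HTTPError as error:
        return error.code  # caught here, so a surrounding loop can keep going


if __name__ == "__main__":
    print(probe_inside_try_only("https://httpstat.us/401"))
    try:
        probe_with_stray_call("https://httpstat.us/401")
    except urllib.error.HTTPError as error:
        print("escaped the function:", error)
```

Only the second variant returns a status code; in the first, the exception leaves the function before the `try` block is ever reached, which is what ends the script in the traceback above.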
