Continue python script even if urllib.request encounters HTTP error codes
Question
The problem is that whenever the script hits a URL that returns an HTTP error code, it exits and the while loop stops processing. Is there a way to keep the script running after it encounters an HTTP error?
You can add exception handling so that the script continues instead of exiting. When your script hits an HTTP error, urllib raises a urllib.error.HTTPError exception; catch it, record the status code, and move on to the next URL. Here is the modified code:
import time
import urllib.request
import urllib.error

url_list = []
url_data = []
total_duration = 0.5
wait_time = 5
http200_count = 0
http300_count = 0
http400_count = 0
http500_count = 0
counter = 1

# Read the CSV; the with statement closes the file automatically.
with open("url_list.csv", "r") as url_file:
    headers = next(url_file).strip().replace(" ", "").split(",")
    for row in url_file:
        url_list = row.strip().replace(" ", "").split(",")
        url_dict = dict(zip(headers, url_list), http200=http200_count,
                        http300=http300_count, http400=http400_count,
                        http500=http500_count)
        url_data.append(url_dict)

interval = time.time() + 60 * total_duration
print(interval)

while 1:
    while 1:
        print(f"Counter: {counter}")
        for url_entry in url_data:
            print("Website:", url_entry["name"])
            print("URL:", url_entry["url"])
            try:
                with urllib.request.urlopen(url_entry["url"]) as response:
                    print(response.status)
                    request_status = response.status
            except urllib.error.HTTPError as error:
                # The server answered with an error status: record the code
                # instead of letting the exception end the script.
                print(error.code, error.reason)
                request_status = error.code
            except urllib.error.URLError as error:
                # Network-level failure (DNS, refused connection, ...):
                # 0 falls outside every status bucket below.
                print(error.reason)
                request_status = 0
            else:
                print("All is good!")
            if 200 <= request_status <= 299:
                url_entry["http200"] = url_entry["http200"] + 1
            elif 300 <= request_status <= 399:
                url_entry["http300"] = url_entry["http300"] + 1
            elif 400 <= request_status <= 499:
                url_entry["http400"] = url_entry["http400"] + 1
            elif 500 <= request_status <= 599:
                url_entry["http500"] = url_entry["http500"] + 1
            print("", url_entry["name"], "-", url_entry["http200"], "-", url_entry["http300"], "-", url_entry["http400"], "-", url_entry["http500"])
        print("")
        time.sleep(wait_time)
        counter = counter + 1
        print(time.time(), "-", interval)
        if time.time() >= interval:
            print(f"There are {counter-1} probes in the past", 60 * total_duration, "secs.")
            print("HTTP Status code:", url_entry["name"], "-", url_entry["http200"], "-", url_entry["http300"], "-", url_entry["http400"], "-", url_entry["http500"])
            interval = time.time() + 60 * total_duration
            counter = 1
With this exception handling in place, the script moves on to the next URL even after an HTTP error.
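For a quick sanity check of just the exception handling, here is a minimal, self-contained sketch; the httpstat.us URLs are taken from the question's CSV, and any reachable URLs would do:

import urllib.request
import urllib.error

# URLs from the question's CSV; httpstat.us echoes back the requested status code.
urls = ["https://httpstat.us/200", "https://httpstat.us/401", "https://httpstat.us/521"]

for url in urls:
    try:
        with urllib.request.urlopen(url) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code   # 4xx/5xx responses land here instead of crashing the loop
    except urllib.error.URLError as error:
        status = 0            # DNS/connection failures carry no HTTP status
    print(url, "->", status)  # reached for every URL, errors included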
The original question:
I have a python script that accepts a CSV file containing a list of URLs. The CSV file looks like this.
name, url
google, https://httpstat.us/200
yahoo, https://httpstat.us/401
bcs, https://httpstat.us/521
The python script starts from here.
import time
import urllib.request
import urllib.error

url_list = []
url_data = []
total_duration = 0.5
wait_time = 5
http200_count = 0
http300_count = 0
http400_count = 0
http500_count = 0
counter = 1

with open("url_list.csv", "r") as url_file:
    headers = next(url_file).strip().replace(" ","").split(",")
    for row in url_file:
        url_list = row.strip().replace(" ","").split(",")
        url_dict = dict(zip(headers, url_list), http200 = http200_count, http300 = http300_count, http400 = http400_count, http500 = http500_count)
        url_data.append(url_dict)
url_file.close()

interval = time.time() + 60 * total_duration
print(interval)

while 1:
    while 1:
        print(f"Counter: {counter}")
        for url_entry in url_data:
            print(f"Website:", url_entry["name"])
            print(f"URL:", url_entry["url"])
            print(urllib.request.urlopen(url_entry["url"]).status)
            try:
                with urllib.request.urlopen(url_entry["url"]) as response:
                    print(response.status)
                    request_status = response.status
                    # return response.read(), response
            except urllib.error.HTTPError as error:
                print(error.status, error.reason)
                continue
            except urllib.error.URLError as error:
                print(error.reason)
                continue
            else:
                print("All is good!")
            if request_status >= 200 and request_status <=299:
                url_entry["http200"] = url_entry["http200"] + 1
            elif request_status >= 300 and request_status <=399:
                url_entry["http300"] = url_entry["http300"] + 1
            elif request_status >= 400 and request_status <=499:
                url_entry["http400"] = url_entry["http400"] + 1
            elif request_status >= 500 and request_status <=599:
                url_entry["http500"] = url_entry["http500"] + 1
            print(f"", url_entry["name"], "-", url_entry["http200"], "-", url_entry["http300"], "-", url_entry["http400"], "-", url_entry["http500"])
        print("")
        time.sleep(wait_time)
        counter = counter + 1
        print(time.time(),"-",interval)
        if time.time() >= interval:
            print(f"There are {counter-1} probes in the past", 60 * total_duration, "secs.")
            print(f"HTTP Status code:", url_entry["name"], "-", url_entry["http200"], "-", url_entry["http300"], "-", url_entry["http400"], "-", url_entry["http500"])
            interval = time.time() + 60 * total_duration
            counter = 1
The problem is that whenever it hits a URL that returns an HTTP error code, the script exits and the while loop stops processing. Is there any way to continue the script even after hitting HTTP errors?
➜ DevOps_Practice python urlProbe.py
1687386788.8879528
Counter: 1
Website: google
URL: https://httpstat.us/200
200
200
All is good!
google - 1 - 0 - 0 - 0
Website: yahoo
URL: https://httpstat.us/401
Traceback (most recent call last):
  File "/Users/desmondlim/Documents/DevOps/Projects/DevOps_Challenge_SPH/urlProbe.py", line 38, in <module>
    print(urllib.request.urlopen(url_entry["url"]).status)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
Thanks.
Answer 1
Score: 0
The problem is with this line:
print(urllib.request.urlopen(url_entry["url"]).status)
This call sits outside the try-except block, so when a URL returns an HTTP error, the urllib.error.HTTPError it raises is never caught. That is why your script stops.
To fix this, remove that line. Your try-except block already opens the URL and prints its status correctly.
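Once that line is removed, only the guarded call remains, so the HTTPError is caught and the loop moves on. If you also want error responses counted rather than skipped with continue, the except blocks can record the status code, as in the modified script above. A minimal sketch of the corrected fragment, using a hypothetical single entry in place of one row of url_data:

import urllib.request
import urllib.error

# Hypothetical entry standing in for one row of the question's url_data.
url_entry = {"name": "yahoo", "url": "https://httpstat.us/401"}

try:
    with urllib.request.urlopen(url_entry["url"]) as response:
        request_status = response.status   # the unguarded print() call is gone
except urllib.error.HTTPError as error:
    print(error.code, error.reason)        # e.g. 401 Unauthorized; the script keeps going
    request_status = error.code
except urllib.error.URLError as error:
    print(error.reason)                    # network failure, no HTTP status code
    request_status = 0
else:
    print("All is good!")
print(url_entry["name"], "-", request_status)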