How to throw an exception when a for loop takes longer than usual to complete on Windows
Question
I have a list of websites (more than 250) and I would like to get all the text from each one for further analysis. The problem is that some websites take a long time to load, or the request even gets stuck while it is being sent.
Here's the code:
def get_the_text(_df):
    '''
    Send a request to receive the text of each article.

    Parameters
    ----------
    _df : DataFrame

    Returns
    -------
    DataFrame with the text of the articles
    '''
    _df['text'] = ''
    for k, link in enumerate(_df['url']):
        if link:
            website_text = list()
            print(link, '\n', 'K:', k)
            # time.sleep(2)
            session = requests.Session()
            retry = Retry(connect=2, backoff_factor=0.3)
            adapter = HTTPAdapter(max_retries=retry)
            session.mount('http://', adapter)
            session.mount('https://', adapter)
            # signal.signal(signal.SIGALRM, took_too_long)
            # signal.setitimer(signal.ITIMER_REAL, 10)  # 10 seconds
            try:
                timeout_decorator.timeout(seconds=10)  # timeout of 10 seconds
                time.sleep(1)
                response = session.get(link)
                # signal.setitimer(signal.ITIMER_REAL, 0)  # success, reset to 0 to disable the timer
                # GETS THE TEXT IN THE WEBSITE THEN
            except TimeoutError:
                print('Took too long')
                continue
            except ConnectionError:
                print('Connection error')
As you can see, I tried both solutions mentioned in this post. I found out that SIGALRM from the signal library is not supported on Windows. The second solution, timeout_decorator, doesn't throw an exception when the request takes longer than, for example, 10 seconds.
I would like to skip a request when it takes more than 10 seconds to process. How can I achieve this?
Answer 1
Score: 0
I found the func-timeout library, which raises an exception after a given number of seconds. This library works not only on Windows but also on other operating systems.
> This is the function wherein you pass the timeout, the function you want to call, and any arguments, and it runs it for up to #timeout# seconds, and will return/raise anything the passed function would otherwise return or raise.
It should be used like this:
import func_timeout

for k, link in enumerate(df['url']):
    if link:
        try:
            # send_request is assumed to take the URL as its only argument
            response = func_timeout.func_timeout(timeout=10, func=send_request, args=(link,))
        except func_timeout.FunctionTimedOut:
            print('Took too long to respond to the request')
            continue
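For readers who cannot install a third-party package, the same "give up after N seconds and move on" pattern can be sketched with the standard library. This is not how func-timeout is implemented; it swaps in `concurrent.futures` and only illustrates the idea. The names `run_with_time_limit`, `send_request`, and the example URLs and delays are all hypothetical, and `send_request` merely sleeps instead of making a real HTTP request.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_time_limit(func, args=(), timeout=10.0):
    """Run func(*args) in a worker thread and wait at most `timeout` seconds.

    Returns whatever func returns, re-raises whatever func raises, and raises
    concurrent.futures.TimeoutError if no result arrives in time.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(func, *args)
        return future.result(timeout=timeout)
    finally:
        # Do not block on a worker that is still busy; it finishes in the background.
        executor.shutdown(wait=False)


def send_request(link, delay):
    """Hypothetical stand-in for the real request: sleeps, then returns text."""
    time.sleep(delay)
    return f"text of {link}"


# (url, simulated response time in seconds); made-up data for illustration
links = [("https://fast.example", 0.0), ("https://slow.example", 0.6)]

texts = {}
skipped = []
for link, delay in links:
    try:
        texts[link] = run_with_time_limit(send_request, args=(link, delay), timeout=0.2)
    except FutureTimeout:
        print("Took too long:", link)
        skipped.append(link)
        continue
```

One caveat of this sketch: the timed-out call is not cancelled. The loop moves on, but the worker thread keeps running until the function returns, so it limits how long the caller waits rather than how long the work takes.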