How to throw an exception when a for loop takes longer than usual to complete on Windows

Question

I have a list of more than 250 websites, and I would like to get all the text from each website for further analysis. The problem is that some websites take a long time to load, or the request even gets stuck while it is being sent.

Here's the code:

import time

import requests
import timeout_decorator
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def get_the_text(_df):
    '''
    Send a request to receive the text of the articles.

    Parameters
    ----------
    _df : DataFrame

    Returns
    -------
    DataFrame with the text of the articles
    '''
    _df['text'] = ''
    for k, link in enumerate(_df['url']):
        if link:
            website_text = list()
            print(link, '\n', 'K:', k)
            # time.sleep(2)

            session = requests.Session()
            retry = Retry(connect=2, backoff_factor=0.3)
            adapter = HTTPAdapter(max_retries=retry)
            session.mount('http://', adapter)
            session.mount('https://', adapter)

            # signal.signal(signal.SIGALRM, took_too_long)
            # signal.setitimer(signal.ITIMER_REAL, 10)  # 10 seconds
            try:
                timeout_decorator.timeout(seconds=10)  # timeout of 10 seconds
                time.sleep(1)
                response = session.get(link)
                # signal.setitimer(signal.ITIMER_REAL, 0)  # success, reset to 0 to disable the timer

                # GETS THE TEXT IN THE WEBSITE THEN

            except TimeoutError:
                print('Took too long')
                continue
            except requests.exceptions.ConnectionError:
                # requests raises its own ConnectionError, not the built-in one
                print('Connection error')
    return _df

As you can see, I tried both solutions mentioned in this post. I found out that signal.SIGALRM from the signal library is not supported on Windows. The second solution, timeout_decorator, doesn't throw an exception when the request takes longer than, for example, 10 seconds.
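Two details explain what the question describes. On Windows the signal module has no SIGALRM, so the signal-based recipe cannot work there. And timeout_decorator.timeout(seconds=10) is a decorator factory: called on its own line inside the try block, it returns a decorator that is immediately thrown away, so session.get(link) is never time-limited. The standard-library sketch below illustrates both points with a hypothetical, thread-based `timeout` factory of the same shape; it is not timeout_decorator's implementation, and `slow_call` is a made-up stand-in for a hanging request.

```python
import functools
import signal
import threading
import time


def timeout(seconds):
    """Hypothetical thread-based decorator factory with the same call shape as
    timeout_decorator.timeout(seconds=...). A sketch, not that library."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            outcome = {}

            def target():
                try:
                    outcome["value"] = func(*args, **kwargs)
                except BaseException as exc:  # hand the error back to the caller
                    outcome["error"] = exc

            # A daemon thread works on Windows (no signals needed). If the call
            # hangs, the thread is abandoned, not killed, and won't block exit.
            worker = threading.Thread(target=target, daemon=True)
            worker.start()
            worker.join(seconds)
            if worker.is_alive():
                raise TimeoutError(f"{func.__name__} took longer than {seconds} s")
            if "error" in outcome:
                raise outcome["error"]
            return outcome["value"]
        return wrapper
    return decorator


def slow_call():
    time.sleep(1)  # stands in for a request that hangs
    return "done"


# 1) Calling the factory on its own line (as in the question) only creates a
#    decorator, which is discarded; slow_call runs without any limit.
timeout(seconds=0.2)
unguarded = slow_call()

# 2) Applying the decorator enforces the limit; writing
#    "@timeout(seconds=0.2)" above "def slow_call" would do the same.
guarded_call = timeout(seconds=0.2)(slow_call)
try:
    guarded_call()
    timed_out = False
except TimeoutError:
    timed_out = True

print("SIGALRM available:", hasattr(signal, "SIGALRM"))  # False on Windows
print(unguarded, timed_out)  # done True
```

On timeout the worker thread is only abandoned, not stopped, so a hung request keeps running in the background until it returns on its own.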

I would like to skip a request when it takes more than 10 seconds to process. How can I achieve this?
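For plain HTTP, a lighter option than wrapping the whole call is the client's own socket timeout. With requests that would be `session.get(link, timeout=10)`, which raises `requests.exceptions.Timeout` that the loop can catch before calling `continue`. Note that this value limits the connect and each socket read rather than the total download time, so a server that trickles data can still take longer overall. The sketch below shows the same idea with the standard library's urllib.request so it runs without extra packages; `fetch_texts`, `DemoHandler`, and `demo` are hypothetical names, and the local server that stalls on `/slow` exists only to demonstrate the skip.

```python
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Demo-only handler: /slow stalls for 2 s, every other path answers at once."""

    def do_GET(self):
        if self.path == "/slow":
            time.sleep(2)
        body = b"hello"
        try:
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except ConnectionError:
            pass  # the client already gave up waiting for /slow

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_texts(urls, timeout=10, opener=None):
    """Return {url: text}, skipping URLs whose socket operations exceed `timeout` s."""
    opener = opener or urllib.request.build_opener()
    texts = {}
    for url in urls:
        try:
            with opener.open(url, timeout=timeout) as resp:
                texts[url] = resp.read().decode("utf-8", "replace")
        except (socket.timeout, TimeoutError):
            # raised when the server is too slow to send its response
            print("Took too long:", url)
        except urllib.error.URLError as exc:
            # timeouts while connecting arrive wrapped in URLError
            if isinstance(exc.reason, (socket.timeout, TimeoutError)):
                print("Took too long:", url)
            else:
                print("Connection error:", url, exc.reason)
    return texts


def demo():
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    # Ignore proxy environment variables so requests reach the local server.
    direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        urls = [base + "/a", base + "/slow", base + "/b"]
        return base, fetch_texts(urls, timeout=0.5, opener=direct)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the base URL and a dict holding the text of `/a` and `/b`; `/slow` is reported as too slow and skipped after about 0.5 s, and the loop moves on.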

Answer 1

Score: 0

I found the func-timeout library, which raises an exception after a given number of seconds. It works not only on Windows but also on other operating systems.

> This is the function wherein you pass the timeout, the function you want to call, and any arguments, and it runs it for up to #timeout# seconds, and will return/raise anything the passed function would otherwise return or raise.

It should be used like this:

import func_timeout

for k, link in enumerate(df['url']):
    if link:
        try:
            response = func_timeout.func_timeout(timeout=10, func=send_request, args=)
        except func_timeout.FunctionTimedOut:
            print('Took too long to respond to the request')
            continue
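The answer does not show `send_request` or its arguments. As a standard-library sketch of the same call-with-timeout pattern inside the loop (not func_timeout's own implementation), the hypothetical `call_with_timeout` below takes a timeout, a function, and its arguments, returns or raises what the function would, and raises TimeoutError once the limit is reached. `send_request` here is a made-up stand-in that stalls on links containing "slow", and the URL list replaces `df['url']` for illustration.

```python
import concurrent.futures
import time

# Shared pool. A call that times out keeps its worker busy until it returns
# by itself, because Python cannot forcibly stop a running thread.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_timeout(timeout, func, args=(), kwargs=None):
    """Hypothetical helper: run func(*args, **kwargs) for at most `timeout`
    seconds, returning its result or re-raising its exception, and raising
    TimeoutError once the limit is reached."""
    future = _pool.submit(func, *args, **(kwargs or {}))
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"no result after {timeout} s") from None


def send_request(link):
    """Hypothetical stand-in for the answer's send_request: links containing
    'slow' stall for 2 s, others answer after 0.1 s."""
    time.sleep(2 if "slow" in link else 0.1)
    return f"text of {link}"


texts = {}
for k, link in enumerate(["a.example", "slow.example", "b.example"]):
    if link:
        try:
            texts[link] = call_with_timeout(timeout=0.5, func=send_request, args=(link,))
        except TimeoutError:
            print("Took too long to respond to the request:", link)
            continue
        print(k, link, "ok")
```

Unlike a signal-based timer, this works the same on Windows and other systems; the trade-off is that a skipped call still occupies a worker thread until it finishes.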

huangapple
  • This article was posted on July 3, 2023 17:50:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76603626.html