SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation:


Question

I'm trying to send a request to a website and then scrape the text out of it. However, I get a warning.

> SettingWithCopyWarning:
> A value is trying to be set on a copy of a slice from a DataFrame
>
> See the caveats in the documentation:
> https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
> _df['text']=remove_one_words_from_list(website_text,_df['language']).copy()

I already tried .copy() and the issue still remains, and with _df.loc I get a "Too many indexers" error. It's important to note that I call the get_the_text2 method in a for loop and pass one row of the DataFrame each time.

    def get_the_text2(_df):
        '''
        Sends a request a second time, with a different method, to receive the text of the articles.

        Parameters
        ----------
        _df : DataFrame

        Returns
        -------
        only the text contained in the url
        '''
        df['text']=''
        # for k,i in enumerate(_df['url']):
        if str(_df):
            website_text=list()
            print(_df)
            #time.sleep(2)
            try:
                response=requests.get(_df['url'],headers={"User-Agent" : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"})
                status_code=response.status_code
                soup = BeautifulSoup(response.content, 'html.parser')

                if len(website_text)<=10:
                    website_text=list()
                    if soup.article:
                        if soup.article.find_all(['p',re.compile("^h\d{1}")]):
                            for data in soup.article.find_all(['p',re.compile("^h\d{1}")]):
                                website_text.append(data.get_text(strip=True))
                            #df.at[k,'text']=remove_one_words_from_list(website_text,df.at[k,'language'])
                            _df['text']=remove_one_words_from_list(website_text,_df['language']).copy()
                            print('****ARTICLE P & H{1}****',remove_one_words_from_list(website_text,_df['language']))

    for _index,item in enumerate(df['status_code']):
        if item !=200:
            get_the_text2(df.loc[_index])

EDIT:

Just to show the error message with .loc.

My code:

_df['text']=remove_one_words_from_list(website_text,_df.loc[:,'language']).copy()

Error message:

IndexingError                             Traceback (most recent call last)
Cell In[14], line 102
    100 for _index,item in enumerate(df['status_code']):
    101   if item !=200:
--> 102     get_the_text2(df.loc[_index])

File c:\Users\\anaconda3\envs\GDELT\Lib\site-packages\pandas\core\indexing.py:939, in _LocationIndexer._validate_key_length(self, key)
    937             raise IndexingError(_one_ellipsis_message)
    938         return self._validate_key_length(key)
--> 939     raise IndexingError("Too many indexers")
    940 return key

IndexingError: Too many indexers
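
The traceback is consistent with df.loc[_index] returning a single row as a one-dimensional Series rather than a DataFrame, so .loc[:, 'language'] supplies two indexers where only one axis exists. The following is a minimal sketch with made-up data (the column values and example.com URLs are assumptions, not the data from the post):

```python
import pandas as pd

# Made-up stand-in for the DataFrame in the post (values are assumptions).
df = pd.DataFrame(
    {
        "url": ["https://example.com/a", "https://example.com/b"],
        "language": ["en", "de"],
        "status_code": [200, 404],
    }
)

# Selecting one label with .loc yields a Series whose index is the column names.
row = df.loc[1]
print(type(row).__name__)

# Two indexers on a 1-D object: pandas rejects this.
error_name = None
try:
    row.loc[:, "language"]
except Exception as exc:  # pandas raises IndexingError here
    error_name = type(exc).__name__
    print(error_name, exc)

# A single indexer is enough to read one field from the row.
language = row.loc["language"]
print(language)
```

This only explains the indexing error; it does not by itself address the SettingWithCopyWarning.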

EDIT 2

I found out that if I use .loc['language'] it doesn't throw an error, although the SettingWithCopyWarning is still there.

_df['text']=remove_one_words_from_list(website_text,_df.loc['language']).copy()

According to this post, I know why it happens, but I don't know how to fix it.
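
One possible way around the warning, sketched under assumptions: instead of assigning into the row that was passed to the function, have the function return the text and write it into the original DataFrame by label with df.at (the commented-out df.at line in the code above points in the same direction). Here fetch_text is a hypothetical stand-in for the requests, BeautifulSoup and remove_one_words_from_list steps, the data is made up, and the cleaned text is assumed to be a single string:

```python
import pandas as pd


def fetch_text(url: str, language: str) -> str:
    """Hypothetical placeholder for the download + parse + clean steps."""
    return f"text from {url} ({language})"


# Made-up data; the real DataFrame comes from the asker's pipeline.
df = pd.DataFrame(
    {
        "url": ["https://example.com/a", "https://example.com/b"],
        "language": ["en", "de"],
        "status_code": [200, 404],
    }
)
df["text"] = ""  # create the column once, outside the loop

# Iterate over index labels and rows of the failed requests only,
# and write each result into the original frame by label.
for idx, row in df.loc[df["status_code"] != 200].iterrows():
    df.at[idx, "text"] = fetch_text(row["url"], row["language"])

print(df[["status_code", "text"]])
```

Iterating with iterrows also yields the real index labels, whereas enumerate(df['status_code']) yields positions, which match the labels only when the DataFrame has a default integer index.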

Answer 1

Score: 0

I tried assigning the new value to a new DataFrame, and this did the job.

_df2=pd.DataFrame(columns=list(df.columns)) # to get the columns from the original Dataframe
_df2['text']=remove_one_words_from_list(website_text,_df.loc['language']).copy()
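
Note that writing into a separate DataFrame leaves the original df unchanged. If the text is meant to end up in df, one option is to collect the results keyed by index label and assign them as a column in one step. This is only a sketch: clean_text is a hypothetical placeholder for the scraping and cleaning logic, and the data is made up.

```python
import pandas as pd


def clean_text(url: str, language: str) -> str:
    """Hypothetical placeholder for the scraping and cleaning logic."""
    return f"cleaned text for {url} [{language}]"


# Made-up data standing in for the original DataFrame.
df = pd.DataFrame(
    {
        "url": ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
        "language": ["en", "de", "fr"],
        "status_code": [200, 404, 500],
    }
)

# Collect results in a plain dict keyed by index label.
results = {}
for idx, row in df[df["status_code"] != 200].iterrows():
    results[idx] = clean_text(row["url"], row["language"])

# Align the results with the original index; rows without a result get "".
df["text"] = pd.Series(results, dtype="object").reindex(df.index).fillna("")

print(df["text"].tolist())
```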


huangapple
  • Posted on July 6, 2023, 18:55:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76628083.html