SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation:
Question
I'm trying to send a request to a website and then scrape the text out of the page. However, I get a warning.
> SettingWithCopyWarning:
> A value is trying to be set on a copy of a slice from a DataFrame
>
> See the caveats in the documentation:
> https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
>
>     _df['text']=remove_one_words_from_list(website_text,_df['language']).copy()
I already tried .copy(), but the issue remains, and with _df.loc I get a "Too many indexers" error. It's important to note that the DataFrame is passed inside a for loop: I call get_the_text2 on each iteration and pass it one row at a time.
    import re

    import requests
    from bs4 import BeautifulSoup

    # df and remove_one_words_from_list are defined elsewhere in the notebook.
    def get_the_text2(_df):
        '''
        Send a second request, using a different method, to receive the text of the articles.

        Parameters
        ----------
        _df : DataFrame

        Returns
        -------
        only the text contained in the url
        '''
        df['text']=''
        # for k,i in enumerate(_df['url']):
        if str(_df):
            website_text=list()
            print(_df)
            #time.sleep(2)
            try:
                response=requests.get(_df['url'],headers={"User-Agent" : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"})
                status_code=response.status_code
                soup = BeautifulSoup(response.content, 'html.parser')
                if len(website_text)<=10:
                    website_text=list()
                    if soup.article:
                        # Raw string so that \d is not treated as an invalid escape.
                        if soup.article.find_all(['p',re.compile(r"^h\d{1}")]):
                            for data in soup.article.find_all(['p',re.compile(r"^h\d{1}")]):
                                website_text.append(data.get_text(strip=True))
                            #df.at[k,'text']=remove_one_words_from_list(website_text,df.at[k,'language'])
                            _df['text']=remove_one_words_from_list(website_text,_df['language']).copy()
                            print('****ARTICLE P & H{1}****',remove_one_words_from_list(website_text,_df['language']))
            except requests.RequestException as exc:
                # The handler is not shown in the post; a minimal one is assumed here.
                print('request failed:', exc)

    for _index,item in enumerate(df['status_code']):
        if item !=200:
            get_the_text2(df.loc[_index])
EDIT:
This is the error message I get with .loc.
My code:

    _df['text']=remove_one_words_from_list(website_text,_df.loc[:,'language']).copy()

Error message:

    IndexingError                             Traceback (most recent call last)
    Cell In[14], line 102
        100 for _index,item in enumerate(df['status_code']):
        101     if item !=200:
    --> 102         get_the_text2(df.loc[_index])
    File c:\Users\\anaconda3\envs\GDELT\Lib\site-packages\pandas\core\indexing.py:939, in _LocationIndexer._validate_key_length(self, key)
        937     raise IndexingError(_one_ellipsis_message)
        938 return self._validate_key_length(key)
    --> 939 raise IndexingError("Too many indexers")
        940 return key
    IndexingError: Too many indexers
EDIT 2
I found that using .loc['language'] does not raise the error, although the SettingWithCopyWarning is still there.

    _df['text']=remove_one_words_from_list(website_text,_df.loc['language']).copy()

According to this post, I understand why it happens, but I don't know how to fix it.
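The difference between the two .loc forms in the edits above can be reproduced in isolation. This is a minimal sketch, assuming pandas 1.5 or later (where `IndexingError` is exposed under `pandas.errors`); the two-row frame and its URLs are hypothetical stand-ins for the real data. Selecting one label with `df.loc[_index]` returns a one-dimensional Series indexed by the former column names, so a `[rows, columns]` key has one indexer too many.

```python
import pandas as pd
from pandas.errors import IndexingError

# Hypothetical two-row frame standing in for the one in the question.
df = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b"],
    "language": ["en", "de"],
    "status_code": [404, 200],
})

row = df.loc[0]            # a single label returns a 1-D Series, not a DataFrame
print(type(row).__name__)

# The Series is indexed by the former column names, so one key is enough.
language = row.loc["language"]

# A (rows, columns) key only makes sense on a 2-D DataFrame.
try:
    row.loc[:, "language"]
    too_many = False
except IndexingError:
    too_many = True

print(language, too_many)
```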
Answer 1
Score: 0
I assigned the new value to a new DataFrame instead of the row that was passed in, and that did the job.

    _df2=pd.DataFrame(columns=list(df.columns)) # take the columns from the original DataFrame
    _df2['text']=remove_one_words_from_list(website_text,_df.loc['language']).copy()
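Another option, hinted at by the commented-out `df.at[k,'text']=...` line in the question, is to write the result straight into the original DataFrame by label rather than into the row that was passed in. The row returned by `df.loc[_index]` is a separate object that pandas treats as a possible copy, so assigning to it is what appears to trigger the warning, and the value may never reach `df`. The sketch below is illustrative only: `fetch_text` is a hypothetical stand-in for the request, parsing, and `remove_one_words_from_list` steps, and the frame contents are made up.

```python
import pandas as pd

# Hypothetical frame standing in for the one in the question.
df = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b"],
    "language": ["en", "de"],
    "status_code": [404, 200],
})
df["text"] = ""  # create the column once, outside the loop


def fetch_text(url, language):
    """Hypothetical stand-in for the request, parsing, and cleaning step."""
    return f"text for {url} ({language})"


# Pick the labels of the rows that need a retry, then write each result
# into the original frame with .at, so no intermediate row object is modified.
for idx in df.index[df["status_code"] != 200]:
    df.at[idx, "text"] = fetch_text(df.at[idx, "url"], df.at[idx, "language"])

print(df[["status_code", "text"]])
```

Iterating over `df.index[...]` instead of `enumerate(df['status_code'])` also keeps the loop correct when the index is not the default 0..n-1 range, since `.loc` and `.at` look rows up by label rather than by position.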