Removing tags from text with BeautifulSoup

# Question

I have this code to extract a song name from NightBot's channel page:

```python
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox(executable_path=r'C:\Users\gabri\AppData\Local\Programs\Python\Python38-32\geckodriver.exe')
driver.get('https://nightbot.tv/t/tonyxzero/song_requests')

html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')
list_item = soup.select("h4 > strong.ng-binding")
print(list_item)
name = list_item.text.strip()
print(name)
```

But when I run it, it prints something like this:

```
[<strong class="ng-binding">Jamiroquai - Virtual Insanity (Official Video)<!-- ngIf: currentSong.track.artist --><span class="ng-binding ng-scope" ng-if="currentSong.track.artist" style=""> — JamiroquaiVEVO</span><!-- end ngIf: currentSong.track.artist --></strong>]
```

And then this error:

```
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
```

Is there another way to show just the text without the tags?


# Answer 1
**Score**: 1

```python
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox(executable_path=r'C:\Users\gabri\AppData\Local\Programs\Python\Python38-32\geckodriver.exe')
driver.get('https://nightbot.tv/t/tonyxzero/song_requests')

# Grab the page source after the browser has rendered it
html = driver.page_source

# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(html, 'lxml')
# find() returns a single element (the first match), so .text works directly
name = soup.find('strong', {'class': 'ng-binding'}).text
print(name)
```
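
For reference, here is a minimal sketch of the same single-element approach that runs without Selenium. The `HTML` string is a stand-in for `driver.page_source`, built from the output quoted in the question; the enclosing `<h4>` and the helper name `first_song_text` are assumptions made for illustration. It uses `select_one()`, which like `find()` returns one element or `None`, and guards against the case where nothing matches.

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source, based on the output quoted in the question.
# The surrounding <h4> is an assumption taken from the question's selector.
HTML = (
    '<h4><strong class="ng-binding">Jamiroquai - Virtual Insanity (Official Video)'
    '<!-- ngIf: currentSong.track.artist -->'
    '<span class="ng-binding ng-scope" ng-if="currentSong.track.artist" style="">'
    ' — JamiroquaiVEVO</span>'
    '<!-- end ngIf: currentSong.track.artist --></strong></h4>'
)


def first_song_text(html):
    """Return the text of the first matching element, or None if there is no match."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("h4 > strong.ng-binding")
    if tag is None:
        return None
    # Strip each text fragment and join the fragments with a single space;
    # HTML comments are not included in get_text() output.
    return tag.get_text(" ", strip=True)


print(first_song_text(HTML))
```

Passing `" "` as the separator keeps a space between the title and the text of the nested `<span>`.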

# Answer 2
**Score**: 1

`soup.select()` returns a list of elements, not a single element. To get the text of each element, you need to iterate over the list.

```python
list_item = soup.select("h4 > strong.ng-binding")
print(list_item)
for item in list_item:
    name = item.text.strip()
    print(name)
```
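
If only the song title is wanted, without the channel name from the nested `<span>`, one possible approach is to keep just the strings that are direct children of each matched element. The sketch below assumes the markup looks like the output quoted in the question (the `<h4>` wrapper is inferred from the selector), and `own_text` is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Stand-in for driver.page_source, based on the output quoted in the question.
HTML = (
    '<h4><strong class="ng-binding">Jamiroquai - Virtual Insanity (Official Video)'
    '<!-- ngIf: currentSong.track.artist -->'
    '<span class="ng-binding ng-scope" ng-if="currentSong.track.artist" style="">'
    ' — JamiroquaiVEVO</span>'
    '<!-- end ngIf: currentSong.track.artist --></strong></h4>'
)


def own_text(tag):
    """Return only the text directly inside `tag`, skipping child tags and comments."""
    parts = [
        str(child).strip()
        for child in tag.children
        # Comment is a subclass of NavigableString, so exclude it explicitly.
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    ]
    return " ".join(part for part in parts if part)


soup = BeautifulSoup(HTML, "html.parser")
titles = [own_text(item) for item in soup.select("h4 > strong.ng-binding")]
print(titles)
```

Whether this is preferable to `item.text` depends on whether the extra `<span>` text is wanted in the result.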

huangapple
  • Posted on 2020-01-07 01:33:27
  • Permalink: https://go.coder-hub.com/59616498.html