Removing tags from text with BeautifulSoup


Question

I have this code to extract a song name from NightBot's channel page:

```python
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox(executable_path=r'C:\Users\gabri\AppData\Local\Programs\Python\Python38-32\geckodriver.exe')
driver.get('https://nightbot.tv/t/tonyxzero/song_requests')

html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')
list_item = soup.select("h4 > strong.ng-binding")
print(list_item)
name = list_item.text.strip()
print(name)
```

But when I run it, it prints something like this:

    [<strong class="ng-binding">Jamiroquai - Virtual Insanity (Official Video)<!-- ngIf: currentSong.track.artist --><span class="ng-binding ng-scope" ng-if="currentSong.track.artist" style=""> — JamiroquaiVEVO</span><!-- end ngIf: currentSong.track.artist --></strong>]

Then it raises this error:

    AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Is there another way to show only the text, without the tags?

Answer 1

Score: 1

```python
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox(executable_path=r'C:\Users\gabri\AppData\Local\Programs\Python\Python38-32\geckodriver.exe')
driver.get('https://nightbot.tv/t/tonyxzero/song_requests')

html = driver.page_source

soup = BeautifulSoup(html, 'lxml')
# find() returns the first matching element (not a list), so .text works directly
name = soup.find('strong', {'class': 'ng-binding'}).text
print(name)
```
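As a minimal sketch of what `find()` and `.text` do here, the snippet below parses a static copy of the markup printed in the question, so no browser is needed. The variable names `full_text` and `title_only` are hypothetical, and taking the first direct text node to get the title alone is an assumption about what the asker wants, not something either answer states.

```python
from bs4 import BeautifulSoup

# Static copy of the markup printed in the question (wrapped in <h4> for context).
html = (
    '<h4><strong class="ng-binding">Jamiroquai - Virtual Insanity (Official Video)'
    '<!-- ngIf: currentSong.track.artist -->'
    '<span class="ng-binding ng-scope" ng-if="currentSong.track.artist" style="">'
    ' — JamiroquaiVEVO</span>'
    '<!-- end ngIf: currentSong.track.artist --></strong></h4>'
)

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match, or None when nothing matches.
strong = soup.find("strong", {"class": "ng-binding"})

if strong is None:
    full_text = title_only = None
else:
    # .text joins every text node under the tag; HTML comments are skipped.
    full_text = strong.text.strip()
    # The first direct text node is the title without the nested <span>.
    first_string = strong.find(string=True, recursive=False)
    title_only = first_string.strip() if first_string else None

print(full_text)
print(title_only)
```

The `None` check matters on a page like this one, where the element may be rendered by JavaScript and may not yet be in `page_source` when it is read.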

Answer 2

Score: 1

soup.select() returns a list of elements, not a single element. To get each element's text, iterate over the list:

    list_item = soup.select("h4 > strong.ng-binding")
    print(list_item)
    for item in list_item:
        name = item.text.strip()
        print(name)
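The difference between `select()` and a single-element lookup can be sketched on static markup. The HTML below is made up for illustration (two hypothetical song entries), and `select_one()` is shown as one possible alternative when only the first match is needed; it is not part of the original answer.

```python
from bs4 import BeautifulSoup

# Made-up markup with two matching elements, used only for illustration.
html = """
<h4><strong class="ng-binding">Song A</strong></h4>
<h4><strong class="ng-binding">Song B</strong></h4>
"""

soup = BeautifulSoup(html, "html.parser")

# select() always returns a list-like ResultSet, even for a single match.
items = soup.select("h4 > strong.ng-binding")
names = [item.text.strip() for item in items]

# select_one() returns only the first match, or None if there is none.
first = soup.select_one("h4 > strong.ng-binding")
first_name = first.text.strip() if first is not None else None

# Calling .text on the ResultSet itself raises the AttributeError from the question.
try:
    items.text
    raised = False
except AttributeError:
    raised = True

print(names)
print(first_name)
```

Iterating suits pages that may list several songs, while `select_one()` keeps the CSS selector from the question when only the current song is of interest.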

huangapple
  • Posted on January 7, 2020, 01:33:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59616498.html