英文:
Can't get for loop to work while parsing HTML using Beautiful Soup 4
问题
我正在使用Beautiful Soup文档来帮助我理解如何实现它。我对Python不太熟悉,所以也许我犯了语法错误,但我不这么认为。下面的代码应该打印出主Etsy页面中的任何链接,但它没有这样做。文档中有类似的内容,但也许我漏掉了什么。这是我的代码:
#!/usr/bin/python3
# 导入库
from bs4 import BeautifulSoup
import requests
import os.path
from os import path
# 请求网站并下载HTML内容
url = 'https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB''
req = requests.get(url)
content = req.text
soup = BeautifulSoup(content, 'html.parser')
for x in soup.head.find_all('a'):
print(x.get('href'))
如果我按照这种方式设置HTML,它可以打印出来,但我无法让for循环工作。
英文:
I'm using the Beautiful Soup documentation to help me understand how to implement it. I'm not too familiar with Python as a whole, so maybe I'm making a syntax error, but I don't believe so. The code below should print out any links from the main Etsy page, but it's not doing that. The documentation states something similar to this, but maybe I'm missing something. Here's my code:
#!/usr/bin/python3
# import library
from bs4 import BeautifulSoup
import requests
import os.path
from os import path
# Request to website and download HTML contents
url='https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'
req=requests.get(url)
content=req.text
soup=BeautifulSoup(content, 'html.parser')
for x in soup.head.find_all('a'):
print(x.get('href'))
The HTML prints if I set it up that way, but I can't get the for loop to work.
答案1
得分: 0
如果您尝试从指定的URL获取所有<a>标签,则:
url = 'https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'
with requests.get(url) as r:
r.raise_for_status()
soup = BeautifulSoup(r.text, 'lxml')
if (body := soup.body):
for a in body.find_all('a', href=True):
print(a['href'])
英文:
If you're trying to get all <a> tags from the specified URL then:
url = 'https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'
with requests.get(url) as r:
r.raise_for_status()
soup = BeautifulSoup(r.text, 'lxml')
if (body := soup.body):
for a in body.find_all('a', href=True):
print(a['href'])
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论