Can't get for loop to work while parsing HTML using Beautiful Soup 4

Question

I'm using the Beautiful Soup documentation to help me understand how to implement it. I'm not too familiar with Python as a whole, so maybe I'm making a syntax error, but I don't believe so. The code below should print out any links from the main Etsy page, but it's not doing that. The documentation states something similar to this, but maybe I'm missing something. Here's my code:

#!/usr/bin/python3

# import library
from bs4 import BeautifulSoup
import requests
import os.path
from os import path

# Request to website and download HTML contents
url='https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'
req = requests.get(url)
content = req.text

soup = BeautifulSoup(content, 'html.parser')

for x in soup.head.find_all('a'):
    print(x.get('href'))

The HTML prints if I set it up that way, but I can't get the for loop to work.

Answer 1

Score: 0

If you're trying to get all <a> tags from the specified URL, search soup.body rather than soup.head. Links are normally placed in the page's <body>, so soup.head.find_all('a') returns an empty list and the loop body never executes:

from bs4 import BeautifulSoup
import requests

url = 'https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'

with requests.get(url) as r:
    r.raise_for_status()  # stop early on an HTTP error response
    soup = BeautifulSoup(r.text, 'lxml')  # the 'lxml' parser requires the lxml package
    if (body := soup.body):
        # href=True keeps only <a> tags that actually have an href attribute
        for a in body.find_all('a', href=True):
            print(a['href'])
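
The difference between searching the head and the body can be checked offline on a small page. The sketch below is only an illustration: the HTML string and the names head_links and body_links are made up for this example and are not Etsy's real markup. It uses the built-in 'html.parser' so that only Beautiful Soup itself needs to be installed.

```python
from bs4 import BeautifulSoup

# Made-up page for illustration only; not Etsy's real markup.
html = """
<html>
  <head>
    <title>Demo shop</title>
    <link rel="stylesheet" href="/static/site.css">
  </head>
  <body>
    <a href="/cart">Cart</a>
    <a name="top">Anchor without href</a>
    <a href="https://example.com/help">Help</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# <head> holds metadata such as <title> and <link>, so no <a> tags are found here.
head_links = [a.get("href") for a in soup.head.find_all("a")]

# Searching <body> finds the anchors; href=True skips <a> tags with no href.
body_links = [a["href"] for a in soup.body.find_all("a", href=True)]

print(head_links)  # []
print(body_links)  # ['/cart', 'https://example.com/help']
```

If the real page still yields no links from the body, it may be worth printing r.status_code and the first part of r.text to see what the server actually returned.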

huangapple
  • Posted on February 19, 2023 at 15:08:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75498529.html