Can't get a for loop to work when parsing HTML with Beautiful Soup 4

Question

I'm using the Beautiful Soup documentation to help me understand how to implement it. I'm not too familiar with Python as a whole, so maybe I'm making a syntax error, but I don't believe so. The code below should print out any links from the main Etsy page, but it's not doing that. The documentation states something similar to this, but maybe I'm missing something. Here's my code:

#!/usr/bin/python3
# import library
from bs4 import BeautifulSoup
import requests
import os.path
from os import path
# Request to website and download HTML contents
url='https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'
req=requests.get(url)
content=req.text
soup=BeautifulSoup(content, 'html.parser')
for x in soup.head.find_all('a'):
    print(x.get('href'))

The HTML prints if I set it up that way, but I can't get the for loop to work.

Answer 1

Score: 0

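If you're trying to get all <a> tags from the specified URL, search the body rather than the head. Link elements (<a>) belong in <body>, so soup.head.find_all('a') normally returns an empty list and the for loop has nothing to iterate over: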

import requests
from bs4 import BeautifulSoup

url = 'https://www.etsy.com/?utm_source=google&utm_medium=cpc&utm_term=etsy_e&utm_campaign=Search_US_Brand_GGL_ENG_General-Brand_Core_All_Exact&utm_ag=A1&utm_custom1=_k_Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB_k_&utm_content=go_227553629_16342445429_536666953103_kwd-1818581752_c_&utm_custom2=227553629&gclid=Cj0KCQiAi8KfBhCuARIsADp-A54MzODz8nRIxO2LnGcB8Ezc3_q40IQk9HygcSzz9fPmPWnrITz8InQaAt5oEALw_wcB'
with requests.get(url) as r:
    r.raise_for_status()  # raise an exception on a 4xx/5xx response
    soup = BeautifulSoup(r.text, 'lxml')  # the 'lxml' parser needs the lxml package installed
    if (body := soup.body):  # assignment expression, Python 3.8+
        for a in body.find_all('a', href=True):  # only <a> tags that carry an href
            print(a['href'])
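
To see the difference without hitting the network, here is a minimal sketch that runs both searches against a small inline page. SAMPLE_HTML and extract_links are hypothetical names made up for this illustration, and the markup is a stand-in, not Etsy's real page. It uses the built-in 'html.parser', so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page: the links sit in <body>, not <head>.
SAMPLE_HTML = """
<html>
  <head>
    <title>Demo shop</title>
    <link rel="stylesheet" href="/static/site.css">
  </head>
  <body>
    <a href="/listing/1">First listing</a>
    <a name="top">Anchor without href</a>
    <a href="https://example.com/help">Help</a>
  </body>
</html>
"""


def extract_links(html):
    """Return (hrefs of <a> tags under <head>, hrefs of <a> tags anywhere)."""
    soup = BeautifulSoup(html, "html.parser")
    head = soup.head
    # Same search as in the question: restricted to <head>.
    head_links = [a["href"] for a in head.find_all("a", href=True)] if head else []
    # Search the whole document; href=True skips anchors that have no href.
    all_links = [a["href"] for a in soup.find_all("a", href=True)]
    return head_links, all_links


head_links, all_links = extract_links(SAMPLE_HTML)
print("links under <head>:", head_links)
print("links in document:", all_links)
```

If the real page still yields no links after this change, printing req.status_code and len(req.text) is a quick way to check what was actually downloaded.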
