Question

I'm new at this, but I've written some code to scrape a list of links from a web page.

Here is the code I have:
```python
import requests
from bs4 import BeautifulSoup

page_to_scrape = requests.get('http://lungtung.com/nhacvang/pub/tapesbyletr.asp?strLTR=T&page=15')

soup = BeautifulSoup(page_to_scrape.text, "html.parser")
title = soup.findAll("div", attrs={"class": "subsection"})

for x in zip(title):
    print(x)

x.get_text()
```

The result it gives me is:

```
(<div class="subsection"><a href="tapes_d.asp?FrTapeID=819">Trường Sơn (đĩa nhựa): TS-000168-1</a></div>,)
(<div class="subsection"><a href="tapes_d.asp?FrTapeID=39">Trường Sơn 1: Hát Giữa Quê Hương</a></div>,)
(<div class="subsection"><a href="tapes_d.asp?FrTapeID=40">Trường Sơn 2: Quê Hương và Tuổi Trẻ</a></div>,)
(<div class="subsection"><a href="tapes_d.asp?FrTapeID=41">Trường Sơn 3: Quê Hương và Người Tình</a></div>,)
(<div class="subsection"><a href="tapes_d.asp?FrTapeID=42">Trường Sơn 4: Hôm Nay, Ngày Mai</a></div>,)
(<div class="subsection"><a href="tapes_d.asp?FrTapeID=43">Trường Sơn 5: Tình Trong Khói Lửa</a></div>,)
(<div class="subsection"><a href="tapes_d.asp?FrTapeID=44">Trường Sơn 6: Quê Hương và Tuổi Loạn</a></div>,)
(<div class="subsection"><a href="tapes_d.asp?FrTapeID=45">Trường Sơn 7: Quê Hương, Mùa Trăng, Mùa Thu</a></div>,)
(<div class="subsection"><a href="tapes_d.asp?FrTapeID=46">Trường Sơn 8: Băng Nhạc Trường Sơn</a></div>,)
(<div class="subsection"><a href="tapes_d.asp?FrTapeID=175">Trần Ngọc Đức: Băng Vàng - Bóng Tình Yêu</a></div>,)
```

This makes me happy because I know I'm getting somewhere, but I want it to print only the names of the links at the end of each line (Truong Son 1, Truong Son 2, etc.). How would I go about that? I feel like I have to use a different function in the BeautifulSoup library, but I don't know which one.



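# Answer 1
**Score**: 0

You can use `select` with a CSS selector. The names are inside `div.subsection`; in this code I append them all to one list. You can then convert that list to a dataframe or something else.

```python
import requests
from bs4 import BeautifulSoup

page_to_scrape = requests.get('http://lungtung.com/nhacvang/pub/tapesbyletr.asp?strLTR=T&page=15')

soup = BeautifulSoup(page_to_scrape.text, "html.parser")
title = soup.select('div.subsection')

# Collect the visible text of each matching div (the link title)
data = []
for x in title:
    data.append(x.text)

print(data)
```

I don't know why you are scraping just one page, but you can scrape all the pages like this:

```python
import requests
from bs4 import BeautifulSoup

data = []
for i in range(1, 16):  # pages 1 through 15
    page_to_scrape = requests.get(f'http://lungtung.com/nhacvang/pub/tapesbyletr.asp?strLTR=T&page={i}')
    soup = BeautifulSoup(page_to_scrape.text, "html.parser")
    title = soup.select('div.subsection')
    for x in title:
        data.append(x.text)

print(data)
```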
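
The question mentions wanting short names such as "truong son 1" and "truong son 2" rather than the full titles. If that means the part before the colon, a minimal sketch could look like the following. It assumes every title follows the `Name: Subtitle` pattern seen in the sample output; `short_names` and `SAMPLE_HTML` are hypothetical names introduced here, and the markup is copied from the question so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question (assumption:
# the live page uses the same structure).
SAMPLE_HTML = """
<div class="subsection"><a href="tapes_d.asp?FrTapeID=39">Trường Sơn 1: Hát Giữa Quê Hương</a></div>
<div class="subsection"><a href="tapes_d.asp?FrTapeID=40">Trường Sơn 2: Quê Hương và Tuổi Trẻ</a></div>
<div class="subsection"><a href="tapes_d.asp?FrTapeID=175">Trần Ngọc Đức: Băng Vàng - Bóng Tình Yêu</a></div>
"""


def short_names(html):
    """Return the text before the first colon of each link in div.subsection.

    Hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for link in soup.select("div.subsection a"):
        full_title = link.get_text(strip=True)
        # partition() yields the whole string as the first item if there is no colon
        name, _, _ = full_title.partition(":")
        names.append(name.strip())
    return names


print(short_names(SAMPLE_HTML))
```

With a live page, `page_to_scrape.text` would be passed in place of `SAMPLE_HTML`.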

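
The answer notes that the collected list can be converted to a dataframe or something else. One possible shape for that is a list of records holding each title together with its absolute URL, sketched below. `link_records`, `SAMPLE_HTML`, and `BASE_URL` are hypothetical names introduced for illustration, and the markup is copied from the question so the sketch runs offline; it assumes the relative `href` values resolve against the URL of the listing page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# URL of the listing page from the question, used to resolve relative links.
BASE_URL = "http://lungtung.com/nhacvang/pub/tapesbyletr.asp?strLTR=T&page=15"

# Sample markup copied from the output shown in the question.
SAMPLE_HTML = """
<div class="subsection"><a href="tapes_d.asp?FrTapeID=39">Trường Sơn 1: Hát Giữa Quê Hương</a></div>
<div class="subsection"><a href="tapes_d.asp?FrTapeID=40">Trường Sơn 2: Quê Hương và Tuổi Trẻ</a></div>
"""


def link_records(html, base_url):
    """Return one {"title", "url"} dict per link inside div.subsection.

    Hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for link in soup.select("div.subsection a[href]"):
        records.append({
            "title": link.get_text(strip=True),
            # urljoin turns the relative href into an absolute URL
            "url": urljoin(base_url, link["href"]),
        })
    return records


records = link_records(SAMPLE_HTML, BASE_URL)
for record in records:
    print(record["title"], "->", record["url"])
```

If pandas is installed, a list of dicts like `records` can be passed directly to `pandas.DataFrame(records)`.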
