Starting out with Python and Web Scraping… I don’t quite understand why this isn’t working?

Question

I used the following code to try to insert a new line after each comma that appeared when the HTML text was printed, to separate the links I was trying to find using BeautifulSoup (since they appeared as text with commas indicating different links, and I wanted to separate them). I tried this, and it doesn't seem to do anything... and I don't know why?

file = requests.get(url) 
UsualError = file.text
Extractor = BeautifulSoup(UsualError)
run = print(Extractor.find_all('link'))
for text in run: 
    if ',':
        +"/n";
print(run)

I tried other methods as well, but I don't think they were entirely right... and I'm not too sure how to go about this, so if someone could point out what I'm thinking is extremely obvious, you'll be helping someone get to grips with something.

Answer 1

Score: -1

There are most likely no commas in the links themselves. All the links are stored in a Python list, and Python separates the entries of a list with commas when it displays the list, so those commas are not part of the data and you cannot replace them with anything.
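
As a minimal sketch of what is going on (using a hypothetical links list, not data from your page): the commas only exist in Python's printed representation of the list, not in the strings it contains.

links = ["https://example.com/a", "https://example.com/b"]  # hypothetical data
print(links)             # ['https://example.com/a', 'https://example.com/b'] <- commas come from the list display
print("\n".join(links))  # one link per line, no commas at all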

The main issue with your code is run = print(Extractor.find_all('link')): you are assigning the result of a print() call to run, and print() always returns None, so run never holds the list of tags.
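
A tiny sketch, purely to illustrate what print() returns:

value = print("hello")  # print() writes "hello" to the screen and returns None
print(value)            # None
# so run = print(...) leaves run set to None, and "for text in run:" then raises a TypeError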

If you want to see all the matched tags, one per line:

import requests
from bs4 import BeautifulSoup

file = requests.get(url)  # url is assumed to be defined already
UsualError = file.text
Extractor = BeautifulSoup(UsualError, 'html.parser')  # pass an explicit parser to avoid a warning
run = Extractor.find_all('link')
for text in run:
    print(text)  # each <link> tag is printed on its own line

If you want to see only the hyperlinks:

file = requests.get(url)
UsualError = file.text
Extractor = BeautifulSoup(UsualError, 'html.parser')
run = Extractor.find_all('link')
for text in run:
    print(text.get('href'))  # .get('href') returns the href attribute, or None if the tag has none

If you want to store only href links in the list run:

file = requests.get(url)
UsualError = file.text
Extractor = BeautifulSoup(UsualError, 'html.parser')
run = Extractor.find_all('link')
run = [text.get('href') for text in run]  # keep only the href attribute of each tag

# now run contains only the href links
# optionally you can print it:
# print(run)
# but commas will still appear between the entries, because that is how Python displays a list
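
If the goal is the output described in the question (one link per line, with no commas), here is a small sketch that builds on the run list above; .get('href') returns None for any tag without an href, so those entries are filtered out first:

hrefs = [href for href in run if href is not None]  # drop tags that had no href attribute
print("\n".join(hrefs))                             # one link per line, no commas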
