Starting out with Python and Web Scraping… I don’t quite understand why this isn’t working?

Question

I used the following code to try to insert a new line after each comma that appeared when the HTML text was printed, to separate the links I was trying to find using BeautifulSoup (since they appeared as one block of text with commas indicating different links, and I wanted to separate them). I tried this, and it doesn't seem to do anything... and I don't know why?

file = requests.get(url) 
UsualError = file.text
Extractor = BeautifulSoup(UsualError)
run = print(Extractor.find_all('link'))
for text in run: 
    if ',':
        +"/n";
print(run)

I tried other methods as well, but I don't think they were entirely right... and I'm not too sure how to go about this, so if someone could point out what I'm missing (which is probably extremely obvious), you'd be helping someone get to grips with Python and web scraping.

Answer 1

Score: -1

There might be no commas in the links themselves. All the links are returned in a Python list, and the entries of a Python list are separated by commas when the list is printed, so you cannot replace those commas with anything.
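As a quick illustration (a minimal sketch, not part of the original answer; the URLs are made up) of where those commas come from:

# the commas belong to the list's printed representation, not to the links
links = ['https://example.com/a', 'https://example.com/b']
print(links)       # ['https://example.com/a', 'https://example.com/b']  <- commas here
for link in links:
    print(link)    # one link per line, no commas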

The main issue with your code is run = print(Extractor.find_all('link')); you are assigning the result of a print() call to run, and print() returns None.
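A minimal sketch (not from the original answer) showing why that assignment is a problem:

# print() writes to the screen but returns None,
# so run would end up holding None instead of the list of tags
value = print([1, 2, 3])   # prints [1, 2, 3]
print(value)               # prints None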

If you want to see all of the matched <link> tags:

import requests
from bs4 import BeautifulSoup

file = requests.get(url)
UsualError = file.text
Extractor = BeautifulSoup(UsualError)
run = Extractor.find_all('link')
for text in run:
    print(text)

If you want to see only the hyperlinks:

file = requests.get(url) 
UsualError = file.text
Extractor = BeautifulSoup(UsualError)
run = Extractor.find_all('link')
for text in run: 
    print(text.get('href'))

If you want to store only href links in the list run:

file = requests.get(url) 
UsualError = file.text
Extractor = BeautifulSoup(UsualError)
run = Extractor.find_all('link')
run = [text.get('href') for text in run]

# now run contains only the href links
# optionally you can print the list
# print(run)
# but the printed list will show commas between the entries, because that is how a Python list is displayed
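And if the original goal was simply one link per line, a short sketch (assuming run is the list of href strings built above):

# print each href on its own line instead of the comma-separated list form
for link in run:
    if link is not None:   # <link> tags without an href attribute give None
        print(link)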
