Starting out with Python and Web Scraping... I don't quite understand why this isn't working?
Question
I used the following code to try to insert a new line after each comma that appeared when the HTML was printed, in order to separate the links I was trying to find with BeautifulSoup. (They appeared as text with commas separating the different links, and I wanted to split them up.) I tried this, and it doesn't seem to do anything, and I don't know why.
file = requests.get(url)
UsualError = file.text
Extractor = BeautifulSoup(UsualError)
run = print(Extractor.find_all('link'))
for text in run:
    if ',':
        +"/n";
print(run)
I tried other methods as well, but I don't think they were entirely right, and I'm not too sure how to go about this. If someone could point out what I suspect is extremely obvious, you'll be helping someone get to grips with something.
Answer 1
Score: -1
The links themselves probably contain no commas. All the links are stored in a Python list, and when a list is printed its entries are shown separated by commas. Those commas are part of how the list is displayed, not part of the data, so there is nothing in the links to replace.
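To make the point about commas concrete, here is a small sketch that needs no network access. The two URLs are made-up placeholders standing in for whatever hrefs the page really contains; the idea is only to show that the commas come from how a list is displayed, and that joining the entries with a newline is one way to get one link per line.

```python
# Hypothetical hrefs standing in for the values taken from find_all('link').
links = ["https://example.com/a.css", "https://example.com/b.css"]

# Converting a list to text adds brackets, quotes and ", " separators for display only.
as_list_text = str(links)

# None of the individual entries actually contains a comma.
commas_in_items = any("," in link for link in links)

# Joining with "\n" builds a single string with one link per line.
one_per_line = "\n".join(links)

print(as_list_text)
print(one_per_line)
```

Printing as_list_text shows the bracketed, comma-separated form, while printing one_per_line shows each link on its own line.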
The main issue with your code is the line run = print(Extractor.find_all('link')); it assigns the return value of print() to run, and print() always returns None, so run never holds the links.
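The effect of assigning print() to a variable can be checked in isolation. The following is a minimal sketch with an arbitrary string in place of the scraped tags; the variable names result and loop_error are illustrative only.

```python
# print() writes to the screen but the call itself evaluates to None.
result = print("hello")

# Looping over that value has the same shape as `for text in run:` in the question.
try:
    for item in result:
        pass
    loop_error = None
except TypeError as exc:
    # None is not iterable, so the loop never starts.
    loop_error = str(exc)

print(result)
print(loop_error)
```

This is why the loop in the question cannot work until the find_all() result is stored directly rather than wrapped in print().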
If you want to print each matching tag as it is:
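import requests
from bs4 import BeautifulSoup

# url is assumed to be defined earlier, as in the question
file = requests.get(url)
UsualError = file.text
Extractor = BeautifulSoup(UsualError, "html.parser")  # name the parser explicitly
run = Extractor.find_all('link')  # keep the result; do not wrap it in print()
for text in run:
    print(text)  # one <link> tag per line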
If you want to see only the hyperlinks:
file = requests.get(url)
UsualError = file.text
Extractor = BeautifulSoup(UsualError, "html.parser")
run = Extractor.find_all('link')
for text in run:
    print(text.get('href'))  # href attribute only; None if the tag has no href
If you want to store only the href links in the list run:
file = requests.get(url)
UsualError = file.text
Extractor = BeautifulSoup(UsualError, "html.parser")
run = Extractor.find_all('link')
run = [text.get('href') for text in run]
# run now contains only the href values
# optionally, print the whole list:
# print(run)
# the commas in that output only separate the list entries in Python's display; they are not part of the links