How to find html-table ID
Question
I'd like to parse a table and download it using Jsoup (Java). I know that I can use the getElementById function for that purpose. My problem is: how can I find that id in the HTML code of a website?
As an example, take the first table in this Wikipedia article.
Answer 1
Score: 0
Maybe this Python script will help you download the source code of the page:
from urllib.request import urlopen

# Fetch the raw bytes of the page and save them unchanged.
html = urlopen("https://support.image-line.com/member/profile.php?module=Unlock").read()
with open("source.html", "wb") as f:
    f.write(html)
Then trim the file contents, also with Python, deleting everything before the opening <tbody> tag and everything from the closing </tbody> tag onward.
Example:
with open("source.html", "r") as f:
    content = f.read()

# Keep everything from the first <tbody> up to (not including) its closing tag.
position = content.find("<tbody>")
if position == -1:
    raise ValueError("no <tbody> tag found in source.html")
content = content[position:]
substring = content.split("</tbody>", 1)[0]

with open("table.html", "w") as out:
    out.write(substring)
You will now have a file named "table.html" that contains the contents of the first table.
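To get at the id itself, which is what the question asks about, one option is to list the attributes of every <table> tag in the downloaded source. The following is only a minimal sketch using Python's standard html.parser module; the class name TableIdFinder, the helper find_table_ids, and the sample markup are hypothetical and not taken from any real page. Note that a table may have no id attribute at all, in which case getElementById cannot be used and the table has to be selected by class or by position instead.

```python
from html.parser import HTMLParser


class TableIdFinder(HTMLParser):
    """Collects the id and class attributes of every <table> start tag."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one (id, class) tuple per table, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "table":
            attr_map = dict(attrs)
            self.tables.append((attr_map.get("id"), attr_map.get("class")))


def find_table_ids(html_text):
    """Return a list of (id, class) tuples, using None for a missing attribute."""
    finder = TableIdFinder()
    finder.feed(html_text)
    return finder.tables


# Hypothetical sample markup standing in for the contents of source.html.
sample = """
<table class="wikitable sortable"><tbody><tr><td>1</td></tr></tbody></table>
<table id="results" class="data"><tbody><tr><td>2</td></tr></tbody></table>
"""

for index, (table_id, table_class) in enumerate(find_table_ids(sample)):
    print(index, table_id, table_class)
```

To run this against a real page, read source.html into a string and pass it to find_table_ids in place of sample. The same information is visible in a browser by right-clicking the table and choosing "Inspect".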