How to find an HTML table's ID

Question

I'd like to parse a table and download it using Jsoup in Java. I know that I can use the getElementById function for that purpose. My problem is: how can I find that id in the HTML code of a website?
As an example, take the first table in this Wikipedia article.

Answer 1

Score: 0

Maybe this Python script will help you download the source code of a website:

from urllib.request import urlopen

# Download the raw bytes of the page.
with urlopen("https://support.image-line.com/member/profile.php?module=Unlock") as response:
    html = response.read()

# Write in binary mode so the bytes are saved unchanged.
with open("source.html", "wb") as f:
    f.write(html)

Then trim the file contents, also with Python, so that everything before the <tbody> tag and after its closing tag is deleted. The search matches the literal string <tbody>, so it only works when the tag is written without attributes.

Example:

# Assumes the saved page is UTF-8 encoded.
with open("source.html", "r", encoding="utf-8") as f:
    content = f.read()

# Locate the first <tbody> ... </tbody> pair.
start = content.find("<tbody>")
end = content.find("</tbody>", start)
if start == -1 or end == -1:
    raise ValueError("no <tbody> element found in source.html")

# Keep everything from the opening tag through the closing tag.
substring = content[start:end + len("</tbody>")]

with open("table.html", "w", encoding="utf-8") as out:
    out.write(substring)

You now have a file named "table.html" that contains the table body.
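
For the original question of finding a table's id, one option is to list every <table> tag in the downloaded source together with its id attribute. Below is a minimal sketch using Python's standard html.parser module. The class name TableIdCollector, the helper list_table_ids, and the sample markup are illustrative assumptions and not part of the answer above. To run it on the saved page, pass the contents of source.html instead of the sample string.

```python
from html.parser import HTMLParser


class TableIdCollector(HTMLParser):
    """Collects the id and class attributes of every <table> start tag."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one (id, class) tuple per table, in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "table":
            attributes = dict(attrs)
            self.tables.append((attributes.get("id"), attributes.get("class")))


def list_table_ids(html_text):
    """Return (id, class) for each table; id is None when the tag has none."""
    collector = TableIdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tables


# Hypothetical sample markup, used only to show the behaviour.
sample = """
<html><body>
<table id="results" class="data"><tbody><tr><td>1</td></tr></tbody></table>
<table class="wikitable"><tbody><tr><td>2</td></tr></tbody></table>
</body></html>
"""

for index, (table_id, table_class) in enumerate(list_table_ids(sample)):
    print(index, table_id, table_class)
```

A table that prints None for its id has no id attribute, so getElementById cannot reach it. In that case the table has to be selected another way, for example by its class or by its position among the tables on the page.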

Posted by huangapple on April 6, 2020 at 17:58:37