April 6, 2020 17:58:37

How to find html-table ID

Question

I'd like to parse a table and download it using Jsoup in Java. I know I can use the getElementById function for that. My question is: how can I find that id in the HTML source of a website?
As an example, take the first table in this Wikipedia article.

Answer 1

Score: 0

Maybe this Python script will help you download the source code of a website:

from urllib.request import urlopen

# Download the raw page bytes and save them unchanged to source.html.
url = "https://support.image-line.com/member/profile.php?module=Unlock"
with urlopen(url) as response:
    html = response.read()
with open("source.html", "wb") as f:
    f.write(html)

Then trim the file contents, also with Python, deleting everything before the <tbody> tag and everything after its closing tag.

Example:

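# Assumes the saved page is UTF-8 encoded.
with open("source.html", "r", encoding="utf-8") as f:
    content = f.read()

# Locate the first <tbody> and its closing tag; find() returns -1 if missing.
start = content.find("<tbody>")
end = content.find("</tbody>", start)
if start == -1 or end == -1:
    raise SystemExit("No <tbody>...</tbody> block found in source.html")

# Keep everything from <tbody> through </tbody>, inclusive.
with open("table.html", "w", encoding="utf-8") as out:
    out.write(content[start:end + len("</tbody>")])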

You will now have a file named "table.html" that contains the table body.
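
To answer the original question more directly: the id is whatever appears in the id attribute of the <table> start tag, so one option is to list every table in the saved page together with its attributes. Below is a minimal sketch using Python's standard html.parser module. The class name TableAttributeLister, the helper list_tables, and the sample markup are made up for illustration; in practice you would feed it the contents of source.html.

```python
from html.parser import HTMLParser


class TableAttributeLister(HTMLParser):
    """Collect the attributes of every <table> start tag in a document."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one dict of attributes per <table>, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "table":
            self.tables.append(dict(attrs))


def list_tables(html_text):
    """Return a list of attribute dicts, one for each <table> in html_text."""
    parser = TableAttributeLister()
    parser.feed(html_text)
    parser.close()
    return parser.tables


# Hypothetical sample markup, standing in for the contents of source.html.
sample = """
<html><body>
<table class="wikitable sortable"><tbody><tr><td>1</td></tr></tbody></table>
<table id="results"><tbody><tr><td>2</td></tr></tbody></table>
</body></html>
"""

for index, attributes in enumerate(list_tables(sample)):
    # Print the position of each table with its id and class (None if absent).
    print(index, attributes.get("id"), attributes.get("class"))
```

A table that shows None for its id has no id attribute, so getElementById cannot reach it. In that case Jsoup's select method with a CSS selector such as table.wikitable may be a workable alternative.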
