2023年7月13日 12:23:29go评论97阅读模式

英文:

Create database tables from html tables

问题

我需要帮助从HTML表格创建数据库表格。目前我正在手动进行操作。

我有一个HTML文档，其中包含所有数据，但我不知道提取数据的最佳方式。

以下是HTML代码的一部分：

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>ATOMS Definition for Type tom.service.soc.SocRecord</title>
<style type="text/css">
body
{
	line-height: 1.6em;
	font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif;
	font-size: 14px;
	margin: 45px;
}
#box-table-a
{
	font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif;
	font-size: 12px;
	margin: 5%;
	width: 90%;
	text-align: left;
	border-collapse: collapse;
}
#box-table-a th
{
	font-size: 13px;
	font-weight: normal;
	padding: 8px;
	background: #b9c9fe;
	border-top: 4px solid #aabcfe;
	border-bottom: 1px solid #fff;
	color: #039;
}
#box-table-a td
{
	padding: 8px;
	background: #e8edff;
	border-bottom: 1px solid #fff;
	color: #669;
	border-top: 1px solid transparent;
}
#box-table-a tr:hover td
{
	background: #d0dafd;
	color: #339;
}
</style>
</head>
<body>
<table id="box-table-a" summary="Definition for tom.service.soc.SocRecord">
	<thead>
		<tr><th colspan="2">tom.service.soc.SocRecord</th></tr>
	</thead>
	<tbody>
		<tr>
			<td>Version</td>
			<td>1</td>
		</tr>
		<tr>
			<td>Description</td>
			<td>[type is UNCLASSIFIED] Temporary dummy test object for SOC</td>
		</tr>
	</tbody>
</table>
<table id="box-table-a" summary="Fields Definition for Type tom.service.soc.SocRecord">
    <thead>
    	<tr>
        	<th scope="col">Index</th>
            <th scope="col">Name</th>
            <th scope="col">Type</th>
            <th scope="col">Range</th>
            <th scope="col">Default</th>
            <th scope="col" width="50%">Description</th>
        </tr>
    </thead>
    <tbody>
    	<tr>
        	<td>1</td>
            <td>socID</td>
            <td>String</td>
            <td>-</td>
            <td>""</td>
            <td>[ ] The UUID of the tracked object -- String for transmission purposes</td>
        </tr>
    	<tr>
        	<td>2</td>
            <td>satID</td>
            <td><a href="../../../../../tom/state/vcm/SatNumberType.html">SatNumberType</a></td>
            <td></td>
            <td></td>
            <td>[ ] The ID of the tracked object -- copy of the satelliteId in the VCM</td>
        </tr>
    </tbody>
</table>
</body></html>

我希望从HTML创建一个像这样的PostgreSQL数据库表格创建脚本，并且如果有href链接到另一个表格，则应该包含在内：

CREATE TABLE soc.SocRecord( 
	socId TEXT, --[ ] The UUID of the tracked object -- String for transmission purposes
	satId UUID, --[ ] The ID of the tracked object -- copy of the satId in the VCM
	commonName TEXT, --[ ] The name of the tracked object -- may be blank - 
					 --This field is optional in the current version of the message, check the set attribute before use.);

如果需要进一步的帮助，请告诉我。

英文:

I need help creating database tables from html tables. Right now I'm just manually doing it.
I have a html doc that has all the data but I don't know what the best way to extract the data.

&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;&lt;head&gt;
&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=utf-8&quot;&gt;
&lt;title&gt;ATOMS Definition for Type tom.service.soc.SocRecord&lt;/title&gt;
&lt;style type=&quot;text/css&quot;&gt;
body
{
line-height: 1.6em;
font-family: &quot;Lucida Sans Unicode&quot;, &quot;Lucida Grande&quot;, Sans-Serif;
font-size: 14px;
margin: 45px;
}
#box-table-a
{
font-family: &quot;Lucida Sans Unicode&quot;, &quot;Lucida Grande&quot;, Sans-Serif;
font-size: 12px;
margin: 5%;
width: 90%;
text-align: left;
border-collapse: collapse;
}
#box-table-a th
{
font-size: 13px;
font-weight: normal;
padding: 8px;
background: #b9c9fe;
border-top: 4px solid #aabcfe;
border-bottom: 1px solid #fff;
color: #039;
}
#box-table-a td
{
padding: 8px;
background: #e8edff;
border-bottom: 1px solid #fff;
color: #669;
border-top: 1px solid transparent;
}
#box-table-a tr:hover td
{
background: #d0dafd;
color: #339;
}
&lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;table id=&quot;box-table-a&quot; summary=&quot;Definition for tom.service.soc.SocRecord&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th colspan=&quot;2&quot;&gt;tom.service.soc.SocRecord&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Version&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Description&lt;/td&gt;
&lt;td&gt;[type is UNCLASSIFIED] Temporary dummy test object for SOC&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table id=&quot;box-table-a&quot; summary=&quot;Fields Definition for Type tom.service.soc.SocRecord&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;Index&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Name&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Type&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Range&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Default&lt;/th&gt;
&lt;th scope=&quot;col&quot; width=&quot;50%&quot;&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;socID&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;           
-
&lt;/td&gt;
&lt;td&gt;&quot;&quot;&lt;/td&gt;
&lt;td&gt;
[ ] The UUID of the tracked object -- String for transmission purposes	
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;satID&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;../../../../../tom/state/vcm/SatNumberType.html&quot;&gt;SatNumberType&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;
&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;
[ ] The ID of the tracked object -- copy of the satelliteId in the VCM	
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/body&gt;&lt;/html&gt;

Here is the html.
I would like help making a create postgres database table script like this from the html. And if it has a href then that links to another table.

CREATE TABLE soc.SocRecord( 
socId TEXT, --[ ] The UUID of the tracked object -- String for transmission purposes
satId UUID, --[ ] The ID of the tracked object -- copy of the satId in the VCM
commonName TEXT, --[ ] The name of the tracked object -- may be blank - 
--This field is optional in the current version of the message, check the set attribute before use.);

答案1

得分: 1

Here is the translated content:

编辑
尝试了一些不同的变体，发现使用 zip 没有错误，并返回了 SQL 脚本。

> for header, value in zip(headers, values):

我也更新了下面的代码。

所以你的方法可能是这样的：

读取 .html 文件
解析 table
解析 theader 等等

我觉得这个想法很有趣，所以我在 Python 中尝试了一下。

from bs4 import BeautifulSoup
# 指定 HTML 文件的路径
html_file_path = 'path/to/your/file.html'

在这里，你可以编写另一个脚本来读取文件夹中的所有 .html 文件，并执行相应的操作。

# 读取 HTML 文件的内容
with open(html_file_path, 'r') as file:
    html = file.read()
# 查找 HTML 中的所有表格
tables = soup.find_all('table')
# 遍历表格
for table in tables:
    # 查找表格的 ID 属性
    table_id = table.get('id')
    # 提取表格的表头
    headers = [th.get_text() for th in table.find('thead').find_all('th')]
    # 创建一个字典来存储表格数据
    table_data = {}
    # 遍历表格的行
    for row in table.find('tbody').find_all('tr'):
        # 提取行中的单元格
        cells = row.find_all('td')
        # 提取单元格的值
        values = [cell.get_text().strip() for cell in cells]
        # 使用表头与值来存储数据到字典中
        for header, value in zip(headers, values):
            if header not in table_data:
                table_data[header] = []
            table_data[header].append(value)
    # 生成 PostgreSQL 表格脚本
    create_table_script = f"CREATE TABLE {table_id} (\n"
    for header, values in table_data.items():
        # 处理带有空格或特殊字符的列名
        column_name = header.lower().replace(' ', '_').replace('.', '_')
        # 将列值组合成逗号分隔的字符串
        column_values = ', '.join([f"'{value}'" if isinstance(value, str) else str(value) for value in values])
        # 将列定义附加到脚本中
        create_table_script += f"    {column_name} {column_values},\n"
    create_table_script = create_table_script.rstrip(',\n') + "\n);\n"
    # 打印表格脚本
    print(create_table_script)

上述操作的结果可能不会完全符合你的要求，但可以帮助你入门。

英文:

Edit
Tried out some more variants and found that using zip resulted in no errors and returned the sql script.

> for header, value in zip(headers, values):

I've updated the code bellow as well.

So your approach could constitute something like:

read .html file
parse for table
parse for theader and so on

I find idea pretty interesting, so I tried it out in python.

from bs4 import BeautifulSoup
# Specify the path to your HTML file
html_file_path = &#39;path/to/your/file.html&#39;

here you could write another script to read all .html files in a folder and execute against that instead.

# Read the contents of the HTML file
with open(html_file_path, &#39;r&#39;) as file:
html = file.read()
# Find all the tables in the HTML
tables = soup.find_all(&#39;table&#39;)
# Iterate over the tables
for table in tables:
# Find the table&#39;s ID attribute
table_id = table.get(&#39;id&#39;)
# Extract the table headers
headers = [th.get_text() for th in table.find(&#39;thead&#39;).find_all(&#39;th&#39;)]
# Create a dictionary to store the table data
table_data = {}
# Iterate over the table rows
for row in table.find(&#39;tbody&#39;).find_all(&#39;tr&#39;):
# Extract the row cells
cells = row.find_all(&#39;td&#39;)
# Extract the cell values
values = [cell.get_text().strip() for cell in cells]
# Store the values with their corresponding headers in the dictionary
for header, value in zip(headers, values):
if header not in table_data:
table_data[header] = []
table_data[header].append(value)
# Generate the PostgreSQL table script
create_table_script = f&quot;CREATE TABLE {table_id} (\n&quot;
for header, values in table_data.items():
# Handle column names with spaces or special characters
column_name = header.lower().replace(&#39; &#39;, &#39;_&#39;).replace(&#39;.&#39;, &#39;_&#39;)
# Combine the column values into a comma-separated string
column_values = &#39;, &#39;.join([f&quot;&#39;{value}&#39;&quot; if isinstance(value, str) else str(value) for value in values])
# Append the column definition to the script
create_table_script += f&quot;    {column_name} {column_values},\n&quot;
create_table_script = create_table_script.rstrip(&#39;,\n&#39;) + &quot;\n);\n&quot;
# Print the table script
print(create_table_script)

The result of the above is probably not going to be exactly what you want, but it gets you started.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

创建数据库表格从HTML表格中。

问题

答案1

Implementing nextval(); sequence in Laravel

PostgreSQL list parameter in Go (using database/SQL and pq)

Hugo: 在构建时添加 http-equiv=”refresh” content=”0。如何停止它？

我们如何在Apache Age中使用筛选器进行条件筛选？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。