创建数据库表格从HTML表格中。

huangapple go评论59阅读模式
英文:

Create database tables from html tables

问题

我需要帮助从HTML表格创建数据库表格。目前我正在手动进行操作。

我有一个HTML文档,其中包含所有数据,但我不知道提取数据的最佳方式。

以下是HTML代码的一部分:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>ATOMS Definition for Type tom.service.soc.SocRecord</title>
<style type="text/css">
body
{
	line-height: 1.6em;
	font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif;
	font-size: 14px;
	margin: 45px;
}
#box-table-a
{
	font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif;
	font-size: 12px;
	margin: 5%;
	width: 90%;
	text-align: left;
	border-collapse: collapse;
}
#box-table-a th
{
	font-size: 13px;
	font-weight: normal;
	padding: 8px;
	background: #b9c9fe;
	border-top: 4px solid #aabcfe;
	border-bottom: 1px solid #fff;
	color: #039;
}
#box-table-a td
{
	padding: 8px;
	background: #e8edff;
	border-bottom: 1px solid #fff;
	color: #669;
	border-top: 1px solid transparent;
}
#box-table-a tr:hover td
{
	background: #d0dafd;
	color: #339;
}
</style>
</head>
<body>
<table id="box-table-a" summary="Definition for tom.service.soc.SocRecord">
	<thead>
		<tr><th colspan="2">tom.service.soc.SocRecord</th></tr>
	</thead>
	<tbody>
		<tr>
			<td>Version</td>
			<td>1</td>
		</tr>
		<tr>
			<td>Description</td>
			<td>[type is UNCLASSIFIED] Temporary dummy test object for SOC</td>
		</tr>
	</tbody>
</table>
<table id="box-table-a" summary="Fields Definition for Type tom.service.soc.SocRecord">
    <thead>
    	<tr>
        	<th scope="col">Index</th>
            <th scope="col">Name</th>
            <th scope="col">Type</th>
            <th scope="col">Range</th>
            <th scope="col">Default</th>
            <th scope="col" width="50%">Description</th>
        </tr>
    </thead>
    <tbody>
    	<tr>
        	<td>1</td>
            <td>socID</td>
            <td>String</td>
            <td>-</td>
            <td>""</td>
            <td>[ ] The UUID of the tracked object -- String for transmission purposes</td>
        </tr>
    	<tr>
        	<td>2</td>
            <td>satID</td>
            <td><a href="../../../../../tom/state/vcm/SatNumberType.html">SatNumberType</a></td>
            <td></td>
            <td></td>
            <td>[ ] The ID of the tracked object -- copy of the satelliteId in the VCM</td>
        </tr>
    </tbody>
</table>
</body></html>

我希望从HTML创建一个像这样的PostgreSQL数据库表格创建脚本,并且如果有href链接到另一个表格,则应该包含在内:

CREATE TABLE soc.SocRecord( 
	socId TEXT, --[ ] The UUID of the tracked object -- String for transmission purposes
	satId UUID, --[ ] The ID of the tracked object -- copy of the satId in the VCM
	commonName TEXT, --[ ] The name of the tracked object -- may be blank - 
					 --This field is optional in the current version of the message, check the set attribute before use.);

如果需要进一步的帮助,请告诉我。

英文:

I need help creating database tables from html tables. Right now I'm just manually doing it.
I have a html doc that has all the data but I don't know what the best way to extract the data.

&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;&lt;head&gt;
&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=utf-8&quot;&gt;
&lt;title&gt;ATOMS Definition for Type tom.service.soc.SocRecord&lt;/title&gt;
&lt;style type=&quot;text/css&quot;&gt;
body
{
line-height: 1.6em;
font-family: &quot;Lucida Sans Unicode&quot;, &quot;Lucida Grande&quot;, Sans-Serif;
font-size: 14px;
margin: 45px;
}
#box-table-a
{
font-family: &quot;Lucida Sans Unicode&quot;, &quot;Lucida Grande&quot;, Sans-Serif;
font-size: 12px;
margin: 5%;
width: 90%;
text-align: left;
border-collapse: collapse;
}
#box-table-a th
{
font-size: 13px;
font-weight: normal;
padding: 8px;
background: #b9c9fe;
border-top: 4px solid #aabcfe;
border-bottom: 1px solid #fff;
color: #039;
}
#box-table-a td
{
padding: 8px;
background: #e8edff;
border-bottom: 1px solid #fff;
color: #669;
border-top: 1px solid transparent;
}
#box-table-a tr:hover td
{
background: #d0dafd;
color: #339;
}
&lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;table id=&quot;box-table-a&quot; summary=&quot;Definition for tom.service.soc.SocRecord&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th colspan=&quot;2&quot;&gt;tom.service.soc.SocRecord&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Version&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Description&lt;/td&gt;
&lt;td&gt;[type is UNCLASSIFIED] Temporary dummy test object for SOC&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table id=&quot;box-table-a&quot; summary=&quot;Fields Definition for Type tom.service.soc.SocRecord&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;Index&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Name&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Type&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Range&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Default&lt;/th&gt;
&lt;th scope=&quot;col&quot; width=&quot;50%&quot;&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;socID&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;           
-
&lt;/td&gt;
&lt;td&gt;&quot;&quot;&lt;/td&gt;
&lt;td&gt;
[ ] The UUID of the tracked object -- String for transmission purposes	
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;satID&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;../../../../../tom/state/vcm/SatNumberType.html&quot;&gt;SatNumberType&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;
&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;
[ ] The ID of the tracked object -- copy of the satelliteId in the VCM	
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/body&gt;&lt;/html&gt;

Here is the html.
I would like help making a create postgres database table script like this from the html. And if it has a href then that links to another table.

CREATE TABLE soc.SocRecord( 
socId TEXT, --[ ] The UUID of the tracked object -- String for transmission purposes
satId UUID, --[ ] The ID of the tracked object -- copy of the satId in the VCM
commonName TEXT, --[ ] The name of the tracked object -- may be blank - 
--This field is optional in the current version of the message, check the set attribute before use.);

答案1

得分: 1

Here is the translated content:

编辑
尝试了一些不同的变体,发现使用 zip 没有错误,并返回了 SQL 脚本。

> for header, value in zip(headers, values):

我也更新了下面的代码。


所以你的方法可能是这样的:

读取 .html 文件
解析 table
解析 theader 等等

我觉得这个想法很有趣,所以我在 Python 中尝试了一下。

from bs4 import BeautifulSoup

# 指定 HTML 文件的路径
html_file_path = 'path/to/your/file.html'

在这里,你可以编写另一个脚本来读取文件夹中的所有 .html 文件,并执行相应的操作。

# 读取 HTML 文件的内容
with open(html_file_path, 'r') as file:
    html = file.read()

# 查找 HTML 中的所有表格
tables = soup.find_all('table')

# 遍历表格
for table in tables:
    # 查找表格的 ID 属性
    table_id = table.get('id')

    # 提取表格的表头
    headers = [th.get_text() for th in table.find('thead').find_all('th')]

    # 创建一个字典来存储表格数据
    table_data = {}

    # 遍历表格的行
    for row in table.find('tbody').find_all('tr'):
        # 提取行中的单元格
        cells = row.find_all('td')

        # 提取单元格的值
        values = [cell.get_text().strip() for cell in cells]

        # 使用表头与值来存储数据到字典中
        for header, value in zip(headers, values):
            if header not in table_data:
                table_data[header] = []
            table_data[header].append(value)

    # 生成 PostgreSQL 表格脚本
    create_table_script = f"CREATE TABLE {table_id} (\n"
    for header, values in table_data.items():
        # 处理带有空格或特殊字符的列名
        column_name = header.lower().replace(' ', '_').replace('.', '_')

        # 将列值组合成逗号分隔的字符串
        column_values = ', '.join([f"'{value}'" if isinstance(value, str) else str(value) for value in values])

        # 将列定义附加到脚本中
        create_table_script += f"    {column_name} {column_values},\n"

    create_table_script = create_table_script.rstrip(',\n') + "\n);\n"

    # 打印表格脚本
    print(create_table_script)

上述操作的结果可能不会完全符合你的要求,但可以帮助你入门。

英文:

Edit
Tried out some more variants and found that using zip resulted in no errors and returned the sql script.

> for header, value in zip(headers, values):

I've updated the code bellow as well.


So your approach could constitute something like:

read .html file
parse for table
parse for theader and so on

I find idea pretty interesting, so I tried it out in python.

from bs4 import BeautifulSoup
# Specify the path to your HTML file
html_file_path = &#39;path/to/your/file.html&#39;

here you could write another script to read all .html files in a folder and execute against that instead.

# Read the contents of the HTML file
with open(html_file_path, &#39;r&#39;) as file:
html = file.read()
# Find all the tables in the HTML
tables = soup.find_all(&#39;table&#39;)
# Iterate over the tables
for table in tables:
# Find the table&#39;s ID attribute
table_id = table.get(&#39;id&#39;)
# Extract the table headers
headers = [th.get_text() for th in table.find(&#39;thead&#39;).find_all(&#39;th&#39;)]
# Create a dictionary to store the table data
table_data = {}
# Iterate over the table rows
for row in table.find(&#39;tbody&#39;).find_all(&#39;tr&#39;):
# Extract the row cells
cells = row.find_all(&#39;td&#39;)
# Extract the cell values
values = [cell.get_text().strip() for cell in cells]
# Store the values with their corresponding headers in the dictionary
for header, value in zip(headers, values):
if header not in table_data:
table_data[header] = []
table_data[header].append(value)
# Generate the PostgreSQL table script
create_table_script = f&quot;CREATE TABLE {table_id} (\n&quot;
for header, values in table_data.items():
# Handle column names with spaces or special characters
column_name = header.lower().replace(&#39; &#39;, &#39;_&#39;).replace(&#39;.&#39;, &#39;_&#39;)
# Combine the column values into a comma-separated string
column_values = &#39;, &#39;.join([f&quot;&#39;{value}&#39;&quot; if isinstance(value, str) else str(value) for value in values])
# Append the column definition to the script
create_table_script += f&quot;    {column_name} {column_values},\n&quot;
create_table_script = create_table_script.rstrip(&#39;,\n&#39;) + &quot;\n);\n&quot;
# Print the table script
print(create_table_script)

The result of the above is probably not going to be exactly what you want, but it gets you started.

huangapple
  • 本文由 发表于 2023年7月13日 12:23:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/76675914.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定