Create database tables from html tables
Question
I need help creating database tables from HTML tables. Right now I'm doing it manually.
I have an HTML document that contains all the data, but I don't know the best way to extract it.
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>ATOMS Definition for Type tom.service.soc.SocRecord</title>
<style type="text/css">
body
{
line-height: 1.6em;
font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif;
font-size: 14px;
margin: 45px;
}
#box-table-a
{
font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif;
font-size: 12px;
margin: 5%;
width: 90%;
text-align: left;
border-collapse: collapse;
}
#box-table-a th
{
font-size: 13px;
font-weight: normal;
padding: 8px;
background: #b9c9fe;
border-top: 4px solid #aabcfe;
border-bottom: 1px solid #fff;
color: #039;
}
#box-table-a td
{
padding: 8px;
background: #e8edff;
border-bottom: 1px solid #fff;
color: #669;
border-top: 1px solid transparent;
}
#box-table-a tr:hover td
{
background: #d0dafd;
color: #339;
}
</style>
</head>
<body>
<table id="box-table-a" summary="Definition for tom.service.soc.SocRecord">
<thead>
<tr><th colspan="2">tom.service.soc.SocRecord</th></tr>
</thead>
<tbody>
<tr>
<td>Version</td>
<td>1</td>
</tr>
<tr>
<td>Description</td>
<td>[type is UNCLASSIFIED] Temporary dummy test object for SOC</td>
</tr>
</tbody>
</table>
<table id="box-table-a" summary="Fields Definition for Type tom.service.soc.SocRecord">
<thead>
<tr>
<th scope="col">Index</th>
<th scope="col">Name</th>
<th scope="col">Type</th>
<th scope="col">Range</th>
<th scope="col">Default</th>
<th scope="col" width="50%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>socID</td>
<td>String</td>
<td>
-
</td>
<td>""</td>
<td>
[ ] The UUID of the tracked object -- String for transmission purposes
</td>
</tr>
<tr>
<td>2</td>
<td>satID</td>
<td><a href="../../../../../tom/state/vcm/SatNumberType.html">SatNumberType</a></td>
<td>
</td>
<td></td>
<td>
[ ] The ID of the tracked object -- copy of the satelliteId in the VCM
</td>
</tr>
</tbody>
</table>
</body></html>
That is the HTML.
I would like help generating a PostgreSQL CREATE TABLE script like the one below from the HTML. If a type cell contains an href, it links to another table.
CREATE TABLE soc.SocRecord(
    socId TEXT, --[ ] The UUID of the tracked object -- String for transmission purposes
    satId UUID, --[ ] The ID of the tracked object -- copy of the satId in the VCM
    commonName TEXT --[ ] The name of the tracked object -- may be blank -
    --This field is optional in the current version of the message, check the set attribute before use.
);
Answer 1
Score: 1
Edit
I tried out some more variants and found that using zip raised no errors and returned the SQL script.
> for header, value in zip(headers, values):
I've updated the code below as well.
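A small illustration of why zip avoids errors here. The earlier variant is not shown, so this is an assumption about the cause: the first table has a single header cell with colspan="2" but two cells per row, and zip simply stops at the shorter sequence instead of failing on the mismatch. The values are taken from the question's HTML.

```python
# One <th colspan="2"> header, but two <td> cells in the row.
headers = ["tom.service.soc.SocRecord"]
values = ["Version", "1"]

# zip pairs items up and stops at the shorter input,
# so the unmatched "1" is silently dropped.
pairs = list(zip(headers, values))
print(pairs)
```

The silent truncation is convenient, but it also hides data: the second cell of that row never reaches the output.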
So your approach could look something like this:
read the .html file
parse it for the table
parse it for the table header (thead)
and so on
I found the idea pretty interesting, so I tried it out in Python.
from bs4 import BeautifulSoup
# Specify the path to your HTML file
html_file_path = 'path/to/your/file.html'
Here you could write another script that reads all the .html files in a folder and runs against those instead.
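As a rough sketch of that folder idea, separate from the script that continues below: the helper name read_html_files is hypothetical, and it assumes the files are UTF-8 encoded, as the meta tag in the question's HTML suggests.

```python
from pathlib import Path


def read_html_files(folder):
    """Yield (path, text) for every .html file directly inside folder."""
    for path in sorted(Path(folder).glob("*.html")):
        yield path, path.read_text(encoding="utf-8")
```

Each yielded text could then be handed to the parsing code in place of the single html_file_path.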
# Read the contents of the HTML file
with open(html_file_path, 'r', encoding='utf-8') as file:
    html = file.read()
# Parse the HTML so that the tables can be searched
soup = BeautifulSoup(html, 'html.parser')
# Find all the tables in the HTML
tables = soup.find_all('table')
# Iterate over the tables
for table in tables:
    # Find the table's ID attribute
    # (an id with hyphens, such as box-table-a, needs quoting or renaming to be a valid table name)
    table_id = table.get('id')
    # Extract the table headers
    headers = [th.get_text() for th in table.find('thead').find_all('th')]
    # Create a dictionary to store the table data
    table_data = {}
    # Iterate over the table rows
    for row in table.find('tbody').find_all('tr'):
        # Extract the row cells
        cells = row.find_all('td')
        # Extract the cell values
        values = [cell.get_text().strip() for cell in cells]
        # Store the values with their corresponding headers in the dictionary
        for header, value in zip(headers, values):
            if header not in table_data:
                table_data[header] = []
            table_data[header].append(value)
    # Generate the PostgreSQL table script
    create_table_script = f"CREATE TABLE {table_id} (\n"
    for header, values in table_data.items():
        # Handle column names with spaces or special characters
        column_name = header.lower().replace(' ', '_').replace('.', '_')
        # Combine the column values into a comma-separated string
        column_values = ', '.join([f"'{value}'" if isinstance(value, str) else str(value) for value in values])
        # Append the column definition to the script
        create_table_script += f" {column_name} {column_values},\n"
    create_table_script = create_table_script.rstrip(',\n') + "\n);\n"
    # Print the table script
    print(create_table_script)
The result of the above is probably not going to be exactly what you want, but it gets you started.
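To get closer to the CREATE TABLE shape shown in the question, each row of the fields table can become one column, with the description carried along as a trailing SQL comment. The following is only a sketch under several assumptions: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs (with BeautifulSoup, the link check would be cell.find('a')); TYPE_MAP is a hypothetical mapping whose two entries are read off the question's example; the table name is passed in by hand; and the names FieldTableParser and build_create_table are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical type mapping; extend it for the types in your own documents.
TYPE_MAP = {"String": "TEXT", "SatNumberType": "UUID"}


class FieldTableParser(HTMLParser):
    """Collects the <td> cells of each row and notes whether a cell holds a link."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of (text, has_link) cells
        self._row = None   # cells of the row currently being read
        self._cell = None  # [text_parts, has_link] of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = [[], False]
        elif tag == "a" and self._cell is not None:
            self._cell[1] = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse whitespace so a description stays on one comment line.
            text = " ".join("".join(self._cell[0]).split())
            self._row.append((text, self._cell[1]))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


def build_create_table(table_name, rows):
    """Build a CREATE TABLE statement from the six-column field rows."""
    columns = []
    for row in rows:
        if len(row) != 6:  # Index, Name, Type, Range, Default, Description
            continue
        name = row[1][0]
        type_text, is_link = row[2]
        description = row[5][0]
        sql_type = TYPE_MAP.get(type_text, "TEXT")
        note = f" (see table for {type_text})" if is_link else ""
        columns.append((f"{name} {sql_type}", description + note))
    lines = []
    for i, (definition, comment) in enumerate(columns):
        comma = "," if i < len(columns) - 1 else ""
        lines.append(f"    {definition}{comma} -- {comment}")
    return f"CREATE TABLE {table_name} (\n" + "\n".join(lines) + "\n);"


# Trimmed-down version of the HTML from the question.
SAMPLE_HTML = """
<table><thead><tr><th colspan="2">tom.service.soc.SocRecord</th></tr></thead>
<tbody><tr><td>Version</td><td>1</td></tr></tbody></table>
<table><thead><tr><th>Index</th><th>Name</th><th>Type</th>
<th>Range</th><th>Default</th><th>Description</th></tr></thead>
<tbody>
<tr><td>1</td><td>socID</td><td>String</td><td>-</td><td>""</td>
<td>[ ] The UUID of the tracked object</td></tr>
<tr><td>2</td><td>satID</td>
<td><a href="../../../../../tom/state/vcm/SatNumberType.html">SatNumberType</a></td>
<td></td><td></td><td>[ ] The ID of the tracked object</td></tr>
</tbody></table>
"""

parser = FieldTableParser()
parser.feed(SAMPLE_HTML)
print(build_create_table("soc.SocRecord", parser.rows))
```

Linked types are only flagged in the comment here. Turning them into real foreign keys would mean parsing the linked page as well and deciding on a key column, which the question does not specify. Note also that a schema-qualified name such as soc.SocRecord needs the soc schema to exist, and that PostgreSQL folds unquoted identifiers to lower case.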