How to scrape a dynamic HTML table with a different class name for each row and nested elements?
Question
I want to create a DataFrame by scraping the table here, which has a different class name for each row and contains nested elements.
table_rows = driver.find_elements(By.CLASS_NAME, "bgColor-white")
for val in table_rows:
    print(val.text)
The print output of the above code is one string per row, and I could not split it into the appropriate columns.
Answer 1
Score: 1
Identify the table element, then get its outerHTML. Pass that HTML to the pandas read_html() method to get the DataFrame.
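import time
from io import StringIO
import pandas as pd
from selenium.webdriver.common.by import By
# driver is assumed to be an already created Selenium WebDriver instance
driver.get("https://www.egp.gov.bt/resources/common/TenderListing.jsp?lang=en_US&langForMenu=en_US&h=t")
time.sleep(3)  # crude wait so the dynamic table has time to render
table = driver.find_element(By.CSS_SELECTOR, "table#resultTable").get_attribute("outerHTML")
df = pd.read_html(StringIO(table))[0]  # StringIO avoids the literal-HTML deprecation warning in recent pandas
print(df)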
Console output:
Sl. No. Tender ID, Reference No, Public Status ... Type, Method Publishing Date & Time | Closing Date & Time
0 1 15183, TSHA-6/Engineering/9/2022-2023/769, Live ... NCB, OTM 03-Mar-2023 15:00 | 14-Mar-2023 15:10
1 2 15180, STCB/PD/TS/Samtse/2023/213, Live ... NCB, OTM 03-Mar-2023 10:00 | 14-Mar-2023 11:10
2 3 15160, JNEC/Adm-33/2022-2023, Cancelled ... NCB, OTM 02-Mar-2023 22:00 | 10-Mar-2023 10:30
3 4 15179, DAG/DEHSS(07)/2022-2023/148, Live ... NCB, OTM 02-Mar-2023 15:00 | 16-Mar-2023 09:00
4 5 15181, DCHS/PRP-01/2022-2023/244, Amendment/Co... ... NCB, OTM 02-Mar-2023 09:00 | 13-Mar-2023 10:30
5 6 15174, NBC/Adm/06/2022/1198, Live ... NCB, OTM 01-Mar-2023 09:00 | 20-Mar-2023 11:30
6 7 15161, PDA/adm -35/2022-2023/, Live ... NCB, OTM 27-Feb-2023 16:00 | 10-Mar-2023 11:00
7 8 15169, MD/Dz.EHSS-20/2022-2023/5179, Amendmen... ... NCB, OTM 27-Feb-2023 14:30 | 10-Mar-2023 14:00
8 9 15157, nofp2, Live ... NCB, OTM 21-Feb-2023 09:00 | 08-Mar-2023 11:30
9 10 15158, MD/DES-20/2022-2023/5095, Being processed ... NCB, OTM 21-Feb-2023 02:00 | 02-Mar-2023 10:00
[10 rows x 6 columns]
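Several of the columns in this output still hold more than one value, because each table cell contains nested elements. If separate columns are wanted, the combined strings can be split afterwards. The following is only a sketch: the two column headers are assumed to match the printed output exactly, the new column names (tender_id, reference_no, public_status, publishing, closing) are hypothetical, and the three sample rows are copied from the output above rather than scraped.

```python
import pandas as pd

# Sample rows copied from the console output above; in practice df comes from pd.read_html().
# The column headers are assumed to be exactly as printed there.
id_col = "Tender ID, Reference No, Public Status"
date_col = "Publishing Date & Time | Closing Date & Time"
df = pd.DataFrame({
    id_col: [
        "15183, TSHA-6/Engineering/9/2022-2023/769, Live",
        "15180, STCB/PD/TS/Samtse/2023/213, Live",
        "15160, JNEC/Adm-33/2022-2023, Cancelled",
    ],
    date_col: [
        "03-Mar-2023 15:00 | 14-Mar-2023 15:10",
        "03-Mar-2023 10:00 | 14-Mar-2023 11:10",
        "02-Mar-2023 22:00 | 10-Mar-2023 10:30",
    ],
})

# Split "id, reference, status" into at most three parts.
# This assumes the reference number itself never contains ", ".
ids = df[id_col].str.split(", ", n=2, expand=True)
ids.columns = ["tender_id", "reference_no", "public_status"]  # hypothetical names

# The pipe is escaped because multi-character patterns are treated as regular expressions.
dates = df[date_col].str.split(r" \| ", expand=True)
dates.columns = ["publishing", "closing"]  # hypothetical names
for name in dates.columns:
    dates[name] = pd.to_datetime(dates[name], format="%d-%b-%Y %H:%M")

out = pd.concat([ids, dates], axis=1)
print(out)
```

Rows where the text was truncated on the page or where a field is missing would need extra handling, which this sketch does not attempt.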