How to scrape <p> from html under a certain div class
Question
I'd like to scrape items 4, 5 and 6 (description, uses and sources) from the HTML, where they are tagged as 'p' under a 'div', and apply this to different elements by formatting url = f'....{element}'.
print('Current path is:', currentPath)
content_list = []
url = 'https://pubchem.ncbi.nlm.nih.gov/element/Antimony'
res = requests.post(url)
# print(res.text)
soup = bs(res.text, 'lxml')
content = soup.find_all('div', class_="section-content-item")
for p in content:
    p = soup.find('p')
    content_list.append(p)
print(content_list)
Answer 1
Score: 0
First of all, always take a look at your soup to check that all the expected ingredients are in place.
The content of the website is generated dynamically and comes from an API, so you won't get it with BeautifulSoup because it is not in the response.
You have to request the API directly to reach your goal. Check the XHR tab in the network section of your browser's dev tools.
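One quick way to "look at your soup" is to test whether the class you want to select occurs in the raw response text at all. The following is only a minimal sketch: `has_class_marker` is a hypothetical helper, and the two HTML strings are made-up stand-ins, not the real PubChem markup. In practice you would pass `res.text` instead.

```python
def has_class_marker(html, class_name):
    """Crude check: does the class name occur anywhere in the raw HTML text?"""
    return class_name in html


# Hypothetical responses for illustration only (assumed, not real PubChem markup)
static_page = '<div class="section-content-item"><p>Some text</p></div>'
dynamic_page = '<div id="root"></div><script src="app.js"></script>'

# A page rendered on the server contains the class; a page filled in by
# JavaScript after loading typically does not.
print(has_class_marker(static_page, 'section-content-item'))
print(has_class_marker(dynamic_page, 'section-content-item'))
```

If the check fails on the real response, `find_all` has nothing to match and returns an empty list, no matter how the selector is written.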
Example
Just to point in a direction: simply iterate over the results and pick out the information, then convert it into a format that fits your needs.
import requests

atomic_numbers = ['51']
sections = ['Description', 'Uses', 'Sources']

for e in atomic_numbers:
    # Request the JSON record for this element from the PUG View API
    url = f'https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/element/{e}/JSON/'
    section_data = requests.get(url).json()['Record']['Section']
    for s in section_data:
        # Print only the sections of interest
        if s['TOCHeading'] in sections:
            print(s['Information'])
Output
[{'ReferenceNumber': 6, 'Value': {'StringWithMarkup': [{'String': 'Antimony is a poor conductor of heat and electricity. Antimony and many of its compounds are toxic.'}]}}]
[{'ReferenceNumber': 5, 'Value': {'StringWithMarkup': [{'String': "Antimony is a brittle metal and is a poor conductor of heat and electricity. Very pure antimony is used to make certain types of semiconductor devices, such as diodes and infrared detectors. Antimony is alloyed with lead to increase lead's durability. Antimony alloys are also used in batteries, low friction metals, type metal and cable sheathing, among other products. Antimony compounds are used to make flame-proofing materials, paints, ceramic enamels, glass and pottery. The ancient Egyptians used antimony, in the form of stibnite, for black eye make-up."}]}}, {'ReferenceNumber': 6, 'Value': {'StringWithMarkup': [{'String': 'Antimony is finding use in semiconductor technology for making infrared detectors, diodes and Hall-effect devices. It greatly increases the hardness and mechanical strength of lead. Batteries, antifriction alloys, type metal, small arms and tracer bullets, cable sheathing, and minor products use about half the metal produced. Compounds taking up the other half are oxides, sulfides, sodium antimonate, and antimony trichloride. These are used in manufacturing flame-proofing compounds, paints ceramic enamels, glass, and pottery.'}]}}]
[{'ReferenceNumber': 6, 'Value': {'StringWithMarkup': [{'String': 'Antimony is not abundant, but is found in over 100 mineral species. It is sometimes found natively, but more frequently it is found as the sulfide stibnite.'}]}}]
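To turn the nested output above into plain text, you could flatten each 'Information' list into its strings. This is a minimal sketch that assumes every entry follows the 'Value' / 'StringWithMarkup' / 'String' layout seen in the output; `extract_strings` is a hypothetical helper name, and entries that lack those keys are simply skipped.

```python
def extract_strings(information):
    """Collect every 'String' value from an 'Information' list."""
    texts = []
    for entry in information:
        # Fall back to empty containers so entries without text are skipped
        markup = entry.get('Value', {}).get('StringWithMarkup', [])
        for item in markup:
            if 'String' in item:
                texts.append(item['String'])
    return texts


# Sample copied from the 'Sources' output above
sample = [{'ReferenceNumber': 6, 'Value': {'StringWithMarkup': [{'String': 'Antimony is not abundant, but is found in over 100 mineral species. It is sometimes found natively, but more frequently it is found as the sulfide stibnite.'}]}}]

for text in extract_strings(sample):
    print(text)
```

In the loop from the example, `print(s['Information'])` could then be replaced by `print(extract_strings(s['Information']))`, or the strings could be appended to a list keyed by `s['TOCHeading']`.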