Python Scraping empty tag
Question
I have a problem scraping an element from this page:
https://tuning-tec.com/mercedes_w164_ml_mklasa_0507_black_led_seq_lpmed0-5789i
Code:
import requests
from bs4 import BeautifulSoup

URL = "https://tuning-tec.com/mercedes_w164_ml_mklasa_0507_black_led_seq_lpmed0-5789i"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
title = soup.find(class_="product_cart_title").text
price = soup.find(class_="icon_main_block_price_a")
number = soup.find(class_="product_cart_info").findAll('tr')[1].findAll('td')[1]
description = soup.find(id="tab_a")
print(description)
The problem appears when I try to get the content of tab_a: inside it, the element
<div align="left" class="product_cart_info" id="charlong_id">
</div>
is empty. How can I get its content?
I think this is related to JavaScript. Maybe there is some delay when the page loads?
Answer 1
Score: 2
As stated in the comments, the information is loaded via JavaScript, so BeautifulSoup doesn't see it in the HTML that requests downloads. But if you look at the Network tab in the Chrome/Firefox developer tools, you can see where the page is making its requests:
import re
import requests
from bs4 import BeautifulSoup

url = 'https://tuning-tec.com/mercedes_w164_ml_mklasa_0507_black_led_seq_lpmed0-5789i'
# endpoint the page calls to load the long description (seen in the Network tab)
ajax_url = 'https://tuning-tec.com/_template/_show_normal/_show_charlong.php?itemId={}'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

print(soup.select_one('.product_cart_title').get_text(strip=True))
print(soup.select_one('.icon_main_block_price_a').get_text(strip=True))
# note: newer soupsieve releases deprecate ':contains' in favor of ':-soup-contains'
print(soup.select_one('td:contains("Symbol") ~ td').get_text(strip=True))

# pick the item id out of the ajax_update_stat('...') call in the page text
item_id = re.findall(r"ajax_update_stat\('(\d+)'\)", soup.text)[0]
soup2 = BeautifulSoup(requests.get(ajax_url.format(item_id)).content, 'html.parser')

print()

# just print some info:
for tr in soup2.select('tr'):
    print(re.sub(r' {2,}', ' ', tr.select_one('td').get_text(strip=True, separator=' ')))
Prints:
MERCEDES W164 ML M-KLASA 05-07 BLACK LED SEQ
1788.62 PLN
LPMED0
PL
Opis
Lampy
soczewkowe ze światłem
pozycyjnym LED. Z dynamicznym
kierunkowskazem. 100% nowe, w komplecie
(lewa i prawa). Homologacja: norma E13 -
dopuszczone do ruchu.
Szczegóły
Światła pozycyjne: DIODY Kierunkowskaz: DIODY Światła
mijania: H9 w
zestawie Światła
drogowe: H1 w
zestawie Regulacja: elektryczna (silniczek znajduje się w
komplecie).
LED TUBE LIGHT Dynamic Turn Signal >>
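The `re.findall(...)[0]` lookup in the code above raises an IndexError if the pattern is not found. A minimal sketch of a more defensive variant follows. It assumes the `ajax_update_stat('<digits>')` call appears somewhere in the raw page source, and it searches the raw HTML string (for example `requests.get(url).text`) rather than `soup.text`, because whether Beautiful Soup includes `<script>` contents in the extracted text may depend on the version and parser. `build_ajax_url` and `sample_html` are hypothetical names used only for illustration, and the item id in the sample is made up.

```python
import re

# Endpoint template copied from the answer above; everything else is illustrative.
AJAX_URL = "https://tuning-tec.com/_template/_show_normal/_show_charlong.php?itemId={}"

# Same pattern as in the answer: the numeric argument of ajax_update_stat('...').
ITEM_ID_RE = re.compile(r"ajax_update_stat\('(\d+)'\)")


def build_ajax_url(html):
    """Return the description URL for a product page, or None if no item id is found.

    Hypothetical helper, not part of requests or Beautiful Soup.
    """
    match = ITEM_ID_RE.search(html)
    if match is None:
        return None
    return AJAX_URL.format(match.group(1))


# Made-up fragment standing in for the real page source (the id is invented).
sample_html = "<script>ajax_update_stat('5789');</script>"
print(build_ajax_url(sample_html))
print(build_ajax_url("<p>no script here</p>"))
```

With the real page, the returned URL would be passed to `requests.get` as in the answer, and a `None` result would signal that the page layout has changed.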
Answer 2
Score: 0
I changed how the description is extracted a little. I'm not sure whether it works properly; have a look at the following code:
import re
import requests
from bs4 import BeautifulSoup

url = 'https://tuning-tec.com/mercedes_w164_ml_mklasa_0507_black_led_seq_lpmed0-5789i'
ajax_url = 'https://tuning-tec.com/_template/_show_normal/_show_charlong.php?itemId={}'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')


def unwrapElements(soup, elementsToFind):
    # replace each matching descendant tag with its own contents
    elements = soup.find_all(elementsToFind)
    for element in elements:
        element.unwrap()


print(soup.select_one('.product_cart_title').get_text(strip=True))
print(soup.select_one('.icon_main_block_price_a').get_text(strip=True))
print(soup.select_one('td:contains("Symbol") ~ td').get_text(strip=True))

item_id = re.findall(r"ajax_update_stat\('(\d+)'\)", soup.text)[0]
soup2 = BeautifulSoup(requests.get(ajax_url.format(item_id)).content, 'html.parser')

# take the second cell of the third row, then move the second cell of the fifth row into it
description = soup2.findAll('tr')[2].findAll('td')[1]
description.append(soup2.findAll('tr')[4].findAll('td')[1])
unwrapElements(description, "td")
unwrapElements(description, "font")
unwrapElements(description, "span")
print(description)
I need just these elements of the description, in English. Will this be OK?
And anyway, thanks for the help!
One thing: I don't know why it didn't remove all the <td> tags.
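A likely explanation for the remaining `<td>`: `find_all` searches only the descendants of the tag it is called on, so the cell that was appended gets unwrapped, but `description` is itself a `<td>` and is never among the results. The sketch below reproduces this on a made-up table (the markup is invented, not the real AJAX response) and uses `decode_contents()` to get the inner markup without the outer tag.

```python
from bs4 import BeautifulSoup

# Invented markup standing in for two cells of the AJAX response.
html = (
    "<table><tr>"
    "<td>first <font>cell</font></td>"
    "<td> second <span>cell</span></td>"
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")


def unwrap_elements(tag, name):
    # find_all looks at descendants of `tag` only, never at `tag` itself
    for element in tag.find_all(name):
        element.unwrap()


cells = soup.find_all("td")
description = cells[0]
description.append(cells[1])  # moves the second cell inside the first one

for name in ("td", "font", "span"):
    unwrap_elements(description, name)

print(description)  # still wrapped in the outer <td>

# Inner markup only, without the enclosing <td> ... </td>
inner_html = description.decode_contents()
print(inner_html)
```

If only the text is needed, `description.get_text(" ", strip=True)` is another option that sidesteps the tags entirely.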