Data scraping using Selenium

Question

I am trying to scrape this website, but it only returns the first item.

Is there something wrong here? I can't seem to figure it out.

No error is raised.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()


url = 'https://www.bizbuysell.com/new-jersey-businesses-for-sale/?q=Y2Zmcm9tPTIwMDAwMCZwdG89NjAwMDAw'
driver.get(url)

try:
    # Wait for the page to load
    wait = WebDriverWait(driver, 10)
    wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="2107093"]/div/div[1]/div[2]/h3')))

    # Find and extract the titles
    all_titles = []
    titles = driver.find_elements(By.XPATH, '//*[@id="2107093"]/div/div[1]/div[2]/h3')
    for title in titles:
        all_titles.append(title.text)

    print(all_titles)

except TimeoutException:
    print("Timeout occurred while waiting for element.")
finally:
    driver.quit()

Answer 1

Score: 0

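Your XPath is anchored to id="2107093", which appears to belong to a single listing, so find_elements can only ever return that one title. Try selecting by a class shared by all the titles instead; this works: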

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = Chrome()
wait = WebDriverWait(driver, 30)
url = 'https://www.bizbuysell.com/new-jersey-businesses-for-sale/?q=Y2Zmcm9tPTIwMDAwMCZwdG89NjAwMDAw'

try:
    driver.get(url)
    # Wait until every listing title is visible, then collect them all
    titles = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'h3.title.ng-star-inserted')))
    all_titles = [title.text for title in titles]
    print(all_titles)
finally:
    # Close the browser even if the wait times out
    driver.quit()

Output:

['MFR of proprietary and patented recycled tire crumb rubber products', 'Established Landscaping Company for sale in Morris County', 'Restaurant w/ Liquor License - 10 Year Lease', 'Woodbridge NJ Terminal-FEDEX 2 Team Linehaul FedEx Routes For Sale', 'Former Dunkin Value-Add Retail Near I-80', 'High Traffic Pharmacy For Sale in New Jersey', 'Established Liquor Store with Liquor License', 'Established Pizzeria with End Cap location in Large Shopping Center', 'Price Reduced! Online Hookah Retailer- Excellent Financial Records', 'Very Successful Catering -10% Down-1st Qtr +41.8%', 'Pop-A-Lock', 'Established, Profitable Hot Yoga Studio', '20-Year Premier Jewelry Store & Gold Buying', 'Established beauty salon and spa', 'Well established construction business for sale. $542,000', 'Hotel Business Between Manhattan & the American Dream - UNDER CONTRACT', 'Regionally Famous Specialty Italian Deli & Market', 'Landscape Business- Over $150K Income!', 'Landscape - Design - Build - Maintenance $200k Cash Flow', 'Retail/Commercial Bakery with National Brand Recognition', 'Successful Deli and Convenience Store', 'Top Franchises', 'Cash Flow-GREAT DEAL!', 'Established Italian Restaurant', 'Fedex Line Haul net $234k Ask $599k', 'Cherry Hill - Niche, High Profit, Repair/Home Services', 'Growing Third Party Logistics / Fulfillment Company', 'Masonry Co. 
Nets $250,000', 'Profitable Turn Key Full Bar & Restaurant On The Hudson River', 'Fast-Casual American Marquee Brand Drive-in Restaurant', 'Pizzeria', 'Healthy Food Restaurant - SBA Appraised for over 700k', 'Island Fin Poke', 'SBA Approved Pizza Shop - Union County', 'Branded Gas Station with Repair Shop and C-Store', 'Ocean County NJ Pizzeria and Italian Restaurant for Sale Earning Over', 'Elizabeth - Home-Based Home Improvement & Specialty Restoration', 'Princeton - Highly Profitable Flooring Sales & Consulting Business', 'D1 Training', '20-Year Established Dry Cleaning Business', 'Specialty High-End Seafood and Catering Restaurant', 'Nearly Fully Enrolled Passaic County Childcare', 'Generac Generator Sales and Service Provider- Seller Retiring', 'Highly Profitable Car & Truck Repair Services', 'Pet Services Company - Large Growth Potential', 'Great Pizzeria Business Opportunity With Provable Records For Sale', 'Floor Coatings Business, Specialists - Great Accounts!', 'Pet Boarding and Grooming Business with Property', 'Busy Bagel Store For Sale', 'Fresh Coat Painters', 'Amoroso Bread Route - With Employee and Truck', 'Profitable Union County Diner', 'Most Profitable Grocery in area - Thriving Italian Provisions Shop', '30 yr old NJ Pool Service Company - Discounted Asking Price!', 'Organic Dry Cleaner', 'TemperaturePro Heating, Air Conditioning Services and Indoor Air Quality Services', 'Linden , NJ beauty supply,5,000 sf with $500K Plus Inv Best store']
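
The difference between the two selectors can be reproduced offline. The sketch below is only an illustration: the markup in PAGE is a hypothetical, heavily simplified stand-in for the real page (the listing names, the extra ids and the single "title" class are assumptions), and it uses the standard library's xml.etree.ElementTree rather than Selenium so that it runs without a browser. It shows that a path anchored to one listing's id can match only that listing, while a path keyed on a shared class matches every title.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: each listing carries its own unique id.
# This is NOT the real page structure, only a stand-in for illustration.
PAGE = """
<div>
  <div id="2107093"><div><h3 class="title">Listing A</h3></div></div>
  <div id="2107094"><div><h3 class="title">Listing B</h3></div></div>
  <div id="2107095"><div><h3 class="title">Listing C</h3></div></div>
</div>
"""

root = ET.fromstring(PAGE)

# Anchored to one listing's id: only that listing's title can match.
by_id = [h3.text for h3 in root.findall(".//*[@id='2107093']/div/h3")]

# Keyed on a class shared by every listing: all titles match.
by_class = [h3.text for h3 in root.findall(".//h3[@class='title']")]

print(by_id)
print(by_class)
```

The same reasoning applies to the Selenium locators: the question's XPath corresponds to by_id, and the answer's CSS selector corresponds to by_class.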

huangapple
  • Published on May 25, 2023 at 23:42:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76334090.html