Can't find element using Selenium CSS Selector even though it works fine individually

Question

I am trying to scrape this page: "https://www.semi.org/en/resources/member-directory"

On its own, this code works fine:
`link = browser.find_element(By.CLASS_NAME, "member-company__title").find_element(By.TAG_NAME, 'a').get_attribute('href')`

This returns the link I want. However, when I nest the same code inside a for loop, I get an error saying the CSS selector was unable to find the element. I tried an XPath instead, but that only ever matched the first container.

This is my code:

import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

results_df = pd.DataFrame({'Company Name': [], 'Join Date': [], 'Company ID': [], 'Company Description': [], 'Link': [], 'Primary Industry': [], 'Primary Product Category': [], 'Primary Sub Product Category': [], 'Keywords': [], 'Address': []})

browser = webdriver.Chrome()
# Load the desired URL
another_url = "https://www.semi.org/en/resources/member-directory"
browser.get(another_url)
time.sleep(3)

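# Grab every row of the member-directory table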
containers = browser.find_elements(By.TAG_NAME, 'tr')
for i in range(len(containers)):
    container = containers[i]
    link = container.find_element(By.TAG_NAME, 'a').get_attribute('href')
    browser.get(link)
    print("Page navigated after click" + browser.title)
    time.sleep(3)
    company_name = browser.find_element(By.CLASS_NAME, "page-title").text
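    # Each of the remaining fields is optional on a member page, so fall back to "None" when absent.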
    try:
        join_date = browser.find_element(By.CLASS_NAME, "member-company__join-date").find_element(By.TAG_NAME, 'span').text
    except NoSuchElementException:
        join_date = "None"
    try:
        c_ID = browser.find_element(By.CLASS_NAME, "member-company__company-id").find_element(By.TAG_NAME, 'span').text
    except NoSuchElementException:
        c_ID = "None"
    try:
        company_description = browser.find_element(By.CLASS_NAME, "member-company__description").text
    except NoSuchElementException:
        company_description = "None" 
    try:
        company_link = browser.find_element(By.CLASS_NAME,"member-company__website").find_element(By.TAG_NAME, 'div').get_attribute('href')
    except NoSuchElementException:
        company_link = "None"
    try:
        primary_industry = browser.find_element(By.CLASS_NAME, "member-company__primary-industry").find_element(By.TAG_NAME, 'div').text
    except NoSuchElementException:
        primary_industry = "None"
    try:
        primary_product_cat = browser.find_element(By.CLASS_NAME, "member-company__primary-product-category").find_element(By.TAG_NAME, 'div').text
    except NoSuchElementException:
        primary_product_cat = "None"
    try:
        primary_sub_product_cat = browser.find_element(By.CLASS_NAME, "member-company__primary-product-subcategory").find_element(By.TAG_NAME, 'div').text
    except NoSuchElementException:
        primary_sub_product_cat = "None"
    
    try:
        keywords = browser.find_element(By.CLASS_NAME, "member-company__keywords").find_element(By.TAG_NAME, 'div').text
    except NoSuchElementException:
        keywords = "None"
    try:
        address = browser.find_element(By.CLASS_NAME,"member-company__address").text.replace("Street Address","")
    except NoSuchElementException:
        address = "None"
    browser.get(another_url)

    time.sleep(5)

    result_df = pd.DataFrame({"Company Name": [company_name], 
        "Join Date": [join_date],
        "Company ID": [c_ID],
        "Company Description": [company_description],
        "Company Website": [company_link],
        "Primary Industry": [primary_industry],
        "Primary Product Category": [primary_product_cat],
        "Primary Sub Product Category": [primary_sub_product_cat],
        "Keywords": [keywords],
        "Address":[address]})
    results_df = pd.concat([results_df, result_df])
    results_df.reset_index(drop=True, inplace=True)
    results_df.to_csv('semi_test', index=False)

browser.close()

What's going on?

Answer 1

Score: 0

This is mainly due to the statement `containers = browser.find_elements(By.TAG_NAME, 'tr')`. If you print out the containers, you'll notice that the first row selected is the table header, which contains no links, so your script fails with the exception you're seeing. You can work around that with `containers = containers[1:]`, but you'll then run into a `StaleElementReferenceException`, because navigating to a detail page and back invalidates the element references you collected on the directory page. Instead, scrape all of the links from the directory in one pass, then iterate over those URLs to scrape each detail page, rather than returning to the directory page over and over.
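A minimal sketch of that approach, reusing the imports and class names from the question (only the company name is scraped here; the question's other try/except lookups drop into the same loop):

browser = webdriver.Chrome()
browser.get("https://www.semi.org/en/resources/member-directory")
time.sleep(3)

# Collect every detail-page URL up front, skipping the header row, so no
# element reference is held across a navigation.
links = [
    row.find_element(By.TAG_NAME, 'a').get_attribute('href')
    for row in browser.find_elements(By.TAG_NAME, 'tr')[1:]
]

# Visit each detail page directly; there is no need to return to the directory.
records = []
for link in links:
    browser.get(link)
    time.sleep(3)
    records.append({'Company Name': browser.find_element(By.CLASS_NAME, "page-title").text,
                    'Link': link})

pd.DataFrame(records).to_csv('semi_test.csv', index=False)
browser.quit()

Because the hrefs are extracted as plain strings before any navigation happens, nothing can go stale when the browser moves between pages.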
