Can't get a URL from an a href
Question
I want to scrape this website: https://www.sortlist.fr/search
The page lists rows of websites that can be clicked; each click opens a page with more details about that website.
I want to get that URL, but I can't find it in the <a href>.
I tried inspecting the element and searching for the URL in the page's scripts, but I couldn't find it.
I also tried looking at the Network tab in the dev tools, and couldn't find it there either.
Does anyone have any idea?
By the way, I want to use Selenium for this, even though there is no login system. Is that a good idea, or is there a better way?
Answer 1
Score: 1
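The agences trouvées elements on the webpage have an empty href attribute:
<a href="" class="h5 bold text-secondary-900 text-truncate mb-8" data-testid="name-cell">Pursuit Digital</a>
So you won't be able to extract the href attributes from the main page directly.
Solution
Instead, you can click each agences trouvées element to open it in an adjacent tab and print the current URL there. Locate the elements by inducing WebDriverWait for visibility_of_all_elements_located() with the following locator strategy: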
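from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumption: Chrome; the original snippet does not show how the driver is created
driver.get("https://www.sortlist.fr/search")
parent_window = driver.current_window_handle
# Wait up to 20 seconds for the agency name links to become visible
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[data-testid='name-cell']")))
hrefs = []
for elem in elements:
    elem.click()  # opens the agency page in a new tab
    all_windows = driver.window_handles
    # The only handle that is not the parent is the newly opened tab
    new_window = [window for window in all_windows if window != parent_window][0]
    driver.switch_to.window(new_window)
    print(new_window)
    print(driver.current_url)
    hrefs.append(driver.current_url)
    driver.close()  # close the new tab, then return to the search page
    driver.switch_to.window(parent_window)
print(hrefs)
driver.quit()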
Console output:
85F8A3B48F9DF45BEB28D7A530E6979E
https://www.sortlist.fr/agency/pursuit-digital
BA4F926FAD46A5EA5F5FC4406861D20D
https://www.sortlist.fr/agency/rozee-digital
84E3A361C4202C594893546BEF39CD47
https://www.sortlist.fr/agency/trends-tokyo
FC27FFCB9CBE26CD908B8865B8C5CEA5
https://www.sortlist.fr/agency/cortlex
64E50C5041A98BECCB17475A80477D60
https://www.sortlist.fr/agency/steinpilz-gmbh
36FF3D6D3C803BF05EEBB676D58E2DE7
https://www.sortlist.fr/agency/everrank-salesdesk24-gmbh
A13B789C8A618AAD5C372219FC5E3E7E
https://www.sortlist.fr/agency/cc-systems
C39AB3659EE6A627044A2A29CC439AFD
https://www.sortlist.fr/agency/snapp-x
2979C1A6C0FEF21B3499B2184907F28B
https://www.sortlist.fr/agency/scrumble
452F8D30237A146724055715E9690288
https://www.sortlist.fr/agency/gaofeng-creative
F05A9B4963C54306ABBB74420481989E
https://www.sortlist.fr/agency/dashdot
FE2B66F925ACCA122B86E597D28B5403
https://www.sortlist.fr/agency/therocketsoft
FBBE3D1535D35C230A5C7496632435DC
https://www.sortlist.fr/agency/run-gun-films
D4C5C162F3C422FB44862563D8AB73DD
https://www.sortlist.fr/agency/studio-unbound
329DA752A15041450FF5DDAA7850C332
https://www.sortlist.fr/agency/contentgo
B35A03AA6947A1EE043E3EE915E219BE
https://www.sortlist.fr/agency/tabua-digital-unipessoal-ldaa
F77913A1097ACD4DB2B78F4E997B4A0E
https://www.sortlist.fr/agency/yarandin-llc
7A3C75AFF9ED31E5C5E5915A7E9A84EB
https://www.sortlist.fr/agency/fortis-media
C86FCE23AF84B72CFF793A349C005BDD
https://www.sortlist.fr/agency/osenorth
A266A09B3AEDD65E8A43E26DEAECBF22
https://www.sortlist.fr/agency/apps-square
['https://www.sortlist.fr/agency/pursuit-digital', 'https://www.sortlist.fr/agency/rozee-digital', 'https://www.sortlist.fr/agency/trends-tokyo', 'https://www.sortlist.fr/agency/cortlex', 'https://www.sortlist.fr/agency/steinpilz-gmbh', 'https://www.sortlist.fr/agency/everrank-salesdesk24-gmbh', 'https://www.sortlist.fr/agency/cc-systems', 'https://www.sortlist.fr/agency/snapp-x', 'https://www.sortlist.fr/agency/scrumble', 'https://www.sortlist.fr/agency/gaofeng-creative', 'https://www.sortlist.fr/agency/dashdot', 'https://www.sortlist.fr/agency/therocketsoft', 'https://www.sortlist.fr/agency/run-gun-films', 'https://www.sortlist.fr/agency/studio-unbound', 'https://www.sortlist.fr/agency/contentgo', 'https://www.sortlist.fr/agency/tabua-digital-unipessoal-ldaa', 'https://www.sortlist.fr/agency/yarandin-llc', 'https://www.sortlist.fr/agency/fortis-media', 'https://www.sortlist.fr/agency/osenorth', 'https://www.sortlist.fr/agency/apps-square']
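The handle-selection step in the loop above takes the first handle that differs from the parent, which quietly assumes exactly one extra tab is open. A minimal sketch of that step as a standalone helper that fails loudly when the assumption does not hold; the name pick_new_window is hypothetical (it is not part of Selenium), and it works on plain lists of handle strings such as the value of driver.window_handles:

```python
def pick_new_window(all_windows, parent_window):
    """Return the single handle in all_windows that is not parent_window.

    Hypothetical helper, not a Selenium API. Raises ValueError unless
    exactly one handle other than the parent is present.
    """
    new_windows = [w for w in all_windows if w != parent_window]
    if len(new_windows) != 1:
        raise ValueError(
            f"expected exactly one new window, found {len(new_windows)}"
        )
    return new_windows[0]


# Illustrative handle strings; real handles are opaque IDs from the driver.
print(pick_new_window(["PARENT-HANDLE", "NEW-HANDLE"], "PARENT-HANDLE"))  # prints NEW-HANDLE
```

Inside the loop, this would stand in for the list comprehension: new_window = pick_new_window(driver.window_handles, parent_window). If the click has not opened a tab yet, or a previous tab was left open, the helper raises instead of switching to the wrong window.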