Can't get a URL from an <a href>

Question

I want to scrape this website: https://www.sortlist.fr/search

It lists agencies in rows that can be clicked, and each click opens a page with more details about that agency.
I want to get that URL, but I can't find it in the <a href> attribute.

I tried inspecting the element and searching the page's scripts, but I couldn't find it.
I also tried looking at the Network tab in the dev tools, but couldn't find it there either.

Does anyone have an idea?

By the way, I want to use Selenium for this, even though there is no login system. Is that a good idea, or is there a better way?

Answer 1

Score: 1

The agences trouvées links on the page do have an href attribute, but it is empty:

<a href="" class="h5 bold text-secondary-900 text-truncate mb-8" data-testid="name-cell">Pursuit Digital</a>

So you can't read the agency URLs from the href attributes on the main page directly.
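
The empty attribute can be confirmed offline by parsing the anchor markup shown above. This is only a minimal sketch using Python's standard html.parser module; AnchorCollector is a hypothetical helper name, not part of Selenium or of the original answer.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []          # finished (href, text) pairs
        self._href = None        # href of the <a> currently open
        self._in_anchor = False  # True while inside an <a>...</a>
        self._text = []          # text fragments seen inside the <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.links.append((self._href, "".join(self._text)))
            self._in_anchor = False


# The anchor element copied from the search page.
snippet = ('<a href="" class="h5 bold text-secondary-900 text-truncate mb-8" '
           'data-testid="name-cell">Pursuit Digital</a>')

collector = AnchorCollector()
collector.feed(snippet)
collector.close()
print(collector.links)  # prints [('', 'Pursuit Digital')]
```

The href comes back as an empty string rather than None, which is why the attribute exists but carries no usable URL.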

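Solution

Instead, you can click each agences trouvées link so that it opens in an adjacent tab, switch to that tab, and print the current URL. Locate the links by inducing WebDriverWait for visibility_of_all_elements_located(), using the following locator strategy: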

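from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumption: Chrome; any WebDriver should work
driver.get("https://www.sortlist.fr/search")
parent_window = driver.current_window_handle
# Wait until every agency name link is visible
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[data-testid='name-cell']")))
hrefs = []
for elem in elements:
    elem.click()  # opens the agency page in a new tab
    # Wait for the new tab to exist before looking up its handle
    WebDriverWait(driver, 20).until(EC.number_of_windows_to_be(2))
    all_windows = driver.window_handles
    new_window = [window for window in all_windows if window != parent_window][0]
    driver.switch_to.window(new_window)
    print(new_window)
    print(driver.current_url)
    hrefs.append(driver.current_url)
    driver.close()  # close the new tab, then return to the results page
    driver.switch_to.window(parent_window)
print(hrefs)
driver.quit()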

Console output:

85F8A3B48F9DF45BEB28D7A530E6979E
https://www.sortlist.fr/agency/pursuit-digital
BA4F926FAD46A5EA5F5FC4406861D20D
https://www.sortlist.fr/agency/rozee-digital
84E3A361C4202C594893546BEF39CD47
https://www.sortlist.fr/agency/trends-tokyo
FC27FFCB9CBE26CD908B8865B8C5CEA5
https://www.sortlist.fr/agency/cortlex
64E50C5041A98BECCB17475A80477D60
https://www.sortlist.fr/agency/steinpilz-gmbh
36FF3D6D3C803BF05EEBB676D58E2DE7
https://www.sortlist.fr/agency/everrank-salesdesk24-gmbh
A13B789C8A618AAD5C372219FC5E3E7E
https://www.sortlist.fr/agency/cc-systems
C39AB3659EE6A627044A2A29CC439AFD
https://www.sortlist.fr/agency/snapp-x
2979C1A6C0FEF21B3499B2184907F28B
https://www.sortlist.fr/agency/scrumble
452F8D30237A146724055715E9690288
https://www.sortlist.fr/agency/gaofeng-creative
F05A9B4963C54306ABBB74420481989E
https://www.sortlist.fr/agency/dashdot
FE2B66F925ACCA122B86E597D28B5403
https://www.sortlist.fr/agency/therocketsoft
FBBE3D1535D35C230A5C7496632435DC
https://www.sortlist.fr/agency/run-gun-films
D4C5C162F3C422FB44862563D8AB73DD
https://www.sortlist.fr/agency/studio-unbound
329DA752A15041450FF5DDAA7850C332
https://www.sortlist.fr/agency/contentgo
B35A03AA6947A1EE043E3EE915E219BE
https://www.sortlist.fr/agency/tabua-digital-unipessoal-ldaa
F77913A1097ACD4DB2B78F4E997B4A0E
https://www.sortlist.fr/agency/yarandin-llc
7A3C75AFF9ED31E5C5E5915A7E9A84EB
https://www.sortlist.fr/agency/fortis-media
C86FCE23AF84B72CFF793A349C005BDD
https://www.sortlist.fr/agency/osenorth
A266A09B3AEDD65E8A43E26DEAECBF22
https://www.sortlist.fr/agency/apps-square
['https://www.sortlist.fr/agency/pursuit-digital', 'https://www.sortlist.fr/agency/rozee-digital', 'https://www.sortlist.fr/agency/trends-tokyo', 'https://www.sortlist.fr/agency/cortlex', 'https://www.sortlist.fr/agency/steinpilz-gmbh', 'https://www.sortlist.fr/agency/everrank-salesdesk24-gmbh', 'https://www.sortlist.fr/agency/cc-systems', 'https://www.sortlist.fr/agency/snapp-x', 'https://www.sortlist.fr/agency/scrumble', 'https://www.sortlist.fr/agency/gaofeng-creative', 'https://www.sortlist.fr/agency/dashdot', 'https://www.sortlist.fr/agency/therocketsoft', 'https://www.sortlist.fr/agency/run-gun-films', 'https://www.sortlist.fr/agency/studio-unbound', 'https://www.sortlist.fr/agency/contentgo', 'https://www.sortlist.fr/agency/tabua-digital-unipessoal-ldaa', 'https://www.sortlist.fr/agency/yarandin-llc', 'https://www.sortlist.fr/agency/fortis-media', 'https://www.sortlist.fr/agency/osenorth', 'https://www.sortlist.fr/agency/apps-square']
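
The handle lookup inside the loop above indexes a list with [0], which raises a bare IndexError if the new tab has not opened yet. As a hedged sketch, that step could be pulled into a small helper that fails with a clearer message. find_new_window is a hypothetical name and "PARENT-HANDLE" is a made-up placeholder; neither comes from the original answer.

```python
def find_new_window(all_windows, parent_window):
    """Return the first window handle that is not the parent's handle."""
    for handle in all_windows:
        if handle != parent_window:
            return handle
    raise LookupError("no new window or tab has been opened yet")


# Stand-in handles: a placeholder parent plus the first handle from the output.
handles = ["PARENT-HANDLE", "85F8A3B48F9DF45BEB28D7A530E6979E"]
new_window = find_new_window(handles, "PARENT-HANDLE")
print(new_window)
```

In the Selenium loop this would be called as find_new_window(driver.window_handles, parent_window) in place of the list comprehension.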

huangapple
  • Posted on June 26, 2023, 15:42:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76554541.html