Get an email address from a website link that is only visible after clicking it manually?

Question

I would like to get the email address from this page:
https://irglobal.com/advisor/angus-forsyth

I tried the following code:

    import time
    import os

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    if __name__ == '__main__':
        WAIT = 1
        print("Checking Browser driver...")
        os.environ['WDM_LOG'] = '0'
        options = Options()
        options.add_argument("start-maximized")
        options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 1})
        # Both switches go in one list; a second "excludeSwitches" call would overwrite the first.
        options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
        options.add_experimental_option('useAutomationExtension', False)
        options.add_argument('--disable-blink-features=AutomationControlled')
        srv = Service(ChromeDriverManager().install())
        driver = webdriver.Chrome(service=srv, options=options)
        waitWD = WebDriverWait(driver, 10)

        link = "https://irglobal.com/advisor/angus-forsyth"
        print(f"Working for {link}")
        driver.get(link)
        time.sleep(WAIT)

        # Parse the rendered page and print the email button's markup.
        soup = BeautifulSoup(driver.page_source, 'lxml')
        tmp = soup.find("a", {"class": "btn email"})
        print(tmp.prettify())
        driver.quit()

But I can't see any email address in this HTML tag:

    (selenium) C:\DEV\Fiverr\TRY\saschanielsen>python tmp2.py
    Checking Browser driver...
    Working for https://irglobal.com/advisor/angus-forsyth
    <a class="btn email" data-id="103548" href="#">
    <svg aria-hidden="true" class="svg-inline--fa fa-envelope" data-fa-i2svg="" data-icon="envelope" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg">
    <path d="M48 64C21.5 64 0 85.5 0 112c0 15.1 7.1 29.3 19.2 38.4L236.8 313.6c11.4 8.5 27 8.5 38.4 0L492.8 150.4c12.1-9.1 19.2-23.3 19.2-38.4c0-26.5-21.5-48-48-48H48zM0 176V384c0 35.3 28.7 64 64 64H448c35.3 0 64-28.7 64-64V176L294.4 339.2c-22.8 17.1-54 17.1-76.8 0L0 176z" fill="currentColor">
    </path>
    </svg>
    <!-- <i class="fas fa-envelope"></i> -->
    </a>
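
To confirm what the tag actually carries, here is a minimal sketch that lists the anchor's attributes using only the standard library's `html.parser`. The `AnchorAttrs` class is a hypothetical helper, not part of the code above, and the HTML string is a shortened copy of the printed tag with the `<svg>` child omitted.

```python
from html.parser import HTMLParser


class AnchorAttrs(HTMLParser):
    """Collect the attributes of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            self.anchors.append(dict(attrs))


# Shortened copy of the tag printed above (the <svg> child is omitted).
html = '<a class="btn email" data-id="103548" href="#"></a>'

parser = AnchorAttrs()
parser.feed(html)
print(parser.anchors)
```

The only attributes are `class`, `data-id` and `href="#"`, so the address is not in the static markup. Presumably the page looks it up from `data-id` when the button is clicked, but that is an assumption the output alone cannot confirm.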

When I click the button manually on the site, I can see the email address in the email program that opens.

How can I get this email address?
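
Since the click opens an email program, the button presumably ends up triggering a `mailto:` URI. Assuming that URI can be captured somehow (how to capture it is not shown here), a minimal sketch of pulling the address out of it with the standard library could look like this. `email_from_mailto` is a hypothetical helper name and the address is a placeholder.

```python
from urllib.parse import unquote, urlsplit


def email_from_mailto(uri: str) -> str:
    """Return the address part of a mailto: URI, dropping any ?subject=... query."""
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError(f"not a mailto URI: {uri!r}")
    # The address is the path component; percent-escapes are decoded.
    return unquote(parts.path)


# Placeholder address, not taken from the site.
print(email_from_mailto("mailto:someone@example.com?subject=Hello"))
```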

This should work not only for this specific link:
https://irglobal.com/advisor/angus-forsyth

but also for any other person on the site, so I need the information behind the mail icon:
https://irglobal.com/advisor/ns-shastri/
https://irglobal.com/advisor/adriana-posada/
etc.

Answer 1

Score: 1

As an alternative to reading the email address from the opened email program, you can click the link that opens the advisor's URL in an adjacent tab, then print the email address by inducing WebDriverWait for visibility_of_element_located() with the following locator strategy:

  • Code Block:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # `driver` is the Chrome instance created in the question's code.
    driver.get("https://irglobal.com/advisor/angus-forsyth/")
    parent_window = driver.current_window_handle
    driver.execute_script("scroll(0, 250);")
    # Click the first link after the <h1>; it is expected to open in a new tab.
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//h1//following::a[1]"))).click()
    # Wait until the new tab exists before looking up its handle.
    WebDriverWait(driver, 20).until(EC.number_of_windows_to_be(2))
    new_window = [window for window in driver.window_handles if window != parent_window][0]
    driver.switch_to.window(new_window)
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., 'Email')]//a"))).text)
    driver.close()
    driver.switch_to.window(parent_window)
  • Console Output:

    angus@angfor.hk
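
The question also asks for other advisors on the same site. A rough sketch of wrapping the steps above in a reusable function follows. It assumes the same two XPath locators also match on the other advisor pages, which has not been verified here. `pick_new_window` and `get_advisor_email` are hypothetical names, and `driver` is expected to be an already created Chrome instance.

```python
def pick_new_window(handles, parent):
    """Return the first window handle that is not the parent, or None."""
    return next((h for h in handles if h != parent), None)


def get_advisor_email(driver, url, timeout=20):
    """Open an advisor page, follow the link after the <h1>, return the email text.

    Assumption: the locators from the answer also match on other advisor pages.
    """
    # Imported here so pick_new_window stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    driver.get(url)
    parent = driver.current_window_handle
    driver.execute_script("scroll(0, 250);")
    wait.until(EC.element_to_be_clickable((By.XPATH, "//h1//following::a[1]"))).click()
    # Wait for the second tab before switching to it.
    wait.until(EC.number_of_windows_to_be(2))
    driver.switch_to.window(pick_new_window(driver.window_handles, parent))
    try:
        locator = (By.XPATH, "//span[contains(., 'Email')]//a")
        return wait.until(EC.visibility_of_element_located(locator)).text
    finally:
        # Always close the extra tab and return to the advisor page.
        driver.close()
        driver.switch_to.window(parent)
```

With the `driver` from the question, this could be called in a loop, for example `for url in urls: print(url, get_advisor_email(driver, url))`, where `urls` holds the advisor links listed in the question. If a page is laid out differently, the waits raise `TimeoutException` after `timeout` seconds, so wrapping the call in `try`/`except` may be worthwhile.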

huangapple
  • Posted on June 26, 2023, 23:36:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76558192.html