Get email from a website link that can only be seen when it is clicked manually?

Question

I would like to get the email address from this site:
https://irglobal.com/advisor/angus-forsyth

I tried it with the following code:

import os
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

if __name__ == '__main__':
    WAIT = 1  # seconds to wait after loading the page
    print("Checking Browser driver...")
    os.environ['WDM_LOG'] = '0'  # silence webdriver_manager logging
    options = Options()
    options.add_argument("start-maximized")
    options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 1})
    # Setting "excludeSwitches" twice overwrites the first value, so both switches go in one list.
    options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    srv = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=srv, options=options)
    waitWD = WebDriverWait(driver, 10)

    link = "https://irglobal.com/advisor/angus-forsyth"
    print(f"Working for {link}")
    driver.get(link)
    time.sleep(WAIT)
    # Parse the rendered page and look up the email button
    soup = BeautifulSoup(driver.page_source, 'lxml')
    tmp = soup.find("a", {"class": "btn email"})
    print(tmp.prettify())
    driver.quit()

But I can't see any email address in this HTML tag:

(selenium) C:\DEV\Fiverr\TRY\saschanielsen>python tmp2.py
Checking Browser driver...
Working for https://irglobal.com/advisor/angus-forsyth
<a class="btn email" data-id="103548" href="#">
 <svg aria-hidden="true" class="svg-inline--fa fa-envelope" data-fa-i2svg="" data-icon="envelope" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg">
  <path d="M48 64C21.5 64 0 85.5 0 112c0 15.1 7.1 29.3 19.2 38.4L236.8 313.6c11.4 8.5 27 8.5 38.4 0L492.8 150.4c12.1-9.1 19.2-23.3 19.2-38.4c0-26.5-21.5-48-48-48H48zM0 176V384c0 35.3 28.7 64 64 64H448c35.3 0 64-28.7 64-64V176L294.4 339.2c-22.8 17.1-54 17.1-76.8 0L0 176z" fill="currentColor">
  </path>
 </svg>
 <!-- <i class="fas fa-envelope"></i> -->
</a>

When I click the button manually on the site, I can see the email address in the email program that opens.

How can I get this email address?

For now, this only needs to work for this specific link:
https://irglobal.com/advisor/angus-forsyth

Ideally, it should also work for any person on this site, so I need the information behind this mail icon:
https://irglobal.com/advisor/ns-shastri/
https://irglobal.com/advisor/adriana-posada/
etc.

Answer 1

Score: 1

Instead of reading the email address from the opened email program, you can click the link, open the corresponding URL in the adjacent tab, and print the email address, inducing WebDriverWait for visibility_of_element_located() with the following locator strategy:

  • Code Block:

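    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # `driver` is the webdriver.Chrome instance created in the question's code
    driver.get("https://irglobal.com/advisor/angus-forsyth/")
    parent_window = driver.current_window_handle
    driver.execute_script("scroll(0, 250);")
    # click() returns None, so its result is not stored
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//h1//following::a[1]"))).click()
    # wait for the second tab to exist before reading the window handles
    WebDriverWait(driver, 20).until(EC.number_of_windows_to_be(2))
    all_windows = driver.window_handles
    new_window = [window for window in all_windows if window != parent_window][0]
    driver.switch_to.window(new_window)
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., 'Email')]//a"))).text)
    # close the new tab and return to the original one
    driver.close()
    driver.switch_to.window(parent_window)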
    
  • Console Output:

    angus@angfor.hk
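
The question also asks for this to work for any person on the site. A minimal sketch of how the per-page steps could be run over several profile URLs is given here. The names `pick_new_window`, `normalize_profile_url`, `collect_emails` and the `fetch_email` callable are hypothetical helpers introduced for illustration, not part of Selenium or of the answer above; `fetch_email` is assumed to wrap the Selenium steps for one URL and return the address as a string.

```python
from typing import Callable, Dict, Iterable, List, Optional


def pick_new_window(handles: List[str], parent: str) -> Optional[str]:
    """Return the first window handle that is not the parent, or None if there is none."""
    for handle in handles:
        if handle != parent:
            return handle
    return None


def normalize_profile_url(url: str) -> str:
    """Strip surrounding whitespace and make the URL end with exactly one slash."""
    return url.strip().rstrip("/") + "/"


def collect_emails(
    urls: Iterable[str],
    fetch_email: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Call fetch_email for every profile URL; store None when a page fails.

    fetch_email is a hypothetical callable supplied by the caller, for example
    a function that performs the click-and-switch-tab steps for one URL.
    """
    results: Dict[str, Optional[str]] = {}
    for raw_url in urls:
        url = normalize_profile_url(raw_url)
        try:
            results[url] = fetch_email(url)
        except Exception:  # e.g. a Selenium timeout when no email link appears
            results[url] = None
    return results
```

With this shape, one failing profile does not stop the whole run, and `pick_new_window` avoids the `IndexError` that the `[0]` lookup would raise if the click did not open a second tab. Whether every advisor page follows the same layout as the one tested is not confirmed here, so the locators may need adjusting per page.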
    

huangapple
  • Posted on June 26, 2023, 23:36:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76558192.html