Get an email address from a website link that can only be seen when it is manually clicked?
Question
I would like to get the email address from this site:
https://irglobal.com/advisor/angus-forsyth
I tried it with the following code:
import time
import os
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

if __name__ == '__main__':
    WAIT = 1
    print("Checking Browser driver...")
    os.environ['WDM_LOG'] = '0'
    options = Options()
    options.add_argument("start-maximized")
    options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 1})
    # Both switches go in one list; a second "excludeSwitches" call would replace the first.
    options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    srv = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=srv, options=options)
    waitWD = WebDriverWait(driver, 10)

    link = "https://irglobal.com/advisor/angus-forsyth"
    print(f"Working for {link}")
    driver.get(link)
    time.sleep(WAIT)
    # Parse the rendered page and print the email button's markup.
    soup = BeautifulSoup(driver.page_source, 'lxml')
    tmp = soup.find("a", {"class": "btn email"})
    print(tmp.prettify())
    driver.quit()
But I can't see any email address in this HTML tag:
(selenium) C:\DEV\Fiverr\TRY\saschanielsen>python tmp2.py
Checking Browser driver...
Working for https://irglobal.com/advisor/angus-forsyth
<a class="btn email" data-id="103548" href="#">
<svg aria-hidden="true" class="svg-inline--fa fa-envelope" data-fa-i2svg="" data-icon="envelope" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg">
<path d="M48 64C21.5 64 0 85.5 0 112c0 15.1 7.1 29.3 19.2 38.4L236.8 313.6c11.4 8.5 27 8.5 38.4 0L492.8 150.4c12.1-9.1 19.2-23.3 19.2-38.4c0-26.5-21.5-48-48-48H48zM0 176V384c0 35.3 28.7 64 64 64H448c35.3 0 64-28.7 64-64V176L294.4 339.2c-22.8 17.1-54 17.1-76.8 0L0 176z" fill="currentColor">
</path>
</svg>
<!-- <i class="fas fa-envelope"></i> -->
</a>
When I click the button manually on the site, I can see the email address in the email program that opens.
How can I get this email address?
This should not only work for this specific link:
https://irglobal.com/advisor/angus-forsyth
It should also work for any person on this site, so I need the information that is behind this mail icon:
https://irglobal.com/advisor/ns-shastri/
https://irglobal.com/advisor/adriana-posada/
etc.
Answer 1
Score: 1
As an alternative to reading the email address from the opened email program, you can click the link, open the respective URL in the adjacent tab, and print the email address by inducing WebDriverWait for visibility_of_element_located() with the following locator strategy:
Code block:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is the Chrome instance created in the question's code.
driver.get("https://irglobal.com/advisor/angus-forsyth/")
parent_window = driver.current_window_handle
driver.execute_script("scroll(0, 250);")
# Click the first link after the <h1> heading.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//h1//following::a[1]"))).click()
# Switch to the window that is not the original one.
all_windows = driver.window_handles
new_window = [window for window in all_windows if window != parent_window][0]
driver.switch_to.window(new_window)
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., 'Email')]//a"))).text)
# Close the new window and return to the original one.
driver.close()
driver.switch_to.window(parent_window)
Console output:
angus@angfor.hk
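Since the question also asks about other people on the site, the same click-and-switch steps could be wrapped in a loop. The following is only a minimal sketch, not a tested solution. The helper names `newest_window` and `scrape_emails` are hypothetical. It assumes the two XPath locators from the answer match every profile page, and that exactly one extra window opens after the click.

```python
from typing import Dict, Iterable, List, Optional


def newest_window(all_handles: Iterable[str], parent_handle: str) -> Optional[str]:
    """Return the first handle that is not the parent window, or None."""
    for handle in all_handles:
        if handle != parent_handle:
            return handle
    return None


def scrape_emails(driver, profile_urls: List[str], timeout: int = 20) -> Dict[str, Optional[str]]:
    """Apply the answer's steps to each URL and map URL -> email text (or None).

    `driver` is an already-created Selenium WebDriver. The locators are copied
    from the answer; that they fit every profile page is an assumption.
    """
    # Imported here so newest_window() stays usable without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    emails: Dict[str, Optional[str]] = {}
    for url in profile_urls:
        driver.get(url)
        parent = driver.current_window_handle
        driver.execute_script("scroll(0, 250);")
        try:
            wait.until(EC.element_to_be_clickable(
                (By.XPATH, "//h1//following::a[1]"))).click()
            # Assumption: the click opens exactly one additional window.
            wait.until(EC.number_of_windows_to_be(2))
            driver.switch_to.window(newest_window(driver.window_handles, parent))
            emails[url] = wait.until(EC.visibility_of_element_located(
                (By.XPATH, "//span[contains(., 'Email')]//a"))).text
        except TimeoutException:
            emails[url] = None  # a locator did not match within the timeout
        finally:
            # Close any extra window and return to the original one.
            for handle in driver.window_handles:
                if handle != parent:
                    driver.switch_to.window(handle)
                    driver.close()
            driver.switch_to.window(parent)
    return emails
```

With the `driver` from the question, it could be called as `scrape_emails(driver, ["https://irglobal.com/advisor/ns-shastri/", "https://irglobal.com/advisor/adriana-posada/"])`. Pages where a wait times out are recorded as `None` rather than stopping the whole run.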