Selenium not finding custom attribute 'data-target-section-id'
Question
I'm trying to scrape the profiles of people on LinkedIn who hold a specific job. To do this, I was trying to find the People button and click it, so that I only see the relevant people.
The path is as follows:
From the signed-out LinkedIn home page -> I sign in and land on the LinkedIn home page -> I type "hr" in the search bar and hit Enter.
On the results page for "hr", on the left side of the page, there is a navigation list titled "On this page". One of its options is "People", and that is what I want to target.
The link to the page is: https://www.linkedin.com/search/results/all/?keywords=hr&origin=GLOBAL_SEARCH_HEADER&sid=Xj2
The HTML of the button for 'People' in the navigation list is:
<li>
<button aria-current="false" class="search-navigation-panel_button" data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ==" role="link" type="button"> People
I have tried to find this button with By.LINK_TEXT using the keyword People, but that did not work. I have also tried By.XPATH with "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']", but that does not find it either.
How can I make Selenium find this custom attribute, so that I can locate the button through data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ=="?
Another issue I am having is that I can target all the relevant people on the page and loop through them, but I cannot extract the link to each profile. The loop only picks up the link of the first person and never updates the variable again.
For example, if the first person is Ian and the second is Brian, it gives me the link to Ian's profile even when 'users' is Brian.
Debugging the loop, I can see the correct list of people in all_users, but it only gets the href of the first person in the list and never updates.
Here is the code:
all_users = driver.find_elements(By.XPATH, "//*[contains(@class, 'entity-result__title-line entity-result__title-line--2-lines')]")
for users in all_users:
    print(users)
    get_links = users.find_element(By.XPATH, "//*[contains(@href, 'miniProfileUrn')]")
    print(get_links.get_attribute('href'))
Answer 1
Score: 1
If you want to locate several elements with the same attribute, replace find_element with find_elements. See if that finds not just the first element matching your search, but all elements with that attribute.
Review the Selenium: Locating Elements documentation and see if you can try each and every option they have for locating elements.
Something else to try:
# The ID below is the one from the question's XPath; use the value that is actually in your page's HTML
button_element = driver.find_element(By.XPATH, "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']")
button_element.click()
Answer 2
Score: 1
> I have also tried to do By.XPATH
> "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']")"" but
> it also does not find it.
The data-target-section-id that you mention is not the same as the one that the button has (PTFmMNSPSz2LQRzwynhRBQ==). Check that this is not dynamic before targeting it.
Your XPath is otherwise fine; just use the correct ID:
driver.find_element(By.XPATH, "//button[@data-target-section-id='PTFmMNSPSz2LQRzwynhRBQ==']").click()
where "driver" is your WebDriver instance.
Answer 3
Score: 0
Given the HTML:
<li>
<button aria-current="false" class="search-navigation-panel_button" data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ==" role="link" type="button"> People </button>
</li>
The values of the data-target-section-id attribute, such as PTFmMNSPSz2LQRzwynhRBQ==, are dynamically generated and are bound to change sooner or later. They may change the next time you access the application afresh, or even at the next application startup, so they can't be used in locators.
Solution
The desired element is dynamic, so to click it you need to use WebDriverWait with element_to_be_clickable(). You can use either of the following locator strategies:
- Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.search-navigation-panel_button[data-target-section-id]"))).click()
- Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='search-navigation-panel_button' and @data-target-section-id][contains(., 'People')]"))).click()
- Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
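The wait-and-click approach can also be wrapped in a small helper. This is only a sketch: the names people_button_xpath and click_people_button are hypothetical, and matching on normalize-space() is an assumption, chosen because the button text in the HTML above is padded with spaces. Selenium is imported inside the function purely so that the locator builder can be used without a browser.

```python
def people_button_xpath(label="People"):
    """Build an XPath for a left-nav button by its visible text.

    Assumes `label` contains no single quote. The bare
    [@data-target-section-id] predicate only checks that the attribute
    exists, so its changing value does not matter.
    """
    return f"//button[@data-target-section-id][normalize-space()='{label}']"


def click_people_button(driver, timeout=20):
    """Wait until the button is clickable, then click it.

    Returns True on success and False if the wait times out.
    """
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, people_button_xpath())
    try:
        WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable(locator)
        ).click()
    except TimeoutException:
        return False
    return True
```

Calling click_people_button(driver) after the search results have loaded returns False instead of raising if the button never becomes clickable, which makes it easier to fall back to another locator.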
Answer 4
Score: 0
It looks like your People button locator isn't working because the data-target-section-id is dynamic. Mine is showing as hopW8RkwTN2R9dPgL6Fm/w==. We can get around that by using an XPath that locates the element by the text it contains, "People", e.g.
//button[text()='People']
It turns out that this matches two elements on the page, because many of the left nav links are repeated as rounded buttons at the top of the page, so we can further refine our locator to
//button[text()='People'][@data-target-section-id]
Having said that, that link only scrolls the page so you don't really need to click that.
From there, you want to get the links to each person listed under the People heading. We first need the DIV that contains the People section. It's kinda messy because the IDs on those elements are also dynamic so we need to find the H2 that contains "People" and then work our way back up the DOM to the DIV that contains only that section. We can get that using the XPath below
//div[@class='search-results-container']/div[.//h2[text()='People']]
From there, we want all of the A tags that uniquely link to a person... and there's a lot of A tags in that section but most are not ones we want so we need to do more filtering. I found that the below XPath locates each unique URL in that section.
//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]
Combining the two XPaths, we get
//div[@class='search-results-container']/div[.//h2[text()='People']]//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]
which locates all unique URLs belonging to a person in the People section of the page.
Using this, your code would look like
all_users = driver.find_elements(By.XPATH, "//div[@class='search-results-container']/div[.//h2[text()='People']]//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]")
for user in all_users:
    print(user.get_attribute('href'))
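If the class filter ever stops matching, one alternative is to collect every miniProfileUrn link in the section and deduplicate in Python. The sketch below assumes the hrefs look like https://www.linkedin.com/in/<name>?miniProfileUrn=...; that URL shape and the helper names strip_query and unique_profile_urls are assumptions for illustration, not something confirmed in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(href):
    """Drop the query string and fragment so links to one profile compare equal."""
    parts = urlsplit(href)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def unique_profile_urls(hrefs):
    """Return profile URLs without query parameters, keeping first-seen order."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for href in hrefs:
        if href:  # get_attribute('href') can return None
            seen.setdefault(strip_query(href), None)
    return list(seen)
```

With the loop above, this could be called as unique_profile_urls(user.get_attribute('href') for user in all_users).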
NOTE: Your code kept returning the first href because an XPath that starts with "//" searches the whole document, even when it is called on an existing element. Add a "." at the start of the XPath to start the search from the referenced element.
get_links = users.find_element(By.XPATH, ".//*[contains(@href, 'miniProfileUrn')]")  # note the leading "."
I've eliminated that step in my code, so you won't need it there.
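To make the difference concrete, here is a small sketch of the per-card lookup with the leading dot in place. link_for_card is a hypothetical helper; it passes the locator strategy as the string "xpath", which is the value of Selenium's By.XPATH, so the function needs no import of its own.

```python
def link_for_card(card, by="xpath"):
    """Return the profile href inside one result card, or None if absent.

    `card` is expected to be a WebElement for a single search result.
    """
    # ".//" searches only the descendants of `card`; a bare "//" would start
    # at the document root and return the same first match for every card.
    matches = card.find_elements(by, ".//a[contains(@href, 'miniProfileUrn')]")
    return matches[0].get_attribute("href") if matches else None
```

In the question's loop this would replace the find_element call, for example print(link_for_card(users)). Using find_elements also avoids a NoSuchElementException for cards that have no profile link.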