March 20, 2023, 23:54:03

How to select element with nested elements which fulfill a condition in Puppeteer?

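Question

I have a list of elements of tag A in my HTML, each of which has nested elements of tag B. How do I use Puppeteer to select the element A whose nested element B fulfills a specific condition?

HTML example: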

<A>
    <B>1</B>
</A>
<A>
    <B>2</B>
</A>

To get the A element containing the B whose innerText is "1", I tried:

const element = await page.evaluate(() => {
    return [...document.querySelectorAll("A > B[innerText='1']")];
});
console.log(element); // undefined

Answers without jQuery are preferred.

Answer 1

Score: 1

It's good that you've provided a mockup of the HTML, but important details can be lost in translation. The best way to get an accurate answer is to share the actual site. If that isn't possible, showing an example of the actual HTML can often be enough.

In many cases, adding more context reveals a much better way to get the result you want. Some of those details you may deem unimportant may actually turn out to be critical.

With that in mind, Puppeteer's options for text selection at the time of writing are fairly limited relative to Playwright (arguably a good thing; fewer abstractions to remember): the "text/" prefix, XPaths and CSS selectors with DOM traversal.

I usually start with a CSS selector and DOM traversal approach since it's easiest to remember for me:

const puppeteer = require("puppeteer"); // ^19.7.5

const html = `
<A>
    <B>1</B>
</A>
<A>
    <B>2</B>
</A>
`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  // Find the first <A> that has a descendant <B> whose trimmed text is "1"
  const el = await page.evaluateHandle(() =>
    [...document.querySelectorAll("A")].find(a =>
      [...a.querySelectorAll("B")].some(
        b => b.textContent.trim() === "1"
      )
    )
  );
  console.log(await el.evaluate(el => el.outerHTML)); // just to verify
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

If you don't mind XPath syntax, you could instead use the shorter:

const el = await page.$('xpath///A[B[normalize-space() = "1"]]');

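If <B> can be deeply nested within <A>, use a relative descendant step (the leading "." keeps the search inside each <A>; a bare "//B" in the predicate would search the whole document):

const el = await page.$('xpath///A[.//B[normalize-space() = "1"]]');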
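
If the text to match comes from a variable rather than a hard-coded "1", the selector string has to be built at runtime. Here is a minimal sketch of one way to do that. The helper names xpathLiteral and parentByChildText are hypothetical (they are not part of Puppeteer), and the deep option uses the relative ".//" step so the search stays inside each parent element.

```javascript
// Hypothetical helper (not a Puppeteer API): quote a string as an XPath 1.0
// literal. XPath 1.0 has no escape character, so text containing both quote
// kinds is assembled with concat().
function xpathLiteral(text) {
  if (!text.includes('"')) return `"${text}"`;
  if (!text.includes("'")) return `'${text}'`;
  const separator = `, '"', `; // a literal double quote between the pieces
  const parts = text.split('"').map(part => `"${part}"`);
  return `concat(${parts.join(separator)})`;
}

// Hypothetical helper: build a Puppeteer "xpath/" selector for a parent
// element whose child (or, with deep: true, any descendant) has the given
// whitespace-normalized text.
function parentByChildText(parentTag, childTag, text, { deep = false } = {}) {
  const step = deep ? ".//" : "";
  const predicate = `${step}${childTag}[normalize-space() = ${xpathLiteral(text)}]`;
  return `xpath///${parentTag}[${predicate}]`;
}

console.log(parentByChildText("A", "B", "1"));
console.log(parentByChildText("A", "B", "1", { deep: true }));
```

The resulting string can be passed to page.$ in the same way as the hand-written selectors above, for example page.$(parentByChildText("A", "B", "1")).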

If you don't need an ElementHandle, you can just use $$eval or evaluate directly and return the text or other serializable data you need.
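
As a rough sketch of that last point, the filtering can happen entirely in the browser, with only strings coming back to Node. The function name outerHTMLByChildText is my own, and main assumes Puppeteer is installed. main is defined but not invoked here, so call it yourself to try the example.

```javascript
// Runs in the browser when passed to $$eval, so it must not reference
// anything from the Node scope; everything arrives through its parameters.
function outerHTMLByChildText(parents, childSelector, wanted) {
  return parents
    .filter(parent =>
      [...parent.querySelectorAll(childSelector)].some(
        child => child.textContent.trim() === wanted
      )
    )
    .map(parent => parent.outerHTML); // strings are serializable
}

// Usage sketch; assumes puppeteer is installed. Not invoked automatically.
async function main() {
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const [page] = await browser.pages();
    await page.setContent("<A><B>1</B></A><A><B>2</B></A>");
    // Extra arguments after the function are forwarded to it in the browser
    const matches = await page.$$eval("A", outerHTMLByChildText, "B", "1");
    console.log(matches);
  } finally {
    await browser.close();
  }
}
```

Unlike the evaluateHandle version, this returns every matching <A> rather than only the first, and there is no handle to dispose of afterwards.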

See this answer for more options for XPath text extraction in Puppeteer.
