How to select element with nested elements which fulfill a condition in Puppeteer?

Question

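I have a list of elements of tag A in my HTML, each of which has nested elements of tag B. How do I use Puppeteer to select the element A whose nested element B fulfills a specific condition?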

HTML example:

<A>
    <B>1</B>
</A>
<A>
    <B>2</B>
</A>

When I try to get the A element containing the B with innerText "1", I've tried the following:

const element = await page.evaluate(() => {
    return [...document.querySelectorAll("A > B[innerText='1']")];
});
console.log(element); // undefined

Answers without jQuery are preferred.

Answer 1

Score: 1

It's good that you've provided a mockup of the HTML, but important details can be lost in translation. The best way to get an accurate answer is to share the actual site. If that isn't possible, showing an excerpt of the actual HTML is often enough.

In many cases, adding more context reveals a much better way to get the result you want. Some of those details you may deem unimportant may actually turn out to be critical.

With that in mind, at the time of writing Puppeteer's options for selecting elements by text are fairly limited compared to Playwright's (arguably a good thing, since there are fewer abstractions to remember): the "text/" prefix, XPath, and CSS selectors combined with DOM traversal.

I usually start with a CSS selector and DOM traversal approach, since it's the easiest for me to remember:

const puppeteer = require("puppeteer"); // ^19.7.5

const html = `
<A>
    <B>1</B>
</A>
<A>
    <B>2</B>
</A>
`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  const el = await page.evaluateHandle(() =>
    [...document.querySelectorAll("A")].find(el =>
      [...el.querySelectorAll("B")].find(
        e => e.textContent.trim() === "1"
      )
    )
  );
  console.log(await el.evaluate(el => el.outerHTML)); // just to verify
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
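The DOM traversal can also run bottom-up: find the matching <B> first, then climb to its enclosing <A> with Element.closest(). This is a minimal sketch rather than the approach above; the function name findParentByChildText is hypothetical, and the optional root parameter is an assumption added only so the function can be exercised outside a browser.

```javascript
// Hypothetical page function for page.evaluateHandle(). Puppeteer serializes
// its source and runs it in the browser, so it must not use Node.js variables.
// `root` defaults to the page's `document`; the default is only evaluated when
// no root is passed, i.e. when Puppeteer calls it inside the browser.
function findParentByChildText(parentSelector, childSelector, text, root = document) {
  // Descendant combinator: every childSelector element inside a parentSelector element.
  const candidates = [...root.querySelectorAll(`${parentSelector} ${childSelector}`)];
  const child = candidates.find(el => el.textContent.trim() === text);
  // closest() returns the nearest ancestor (or the element itself) matching the selector.
  return child ? child.closest(parentSelector) : null;
}

// Usage inside the async function above (assumes `page` already has the HTML):
// const handle = await page.evaluateHandle(findParentByChildText, "A", "B", "1");
// const a = handle.asElement(); // null when nothing matched
```

One difference from the top-down find(): if parent elements can themselves be nested, closest() returns the innermost matching ancestor, whereas find() over querySelectorAll("A") returns the first match in document order, which is the outermost one.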

If you don't mind XPath syntax, you could instead use the shorter:

const el = await page.$('xpath///A[B[normalize-space() = "1"]]');

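If <B> can be nested more deeply within <A> (not only as a direct child), use the descendant axis relative to <A>:

const el = await page.$('xpath///A[.//B[normalize-space() = "1"]]');

The leading dot matters: a bare //B inside the predicate searches the whole document, so it would match every <A> as soon as any <B> anywhere contains "1".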
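If the text comes from a variable rather than a literal, building the XPath by plain string concatenation breaks as soon as the text contains a quote, because XPath 1.0 string literals have no escape sequences. The sketch below uses hypothetical helpers (xpathLiteral and parentByChildTextXPath are not Puppeteer APIs); it falls back to concat() when the text contains both kinds of quotes.

```javascript
// Hypothetical helper (not part of Puppeteer): turns arbitrary text into a
// valid XPath 1.0 string literal. XPath 1.0 has no escape sequences, so text
// containing both quote characters is assembled with concat().
function xpathLiteral(text) {
  if (!text.includes('"')) return `"${text}"`;
  if (!text.includes("'")) return `'${text}'`;
  // Split on double quotes and re-insert each one as a single-quoted literal.
  const parts = text.split('"').map(part => `"${part}"`);
  return `concat(${parts.join(`, '"', `)})`;
}

// Hypothetical helper: XPath for <parentTag> elements containing a <childTag>
// whose whitespace-normalized text equals `text`. With deep = false only
// direct children are checked; with deep = true any descendant is checked
// (the leading "." keeps the search relative to the parent element).
// `text` is compared as-is, so pass it already trimmed and single-spaced.
function parentByChildTextXPath(parentTag, childTag, text, deep = false) {
  const step = deep ? `.//${childTag}` : childTag;
  return `//${parentTag}[${step}[normalize-space() = ${xpathLiteral(text)}]]`;
}

// Usage with Puppeteer's "xpath/" prefix, as in the snippets above:
// const el = await page.$("xpath/" + parentByChildTextXPath("A", "B", "1", true));
```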

If you don't need an ElementHandle, you can just use $$eval or evaluate directly and return the text or other serializable data you need.
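For example, to get the outer HTML of every matching <A> without creating any handles, a sketch using page.$$eval might look like this. The function names are hypothetical; the callback receives the array of elements matching the selector plus any extra arguments passed to $$eval, and whatever it returns must be serializable.

```javascript
// Hypothetical page function for page.$$eval(). Puppeteer calls it in the
// browser with the array of elements matching the selector, followed by any
// extra arguments passed to $$eval. It returns plain strings, which serialize
// back to Node.js (DOM nodes do not).
function outerHTMLOfMatches(parents, childSelector, text) {
  return parents
    .filter(parent =>
      [...parent.querySelectorAll(childSelector)].some(
        child => child.textContent.trim() === text
      )
    )
    .map(parent => parent.outerHTML);
}

// `page` is assumed to be a Puppeteer Page that already has the HTML loaded.
async function findMatchingOuterHTML(page, text) {
  return page.$$eval("A", outerHTMLOfMatches, "B", text);
}

// Usage inside an async function:
// console.log(await findMatchingOuterHTML(page, "1"));
```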

See this answer for more options for XPath text extraction in Puppeteer.

huangapple
  • Published on 2023-03-20 23:54:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75792520.html