How to select element with nested elements which fulfill a condition in Puppeteer?
Question
I have a list of elements of tag A in my HTML, each containing nested elements of tag B. How do I select the element A whose nested element B fulfills a specific condition, using Puppeteer?
HTML example:
<A>
  <B>1</B>
</A>
<A>
  <B>2</B>
</A>
To get the A element containing the B with innerText "1", I tried:
const element = await page.evaluate(() => {
  return [...document.querySelectorAll("A > B[innerText='1']")];
});
console.log(element); // undefined
Answers without jQuery are preferred.
Answer 1
Score: 1
It's good that you've provided a mockup of the HTML, but important details can be lost in translation. The best way to get an accurate answer is to share the actual site. If that isn't possible, showing an example of the actual HTML is often enough.
In many cases, adding more context reveals a much better way to get the result you want. Details you deem unimportant may turn out to be critical.
With that in mind, Puppeteer's options for text selection at the time of writing are fairly limited relative to Playwright (arguably a good thing; fewer abstractions to remember): the "text/" prefix, XPaths, and CSS selectors with DOM traversal.
I usually start with a CSS selector and DOM traversal approach since it's easiest to remember for me:
const puppeteer = require("puppeteer"); // ^19.7.5

const html = `
<A>
  <B>1</B>
</A>
<A>
  <B>2</B>
</A>
`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);

  // Find the first <A> that contains a <B> whose trimmed text is "1".
  const el = await page.evaluateHandle(() =>
    [...document.querySelectorAll("A")].find(el =>
      [...el.querySelectorAll("B")].some(
        e => e.textContent.trim() === "1"
      )
    )
  );
  console.log(await el.evaluate(el => el.outerHTML)); // just to verify
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
If you don't mind XPath syntax, you could instead use the shorter:
const el = await page.$('xpath///A[B[normalize-space() = "1"]]');
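If <B> can be deeply nested within <A>, search the descendants of the current <A> instead (note the leading dot; a bare //B inside the predicate would search the whole document, so every <A> would match whenever any matching <B> exists):
const el = await page.$('xpath///A[.//B[normalize-space() = "1"]]');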
If you don't need an ElementHandle, you can just use $$eval or evaluate directly and return the text or other serializable data you need.
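As a rough sketch of that last point, the same filtering can run inside $$eval and hand back plain strings rather than a handle. The names parentsWithChildText and getMatchingHtml, along with the childSelector and text parameters, are made up for illustration; page is assumed to be a Puppeteer Page that has already loaded the example HTML.

```javascript
// Runs in the browser when passed to page.$$eval: receives the array of
// elements matched by the selector, plus any extra arguments, and returns
// plain strings (serializable, so they can cross back to Node).
// `childSelector` and `text` are illustrative parameters.
const parentsWithChildText = (parents, childSelector, text) =>
  parents
    .filter(parent =>
      [...parent.querySelectorAll(childSelector)].some(
        child => child.textContent.trim() === text
      )
    )
    .map(parent => parent.outerHTML);

// Hypothetical helper: `page` is assumed to be a Puppeteer Page.
// Resolves to the outerHTML of every <A> containing a <B> with text "1".
const getMatchingHtml = page =>
  page.$$eval("A", parentsWithChildText, "B", "1");
```

Because the callback is serialized and executed in the page, it should not close over variables from the Node side; anything it needs is passed as an extra argument to $$eval, as done here.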
See this answer for more options for XPath text extraction in Puppeteer.