How to use Puppeteer to copy text after the ::before pseudo-element?
Question
I am new to web scraping and Puppeteer, and I'm trying to learn the library's selector methods. I am trying to select the text that follows the `::before` pseudo-element in the markup below. I also want the text in the span directly after it, but the text in the anchor tag is more important.

    <div class="details-inner">
      <a class="trackName playIcon" href="/link/488824/sample-text-%40-link/" title="A Title is Here"> ::before "This is the text I want"</a>
      <span class="track"> <a href="/anchor-link-to-somewhere/">I also want this text</a></span>
    </div>

Another problem is that ten other elements on the page share the classes `trackName` and `playIcon`, but I only want the text from the first element that has these classes.
My plan was to select by class, map all of the matches into an array, and take the first item of that array.
I tried the following two approaches to no avail.

Attempt 1:

    const content = await page.$$('.trackName playIcon');
    const contentText = await page.evaluate(a => a.textContent, content);
    console.log(contentText);

Attempt 2:

    const content1 = await page.evaluate(() => {
      return Array.from(document.querySelectorAll('.trackName playIcon'), a => a.textContent);
    });
    console.log(content1[0]);

I'm not sure whether my errors stem from the text being after the pseudo-element or whether my syntax is just wrong (both code samples return `undefined`).
Answer 1
Score: 0
Mocking things out a bit and following [this technique](https://stackoverflow.com/a/62781104/6243352) yields the following code:
```js
const puppeteer = require("puppeteer"); // ^19.6.3

const html = `<!DOCTYPE html><html><head>
<style>
.trackName:before {
  content: "This is the text I want"
}
</style>
</head><body>
<div class="details-inner">
  <a class="trackName playIcon" href="/link/488824/sample-text-%40-link/" title="A Title is Here">test</a>
  <span class="track"> <a href="/anchor-link-to-somewhere/">I also want this text</a></span>
</div>
</body>
</html>`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  const before = await page.$eval(
    ".trackName",
    el => getComputedStyle(el, ":before").content
  );
  const text = await page.$eval(".track", el => el.textContent.trim());
  console.log(before); // => "This is the text I want"
  console.log(text); // => I also want this text
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
```
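Note that the computed `content` value is a serialized CSS string, so it keeps its surrounding double quotes (as the comment on the first `console.log` shows), and it is `none` or `normal` when no generated content is set. A minimal sketch of a clean-up helper follows. `unquoteContent` is a hypothetical name, not part of Puppeteer, and it only handles plain quoted strings; it makes no attempt to interpret `attr()`, counters, or CSS hex escapes.

```js
// Hypothetical helper (not a Puppeteer API): turn the computed `content`
// value of a pseudo-element into plain text.
function unquoteContent(value) {
  // "none" and "normal" mean there is no generated content.
  if (value === "none" || value === "normal") return "";
  // Plain string values are serialized wrapped in matching quotes.
  const match = /^(["'])([\s\S]*)\1$/.exec(value);
  if (!match) return value; // e.g. counter(...) or url(...): leave untouched
  // Undo backslash-escaped quotes and backslashes inside the string.
  return match[2].replace(/\\(["'\\])/g, "$1");
}

console.log(unquoteContent('"This is the text I want"')); // This is the text I want
```

The helper runs in Node on the string returned by `page.$eval`, for example `unquoteContent(before)`.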
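Puppeteer's `$` methods (such as `page.$` and `page.$eval`) work on the first matching element only, whereas the `$$` methods return every match, so you don't need to build an array just to take its first item.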
Your attempts fail because the selector `.trackName playIcon` (which looks for a `<playIcon>` element inside an element with `class="trackName"`) should be `.trackName.playIcon` (which matches a single element that has both classes, as in `class="trackName playIcon"`).
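The first attempt also passes an array of element handles to `page.evaluate` and then reads `.textContent` from that array, which has no such property. Use a `$` method, or index the array with `[0]`, to get a single element whose `.textContent` you can read.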
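Putting the two fixes together, the original attempts could be corrected roughly as follows. This is a sketch that has not been run against the real page: the function names `firstTrackText` and `allTrackTexts` are made up for illustration, and `page` is assumed to be a Puppeteer `Page` that has already loaded the target site.

```js
// Attempt 1, corrected: compound selector plus $eval, which runs the
// callback on the first matching element only.
async function firstTrackText(page) {
  return page.$eval(".trackName.playIcon", el => el.textContent.trim());
}

// Attempt 2, corrected: $$eval passes every match to the callback as an
// array, so map it to text in the browser and index the result in Node.
async function allTrackTexts(page) {
  return page.$$eval(".trackName.playIcon", els =>
    els.map(el => el.textContent.trim())
  );
}

// Usage, inside an async function with an open page:
//   console.log(await firstTrackText(page));
//   console.log((await allTrackTexts(page))[0]);
```

Keep in mind that `textContent` only returns real text nodes. If the text you want is generated by the `::before` rule itself rather than sitting in the anchor after it, read it with `getComputedStyle` as in the answer's code. `page.$eval` also throws when nothing matches the selector, so the first function assumes the element is already in the DOM.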