How to use Puppeteer to copy text after the ::before pseudo-element?

Question


I am new to web scraping and Puppeteer, and I'm trying to learn the library's selector methods. I am trying to select the text that follows the ::before pseudo-element in the markup below. I also want the text in the span directly after it, but the text in the anchor tag is more important.

<div class="details-inner">
<a class="trackName playIcon" href="/link/488824/sample-text-%40-link/" title="A Title is Here"> ::before "This is the text I want"</a>
<span class="track"> <a href="/anchor-link-to-somewhere/">I also want this text</a></span>
</div>

Another problem is that there are ten other divs on the page that share the classes "trackName" and "playIcon", but I only want the text from the first instance that has these classes.

My thought process was to select the class, map all of the instances into an array, and grab the first element of that array.

I tried the following two approaches, to no avail.

Attempt 1:

const content = await page.$$('.trackName playIcon');
const contentText = await page.evaluate(a => a.textContent, content);

console.log(contentText);

Attempt 2:

const content1 = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.trackName playIcon'), a => a.textContent);
});
console.log(content1[0]);

I'm not sure whether my errors stem from the text being after the pseudo-element or whether my syntax is just wrong. (Both of these code samples return undefined.)

Answer 1

Score: 0

Mocking things out a bit and following [this technique](https://stackoverflow.com/a/62781104/6243352) yields the following code:

```js
const puppeteer = require("puppeteer"); // ^19.6.3

const html = `<!DOCTYPE html><html><head>
<style>
.trackName:before {
  content: "This is the text I want"
}
</style>
</head><body>
<div class="details-inner">
  <a class="trackName playIcon" href="/link/488824/sample-text-%40-link/" title="A Title is Here">test</a>
  <span class="track"> <a href="/anchor-link-to-somewhere/">I also want this text</a></span>
</div>
</body>
</html>`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  const before = await page.$eval(
    ".trackName",
    el => getComputedStyle(el, ":before").content
  );
  const text = await page.$eval(".track", el => el.textContent.trim());
  console.log(before); // => "This is the text I want"
  console.log(text); // => I also want this text
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
```
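Note that the `before` value logged above still includes the surrounding quotes, because the computed `content` property is returned as a serialized CSS string. If you want the bare text, something like the following could work. This is a minimal sketch: `unquoteCssContent` is a hypothetical helper name (not part of Puppeteer), and it assumes the value is a single quoted string, `none`, or `normal`.

```javascript
// Hypothetical helper: turns a computed `content` value such as
// '"This is the text I want"' into the bare text.
function unquoteCssContent(value) {
  // A pseudo-element without generated content reports "none" or "normal".
  if (value === "none" || value === "normal") return null;

  // Match a value wrapped in one matching pair of single or double quotes.
  const match = /^(["'])([\s\S]*)\1$/.exec(value);

  // Anything else (e.g. url(...) or counter(...)) is returned unchanged.
  if (!match) return value;

  // Undo backslash escapes for quotes and backslashes inside the string.
  return match[2].replace(/\\(["'\\])/g, "$1");
}

console.log(unquoteCssContent('"This is the text I want"'));
```

You would call it in Node on the string returned by `page.$eval`, for example `unquoteCssContent(before)`.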

Selectors automatically grab the first match if you use `$` methods rather than `$$` methods.

Your attempts fail because `.trackName playIcon` (find a `<playIcon />` element inside an element with `class="trackName"`) should be `.trackName.playIcon` (find an element with `class="trackName playIcon"`).

The first attempt passes an array of element handles but tries to read `.textContent` from it. Use a `$` method or `[0]` to get a single element whose `.textContent` you can read.
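Putting those fixes together, the two attempts from the question might be rewritten roughly as follows. This is a sketch, not a drop-in solution: `readFirstTrack` and `readAllTrackNames` are hypothetical helper names, and `page` is assumed to be a Puppeteer `Page` that has already loaded the markup. It reads ordinary DOM text only; text generated by a CSS `content` rule does not appear in `.textContent` and needs the `getComputedStyle` approach shown earlier.

```javascript
// Hypothetical helper: reads text from the first matching elements.
// `page` is assumed to be a Puppeteer Page that has loaded the document.
async function readFirstTrack(page) {
  // $eval runs the callback on the FIRST element matching the selector.
  // ".trackName.playIcon" (no space) matches one element with both classes.
  const name = await page.$eval(".trackName.playIcon", el =>
    el.textContent.trim()
  );
  const track = await page.$eval(".track", el => el.textContent.trim());
  return { name, track };
}

// Hypothetical helper: the "collect everything, then index" idea from the
// question. $$eval passes an array of ALL matching elements to the callback.
async function readAllTrackNames(page) {
  return page.$$eval(".trackName.playIcon", els =>
    els.map(el => el.textContent.trim())
  );
}
```

With a real page you would write `const { name, track } = await readFirstTrack(page);`, or `(await readAllTrackNames(page))[0]` if you prefer the array route. The `$eval` version is shorter when only the first match matters.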

Posted by huangapple on March 4, 2023, 02:03:19. Source: https://go.coder-hub.com/75630458.html