Cheerio does not return items from the given path

Question

My main goal is to use cheerio to scrape the titles from this IMDb ranking:

https://www.imdb.com/chart/tvmeter/?ref_=nv_tvv_mptv

However, even after following cheerio's documentation and using the exact HTML path of the listed titles, I still get back seemingly random, confusing objects like this:

  'x-attribsNamespace': [Object: null prototype] {},
  'x-attribsPrefix': [Object: null prototype] {}
},
'80': <ref *81> Element {
  parent: Element {
    parent: [Element],
    prev: [Text],
    next: [Text],
    startIndex: null,
    endIndex: null,
    children: [Array],
    name: 'tbody',
    attribs: [Object: null prototype],
    type: 'tag',
    namespace: 'http://www.w3.org/1999/xhtml',
    'x-attribsNamespace': [Object: null prototype],
    'x-attribsPrefix': [Object: null prototype]
  },
  prev: Text {
    parent: [Element],
    prev: [Element],
    next: [Circular *81],
    startIndex: null,
    endIndex: null,
    data: '\n\n  ',
    type: 'text'
  },
  next: Text {
    parent: [Element],
    prev: [Circular *81],
    next: [Element],
    startIndex: null,
    endIndex: null,
    data: '\n\n  ',
    type: 'text'
  },
  startIndex: null,
  endIndex: null,
  children: [
    [Text], [Element],
    [Text], [Element],
    [Text], [Element],
    [Text], [Element],
    [Text], [Element],
    [Text]
  ],
  name: 'tr',
  attribs: [Object: null prototype] {},
  type: 'tag',
  namespace: 'http://www.w3.org/1999/xhtml',
  'x-attribsNamespace': [Object: null prototype] {},
  'x-attribsPrefix': [Object: null prototype] {}
},
'81': <ref *82> Element {
  parent: Element {
    parent: [Element],
    prev: [Text],
    next: [Text],
    startIndex: null,
    endIndex: null,
    children: [Array],
    name: 'tbody',
    attribs: [Object: null prototype],
    type: 'tag',
    namespace: 'http://www.w3.org/1999/xhtml',
    'x-attribsNamespace': [Object: null prototype],
    'x-attribsPrefix': [Object: null prototype]
  },
  prev: Text {
    parent: [Element],
    prev: [Element],
    next: [Circular *82],
    startIndex: null,
    endIndex: null,
    data: '\n\n  ',
    type: 'text'
  },
  next: Text {
    parent: [Element],
    prev: [Circular *82],
    next: [Element],
    startIndex: null,
    endIndex: null,
    data: '\n\n  ',
    type: 'text'
  },
  startIndex: null,
  endIndex: null,
  children: [
    [Text], [Element],
    [Text], [Element],
    [Text], [Element],
    [Text], [Element],
    [Text], [Element],
    [Text]
  ],
  name: 'tr',
  attribs: [Object: null prototype] {},
  type: 'tag',
  namespace: 'http://www.w3.org/1999/xhtml',
  'x-attribsNamespace': [Object: null prototype] {},
  'x-attribsPrefix': [Object: null prototype] {}
},
'82': <ref *83> Element {
  parent: Element {
    parent: [Element],
    prev: [Text],
    next: [Text],
    startIndex: null,
    endIndex: null,
    children: [Array],
    name: 'tbody',
    attribs: [Object: null prototype],
    type: 'tag',
    namespace: 'http://www.w3.org/1999/xhtml',
    'x-attribsNamespace': [Object: null prototype],
    'x-attribsPrefix': [Object: null prototype]
  },
  prev: Text {
    parent: [Element],
    prev: [Element],
    next: [Circular *83],
    startIndex: null,
    endIndex: null,
    data: '\n\n  ',
    type: 'text'
  },
  next: Text {
    parent: [Element],
    prev: [Circular *83],
    next: [Element],
    startIndex: null,
    endIndex: null,
    data: '\n\n  ',
    type: 'text'
  },
  startIndex: null,
  endIndex: null,
  children: [
    [Text], [Element],
    [Text], [Element],
    [Text], [Element],
    [Text], [Element],
    [Text], [Element],
    [Text]
  ],
  name: 'tr',
  attribs: [Object: null prototype] {},
  type: 'tag',
  namespace: 'http://www.w3.org/1999/xhtml',
  'x-attribsNamespace': [Object: null prototype] {},
  'x-attribsPrefix': [Object: null prototype] {}
},

Code:

import * as cheerio from 'cheerio';
import axios from 'axios';
import fs from 'fs';

axios("https://www.imdb.com/chart/tvmeter/?ref_=nv_tvv_mptv").then(res => {
    const data = res.data;
    const $ = cheerio.load(data);

    var cheerioData = $('.lister-list>tr').each((i, e) => {
        const title = $(e).find('.titleColumn a').text();
        console.log(title);
    })
    console.log(cheerioData);
})

I really don't understand what I'm doing wrong, since the path is completely correct. Can anybody help me?

Answer 1

Score: 1

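.each returns the Cheerio object itself so that calls can be chained, which is why logging its return value prints the raw element nodes. To get an array of text instead, either spread the Cheerio object and use Array#map, or use Cheerio's .map followed by .get() or .toArray().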

For example, with spread and vanilla JS Array#map:

import axios from "axios";
// Namespace import, as in the question; it does not rely on a default export.
import * as cheerio from "cheerio";

const url = "<Your URL>";

axios(url).then(res => {
  const $ = cheerio.load(res.data);
  // Spread the selection into a plain array of elements, then map to text.
  const text = [...$(".lister-list > tr")].map(e =>
    $(e).find(".titleColumn a").text().trim()
  );
  console.log(text);
});

Alternatively, call .get() or .toArray() after Cheerio's .map, whose callback receives the index as its first argument:

const text = $(".lister-list > tr").map((i, e) =>
  $(e).find(".titleColumn a").text().trim()
).get();
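
The argument order is easy to trip over when moving between the two styles. The sketch below is only an illustration on a plain array: indexFirstMap is a hypothetical helper that mimics the index-first callback convention, and the sample titles are made up. It is not Cheerio's code.

```javascript
// indexFirstMap is a hypothetical helper that mimics the index-first
// callback convention on a plain array; it is not part of Cheerio.
function indexFirstMap(items, fn) {
  const out = [];
  for (let i = 0; i < items.length; i++) {
    out.push(fn(i, items[i]));
  }
  return out;
}

// Made-up sample data standing in for scraped titles.
const titles = ["Show A", "Show B"];

// Native Array#map: the element is the first argument.
const native = titles.map((title) => title.toUpperCase());

// Index-first convention: the element is the second argument.
const indexFirst = indexFirstMap(titles, (i, title) => title.toUpperCase());

// Reusing a one-argument, element-first callback here receives the index.
const mistaken = indexFirstMap(titles, (title) => typeof title);

console.log(native);     // [ 'SHOW A', 'SHOW B' ]
console.log(indexFirst); // [ 'SHOW A', 'SHOW B' ]
console.log(mistaken);   // [ 'number', 'number' ]
```

If a callback written for Array#map is moved into an index-first .map unchanged, it silently operates on indices, so it is worth checking the parameter list whenever switching between the two.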

If you want to use .each, you can .push() each text string onto a vanilla array, but this isn't as clean as .map, which exists to abstract away this pattern:

const text = [];
$(".lister-list > tr").each((i, e) => {
  text.push($(e).find(".titleColumn a").text().trim());
});
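
To see why logging the return value of .each in the question printed element nodes rather than titles, here is a minimal sketch of the jQuery-style contract. TinySelection is a hypothetical toy class written for this illustration, and the sample strings are made up; it only imitates the return-value behavior and is not how Cheerio is implemented.

```javascript
// Toy stand-in for a jQuery-style collection. TinySelection is a
// hypothetical class for illustration; it is not Cheerio's implementation.
class TinySelection {
  constructor(items) {
    this.items = items;
  }

  // Like a jQuery-style .each: runs the callback for its side effects
  // and returns the collection itself so calls can be chained.
  each(fn) {
    this.items.forEach((item, i) => fn(i, item));
    return this;
  }

  // Like a jQuery-style .map: collects the callback's return values
  // into a new collection.
  map(fn) {
    return new TinySelection(this.items.map((item, i) => fn(i, item)));
  }

  // Unwraps the collection into a plain array.
  get() {
    return [...this.items];
  }
}

const rows = new TinySelection([" Show A ", " Show B "]);

// The callback's return values are discarded; the result is the collection.
const fromEach = rows.each((i, row) => row.trim());

// The trimmed strings are kept and then unwrapped into a plain array.
const fromMap = rows.map((i, row) => row.trim()).get();

console.log(fromEach === rows); // true
console.log(fromMap);           // [ 'Show A', 'Show B' ]
```

The same reasoning applies to the question's code: the titles were printed inside the callback, but the value assigned to cheerioData was the selection of rows itself.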

huangapple
  • Posted on February 7, 2023, 04:15:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75366136.html