Cheerio does not return items from the given path
Question
My main objective is to use Cheerio to scrape the titles from this IMDb ranking:
https://www.imdb.com/chart/tvmeter/?ref_=nv_tvv_mptv
However, even after following Cheerio's documentation and using the exact HTML path of the listed titles, I still get back random, confusing objects like these:
'x-attribsNamespace': [Object: null prototype] {},
'x-attribsPrefix': [Object: null prototype] {}
},
'80': <ref *81> Element {
parent: Element {
parent: [Element],
prev: [Text],
next: [Text],
startIndex: null,
endIndex: null,
children: [Array],
name: 'tbody',
attribs: [Object: null prototype],
type: 'tag',
namespace: 'http://www.w3.org/1999/xhtml',
'x-attribsNamespace': [Object: null prototype],
'x-attribsPrefix': [Object: null prototype]
},
prev: Text {
parent: [Element],
prev: [Element],
next: [Circular *81],
startIndex: null,
endIndex: null,
data: '\n\n ',
type: 'text'
},
next: Text {
parent: [Element],
prev: [Circular *81],
next: [Element],
startIndex: null,
endIndex: null,
data: '\n\n ',
type: 'text'
},
startIndex: null,
endIndex: null,
children: [
[Text], [Element],
[Text], [Element],
[Text], [Element],
[Text], [Element],
[Text], [Element],
[Text]
],
name: 'tr',
attribs: [Object: null prototype] {},
type: 'tag',
namespace: 'http://www.w3.org/1999/xhtml',
'x-attribsNamespace': [Object: null prototype] {},
'x-attribsPrefix': [Object: null prototype] {}
},
'81': <ref *82> Element {
parent: Element {
parent: [Element],
prev: [Text],
next: [Text],
startIndex: null,
endIndex: null,
children: [Array],
name: 'tbody',
attribs: [Object: null prototype],
type: 'tag',
namespace: 'http://www.w3.org/1999/xhtml',
'x-attribsNamespace': [Object: null prototype],
'x-attribsPrefix': [Object: null prototype]
},
prev: Text {
parent: [Element],
prev: [Element],
next: [Circular *82],
startIndex: null,
endIndex: null,
data: '\n\n ',
type: 'text'
},
next: Text {
parent: [Element],
prev: [Circular *82],
next: [Element],
startIndex: null,
endIndex: null,
data: '\n\n ',
type: 'text'
},
startIndex: null,
endIndex: null,
children: [
[Text], [Element],
[Text], [Element],
[Text], [Element],
[Text], [Element],
[Text], [Element],
[Text]
],
name: 'tr',
attribs: [Object: null prototype] {},
type: 'tag',
namespace: 'http://www.w3.org/1999/xhtml',
'x-attribsNamespace': [Object: null prototype] {},
'x-attribsPrefix': [Object: null prototype] {}
},
'82': <ref *83> Element {
parent: Element {
parent: [Element],
prev: [Text],
next: [Text],
startIndex: null,
endIndex: null,
children: [Array],
name: 'tbody',
attribs: [Object: null prototype],
type: 'tag',
namespace: 'http://www.w3.org/1999/xhtml',
'x-attribsNamespace': [Object: null prototype],
'x-attribsPrefix': [Object: null prototype]
},
prev: Text {
parent: [Element],
prev: [Element],
next: [Circular *83],
startIndex: null,
endIndex: null,
data: '\n\n ',
type: 'text'
},
next: Text {
parent: [Element],
prev: [Circular *83],
next: [Element],
startIndex: null,
endIndex: null,
data: '\n\n ',
type: 'text'
},
startIndex: null,
endIndex: null,
children: [
[Text], [Element],
[Text], [Element],
[Text], [Element],
[Text], [Element],
[Text], [Element],
[Text]
],
name: 'tr',
attribs: [Object: null prototype] {},
type: 'tag',
namespace: 'http://www.w3.org/1999/xhtml',
'x-attribsNamespace': [Object: null prototype] {},
'x-attribsPrefix': [Object: null prototype] {}
},
Code:
import * as cheerio from 'cheerio';
import axios from 'axios';
import fs from 'fs';

axios("https://www.imdb.com/chart/tvmeter/?ref_=nv_tvv_mptv").then(res => {
  const data = res.data;
  const $ = cheerio.load(data);
  var cheerioData = $('.lister-list>tr').each((i, e) => {
    const title = $(e).find('.titleColumn a').text();
    console.log(title);
  })
  console.log(cheerioData);
})
I really don't understand what I'm doing wrong, since the path is exactly right. Can anybody help me?
Answer 1
Score: 1
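You can convert the Cheerio selection of rows into an array of title strings with map. Either spread the selection into a plain array and use Array#map, or call Cheerio's own .map and unwrap the result with .get() or .toArray().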
For example, with a spread and vanilla JS Array#map:
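import axios from "axios";
import * as cheerio from "cheerio"; // Cheerio 1.x has no default export

const url = "<Your URL>";

axios(url).then(res => {
  const $ = cheerio.load(res.data);
  // Spread the selection into a plain array of elements, then map each row to its title text
  const text = [...$(".lister-list > tr")].map(e =>
    $(e).find(".titleColumn a").text().trim()
  );
  console.log(text);
}).catch(err => console.error(err));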
Also possible: use .get() or .toArray() after a Cheerio .map (whose callback receives the index as the first argument):
// Cheerio's .map passes (index, element); .get() unwraps the results into a plain array
const text = $(".lister-list > tr").map((i, e) =>
  $(e).find(".titleColumn a").text().trim()
).get();
If you want to use .each, you can .push() each text string onto a vanilla array, but this isn't as clean as .map, which exists to abstract away this pattern:
const text = [];
$(".lister-list > tr").each((i, e) => {
  text.push($(e).find(".titleColumn a").text().trim());
});