Data repetition in web scraping using XPath

Question

To get each value only once, change the XPath expression rather than the .map call: add the [1] predicate so that only the first text node of each div is selected. Here's the modified code:

$x('//div[@class="cs"]/div/text()[1]').map(x => x.wholeText)

This should give you an array with only one occurrence of each item:

['CS 35 (1.4)', 'CS 269 (7.3)', 'CS 137 (8.5)', 'CS 241 (7.5)', 'CS 226 (9.2)', ...]

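The [1] predicate keeps only the first text node inside each matched div. Each value was most likely repeated because the div holds several adjacent text nodes, and wholeText returns the combined text of all adjacent text nodes, so every one of them produced the same string.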
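
If changing the XPath expression is not an option, the repeats can also be removed after the fact. The snippet below is only a sketch: collapseRuns is a hypothetical helper name, and the sample array is a shortened copy of the output shown in the question. It drops a value only when it equals the value right before it, so it assumes the repeats are always adjacent.

```javascript
// Collapse runs of identical adjacent values into a single value.
// collapseRuns is a hypothetical helper name, not part of any library.
function collapseRuns(values) {
  return values.filter(
    (value, index) => index === 0 || value !== values[index - 1]
  );
}

// Sample shaped like the console output in the question (shortened).
const scraped = [
  'CS 35 (1.4)', 'CS 35 (1.4)', 'CS 35 (1.4)',
  'CS 269 (7.3)', 'CS 269 (7.3)',
  'CS 137 (8.5)',
];

console.log(collapseRuns(scraped));
// → [ 'CS 35 (1.4)', 'CS 269 (7.3)', 'CS 137 (8.5)' ]
```

Comparing neighbours instead of building a Set keeps a value that legitimately shows up again later in the list. The trade-off is that two consecutive entries with exactly the same text would also be merged into one.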

English:

I have a problem when scraping data. When I look up a specific piece of information
in the Google Chrome browser console, each value is repeated seven times before the next one appears. Here is my code:

$x('//div[@class="cs"]/div/text()').map(x=>x.wholeText)

This code gives me the following:

['CS 35 (1.4)', 'CS 35 (1.4)', 'CS 35 (1.4)', 'CS 35 (1.4)', 
'CS 35 (1.4)', 'CS 35 (1.4)', 'CS 35 (1.4)', 'CS 269 (7.3)', 
'CS 269 (7.3)', 'CS 269 (7.3)', 'CS 269 (7.3)', 'CS 269 (7.3)', 
'CS 269 (7.3)', 'CS 269 (7.3)', 'CS 137 (8.5)', 'CS 137 (8.5)'
 ....................
'CS 241 (7.5)', 'CS 241 (7.5)', 'CS 241 (7.5)', 'CS 226 (9.2)', 
'CS 226 (9.2)', …]

I just want each value once:
CS 35 (1.4), then CS 269 (7.3), and so on. I don't want them repeated so many times.

This is the web page I'm scraping: https://www.op.gg/summoners/kr/Hide%20on%20bush

I'm looking for code that solves the problem described above.

Answer 1

Score: 1

English:

Try changing your XPath expression to:

$x('//div[@class="stats"]//div[@class="cs"]').map(x=>x.innerText)

Output should be:

[ "CS 35 (1.4)", "CS 269 (7.3)", "CS 137 (8.5)", "CS 226 (6.8)", 
"CS 224 (7.7)", "CS 262 (8.7)", "CS 218 (8.8)", "CS 160 (5.6)", 
"CS 252 (9.9)", "CS 239 (7)", … ]
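
Note that $x is a console utility in Chrome DevTools, so it is not available to a regular page script. A rough equivalent can be built on the standard document.evaluate API. This is a minimal sketch: xpathAll is a hypothetical helper name, and it takes the document as a parameter so the function does not depend on a global.

```javascript
// Return all nodes matching an XPath expression as a plain array.
// xpathAll is a hypothetical helper name; doc is expected to be a DOM Document.
function xpathAll(doc, expression) {
  // 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
  const ORDERED_NODE_SNAPSHOT_TYPE = 7;
  const result = doc.evaluate(
    expression, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null
  );
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

// Usage in a browser page (not runnable outside a browser):
// xpathAll(document, '//div[@class="stats"]//div[@class="cs"]')
//   .map(x => x.innerText);
```

A snapshot result is used here because it stays valid even if the page changes while the loop is running, which an iterator result does not guarantee.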

huangapple
  • Published on March 9, 2023, 20:37:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75684729.html