Scrapy selector doesn't "see" an element that is present on the webpage

Question

I want to parse the following webpage:
https://mafiaworldtour.com/tournaments/2653

I need to find the following element:

//html/body/div[1]/div/section[2]/div/div/div/div[1]/div[1]/div/div[2]/div/div[1]/div[2]/span/text()

When I look for it on the page via Inspect in the browser, it is clearly present, but
city = response.xpath('//html/body/div[1]/div/section[2]/div/div/div/div[1]/div[1]/div/div[2]/div/div[1]/div[2]/span/text()').extract_first()
returns None.

What is the reason for this?

I expect to get the tournament's city, Хайфа, Израиль, via XPath.

Answer 1

Score: 0

Using my own project, retrieveCssOrXpathSelectorFromTextOrNode, to fetch the full XPath query:

x('Хайфа, Израиль');
//body/div[@class="site-wrapper"]/div[@class="main"][@role="main"]/section[@class="page-content"]/div[@class="container"]/div[@class="tabs"]/div[@class="tab-content"]/div[@class="tab-pane fade in active "][@id="general"]/div[@class="row"]/div[@class="col-md-12"]/div[@class="table-responsive"]/div[@class="responsive-info-table"]/div[@class="row with-top-border"]/div[@class="col-md-6"]/span[@class="small_content"]

Attribute-based XPath queries like this one are generally more robust than the positional (index-based) path auto-generated by Chrome DevTools:

//html/body/div[1]/div/section[2]/div/div......

You can drop the redundant leading steps, though. From the Chrome DevTools or Firefox console:

$x('//span[@class="small_content"]')[0].innerText

or in your case:

response.xpath('//span[@class="small_content"]/text()').extract_first()

Output:

" Хайфа, Израиль"
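
One possible explanation for the original None, which nothing in this thread confirms, is that the browser's rendered DOM differs from the raw HTML that Scrapy downloads (for example, a script injects an element), so every positional index after that point shifts. The sketch below illustrates the effect with Python's standard-library ElementTree as a stand-in for Scrapy's selectors; the markup, the injected banner, and the helper name `city` are all hypothetical and much simpler than the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far more deeply nested.
RAW_HTML = """
<html><body>
  <div class="site-wrapper">
    <div class="row"><span class="small_content"> Хайфа, Израиль</span></div>
  </div>
</body></html>
"""

# The same content as a browser might show it after a script injects a banner
# ahead of the wrapper (an assumption made purely for illustration).
RENDERED_DOM = """
<html><body>
  <div class="banner">injected by JavaScript</div>
  <div class="site-wrapper">
    <div class="row"><span class="small_content"> Хайфа, Израиль</span></div>
  </div>
</body></html>
"""

POSITIONAL = "./body/div[2]/div/span"         # index-based, copied from the rendered DOM
BY_CLASS = ".//span[@class='small_content']"  # anchored on an attribute instead


def city(markup, path):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = ET.fromstring(markup.strip()).find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


if __name__ == "__main__":
    print(city(RENDERED_DOM, POSITIONAL))  # matches in the rendered DOM
    print(city(RAW_HTML, POSITIONAL))      # no second div in the raw markup
    print(city(RAW_HTML, BY_CLASS))        # attribute match does not depend on indices
```

In Scrapy itself, a quick way to check this kind of mismatch is to search the downloaded response body for the expected text before debugging the selector. The `.strip()` call also removes the leading space visible in the output above.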

Answer 2

Score: 0

You can use either a CSS selector or XPath.

CSS selector

response.css('.small_content::text').get()

XPath

response.xpath('//span[@class="small_content"]/text()').get()

  • Posted by huangapple on February 19, 2023, 18:49:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75499579.html