Get specific text from 2 paragraphs in the same class using Scrapy
Question
I'm very new to Scrapy, and I want to use the Scrapy shell to extract the text of two paragraphs: "Fintech, Cybersecurity" and "Serie C".
If I run
response.css('div.card-body p.card-text strong::text').get()
I get 'Secteur', but I'm looking for 'Fintech, Cybersecurity'.
For
response.css('div.card-body p.card-text::text').get()
I get '\n'.
I've noticed if I use
response.css('div.card-body p.card-text:nth-child(3)').get()
I get
<p class="card-text">
<strong>Round</strong> : Série C
</p>
and for
response.css('div.card-body p.card-text:nth-child(2)').get()
I get
<p class="card-text">
<strong>Secteur</strong> : Fintech, Cybersecurity
</p>
How do I get "Serie C" and "Fintech, Cybersecurity"?
Thank you
Answer 1
Score: 0
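The selector 'div.card-body p.card-text::text' should work; you just need to use either the getall() or the extract() method instead of get(). get() returns only the first matching text node, which here is the newline just before the first <strong> tag.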
Here is an example I ran in IPython, using parsel, the selector library behind Scrapy's response.css():
import parsel

html = '''<div class="card-body">
<h3 class="card-title mb-1">L</h3>
<p class="card-text">
<strong>Secteur</strong>
 : Fintech, Cybersecurity
</p>
<p class="card-text">
<strong>Round</strong>
 : Serie C
</p>
<p class="card-text">
<small class="text-muted"> 2820 votes enregistres </small>
</p>
</div>'''

response = parsel.Selector(text=html)

# getall() returns every text node directly inside each p.card-text,
# including whitespace-only ones, so strip them and skip the empty results.
for p in response.css('div.card-body p.card-text::text').getall():
    text = p.strip()
    if text:
        print(text)

# Output:
# : Fintech, Cybersecurity
# : Serie C