2023-02-18 10:25:30

Scrape a sub-attribute with bs4 in Python?

Question

I'm trying to scrape the IDs on a website, but I can't figure out how to specify the entry I want to work with. I've narrowed it down to a specific class, but I'm not sure how to target the number stored under 'id' inside 'data-preview'. Here's what I've narrowed the variable soup down to:

<li class="Li FnPreviewItem" data-preview='{ "type" : "animation", "id" : "288857982", "staticUrl" : "www.website.com/image.png", }'>
<div class="Li Inner FnImage">
<span class="Image" style="background-image:url(www.website.com/image.png);"></span>
</div>
<div class="ImgPreview FnPreviewImage MdNonDisp">
<span class="Image FnPreview" style="background-image:url(www.website.com/image.png);">
</span></div>
</li>

Here is the relevant snippet of what I have so far:

from pathlib import Path
from bs4 import BeautifulSoup
import requests
import re

url = "www.website.com/image.png"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

elsoupo = soup.find(attrs={"class": "a fancy title for this class"})
print(elsoupo)

I just started working with Python, so hopefully I'm wording this in a way that makes sense.

I tried to narrow it down with a second attribute that could hold any number, but I just get None back.

elsoupoNum = elsoupo.find(attrs={"id":r'^[-+]?[0-9]+$'})

print(elsoupoNum)

Answer 1

Score: 0

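`data-preview` is an attribute of the `li` element whose value is an (ill-formed) JSON string: it ends with a trailing comma, which `json.loads` rejects. I removed that comma for simplicity; you may want to check [this question on trailing commas][1].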
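As a rough illustration of why the attempt in the question returns None, the sketch below uses a shortened, made-up version of the markup. It assumes the only place the number appears is inside the `data-preview` value, as in the posted snippet. Note also that a plain string passed in `attrs` is compared literally, so a regular expression has to be compiled with `re.compile` to be treated as a pattern.

```python
import re

from bs4 import BeautifulSoup

# Shortened, hypothetical version of the markup from the question.
html = '''<li class="Li FnPreviewItem" data-preview='{ "id" : "288857982" }'>
<span class="Image"></span>
</li>'''

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", class_="FnPreviewItem")

# No tag inside the <li> has an HTML attribute named "id", so even a
# properly compiled pattern finds nothing and find() returns None.
by_id = li.find(attrs={"id": re.compile(r"^[-+]?[0-9]+$")})
print(by_id)

# The number is only text inside the data-preview attribute value,
# which BeautifulSoup hands back as an ordinary string.
raw = li["data-preview"]
print(type(raw).__name__, raw)
```

So the search has to target the `data-preview` attribute itself, and the string it holds then needs to be decoded separately.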

code

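from bs4 import BeautifulSoup
import json

# Sample markup from the question, with the trailing comma inside
# data-preview removed so that the value is valid JSON.
# Named html rather than str to avoid shadowing the built-in str type.
html = '''<li class="Li FnPreviewItem" data-preview='{ "type" : "animation", "id" : "288857982", "staticUrl" : "www.website.com/image.png"  }'>
<div class="Li Inner FnImage">
<span class="Image" style="background-image:url(www.website.com/image.png);"></span>
</div>
<div class="ImgPreview FnPreviewImage MdNonDisp">
<span class="Image FnPreview" style="background-image:url(www.website.com/image.png);"></span></div>
</li>'''

soup = BeautifulSoup(html, 'html.parser')
# Select the first <li> that carries a data-preview attribute.
li = soup.select_one('li[data-preview]')
# The attribute value is a JSON string.
data = li.attrs['data-preview']
print(data)
# Decode the JSON and read the "id" key.
j = json.loads(data)
print(j['id'])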

output

{ "type" : "animation", "id" : "288857982", "staticUrl" : "www.website.com/image.png"  }
288857982
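The code above runs on a hand-corrected string, whereas the markup posted in the question still has the trailing comma. One possible way to cope with that, sketched here under stated assumptions, is to strip a comma that directly precedes a closing brace or bracket before decoding. The helper name `loads_lenient` is made up for this example, and the regex approach assumes that no string value inside the JSON contains `,}` or `,]`. The `li.FnPreviewItem[data-preview]` selector is likewise a guess at how the real page marks its items.

```python
import json
import re

from bs4 import BeautifulSoup

# The markup as posted in the question, trailing comma included.
html = '''<li class="Li FnPreviewItem" data-preview='{ "type" : "animation", "id" : "288857982", "staticUrl" : "www.website.com/image.png", }'>
</li>'''


def loads_lenient(text):
    """Parse JSON after dropping a comma that directly precedes } or ].

    Assumption: no string value in the JSON contains ",}" or ",]",
    because a plain regex cannot tell that apart from a real trailing comma.
    """
    return json.loads(re.sub(r",\s*([}\]])", r"\1", text))


soup = BeautifulSoup(html, "html.parser")

# Collect the id from every matching <li>, not just the first one.
ids = [
    loads_lenient(li["data-preview"])["id"]
    for li in soup.select("li.FnPreviewItem[data-preview]")
]
print(ids)
```

Using `select` rather than `select_one` means the same loop covers a page with many preview items. If the attribute values can contain arbitrary text, a tolerant JSON parser would be a safer choice than the regex.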


  [1]: https://stackoverflow.com/questions/23705304/can-json-loads-ignore-trailing-commas
