Find the attribute value of a tag using Beautiful Soup

Question

You can extract the slug value from the data-gs-ta-val attribute using Beautiful Soup in Python like this:

from ast import literal_eval

from bs4 import BeautifulSoup

html = '''
    Your HTML content here
'''

soup = BeautifulSoup(html, 'html.parser')
elements = soup.find_all('li', class_='gs_ta_choice')

slug_values = []
for element in elements:
    # The attribute value is a Python-style dict literal; parse it without eval
    data_gs_ta_val = literal_eval(element['data-gs-ta-val'])
    slug = data_gs_ta_val.get('slug', '')
    slug_values.append(slug)

print(slug_values)

This code will extract the slug values from the data-gs-ta-val attribute of each li element and store them in the slug_values list.

Original question:
<li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Bangalore','value':'Bangalore','CID':'105','id':'105','P':'1','slug':'bangaluru'}" style="line-height: initial;"> Bangalore</li>
<li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Chennai','value':'Chennai','CID':'106','id':'106','P':'2','slug':'madras'}" style="line-height: initial;"> Chennai</li>
<li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Mumbai','value':'Mumbai','CID':'108','id':'108','P':'3','slug':'bombay'}" style="line-height: initial;"> Mumbai</li>

I want to get the slug value from the data-gs-ta-val attribute of each of these elements using Beautiful Soup in Python.

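Answer 1

Score: 2

You don't say how you're getting the HTML fragment, so I'll assume you have it as a string.

data-gs-ta-val is interesting because its value looks like the string representation of a Python dictionary.

Therefore: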

from bs4 import BeautifulSoup as BS
from ast import literal_eval

html = """
<!DOCTYPE html>
  <html>
    <body>
      <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Bangalore','value':'Bangalore','CID':'105','id':'105','P':'1','slug':'bangaluru'}" style="line-height: initial;"> Bangalore</li>
      <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Chennai','value':'Chennai','CID':'106','id':'106','P':'2','slug':'madras'}" style="line-height: initial;"> Chennai</li>
      <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Mumbai','value':'Mumbai','CID':'108','id':'108','P':'3','slug':'bombay'}" style="line-height: initial;"> Mumbai</li>
    </body>
  </html>
"""

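soup = BS(html, 'lxml')  # the 'lxml' parser requires the lxml package to be installed

for li in soup.find_all('li', class_='gs_ta_choice'):
    # literal_eval safely parses the dict literal stored in the attribute
    d = literal_eval(li['data-gs-ta-val'])
    print(d.get('slug', 'No slug here'))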

Output:

bangaluru
madras
bombay
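
The parsing step above can be wrapped in a small helper that tolerates malformed attribute values. This is only a minimal sketch using the standard library; the function name `slug_from_attr` and its `default` parameter are hypothetical and not part of the answer.

```python
from ast import literal_eval


def slug_from_attr(raw, default=None):
    """Return the 'slug' entry from a dict-literal string, or `default`.

    `raw` is assumed to be the text of a data-gs-ta-val attribute.
    """
    try:
        parsed = literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a valid Python literal (for example an empty string or arbitrary code)
        return default
    if not isinstance(parsed, dict):
        # A valid literal, but not a dictionary (for example a list)
        return default
    return parsed.get('slug', default)


print(slug_from_attr("{'text':'Chennai','value':'Chennai','slug':'madras'}"))
print(slug_from_attr("not a dict"))
```

Inside the loop, such a helper could be called as `slug_from_attr(li['data-gs-ta-val'])`, so that one bad element does not stop the whole extraction.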

Answer 2

Score: 1

from bs4 import BeautifulSoup
import json

html_doc = """
<li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Bangalore','value':'Bangalore','CID':'105','id':'105','P':'1','slug':'bangaluru'}" style="line-height: initial;"> Bangalore</li>
<li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Chennai','value':'Chennai','CID':'106','id':'106','P':'2','slug':'madras'}" style="line-height: initial;"> Chennai</li>
<li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Mumbai','value':'Mumbai','CID':'108','id':'108','P':'3','slug':'bombay'}" style="line-height: initial;"> Mumbai</li>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

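for li in soup.find_all('li'):
    # Swap single quotes for double quotes so the value parses as JSON
    # (this assumes no value itself contains a quote character)
    data = li.attrs['data-gs-ta-val'].replace("'", '"')
    data = json.loads(data)
    print(data['slug'])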

This gives the output you want.
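
One caveat with the quote-replacement approach: it assumes that no value contains an apostrophe or a double quote. The sketch below uses a made-up attribute value (the place name is hypothetical, not from the question) to show a case where `json.loads` fails after the replacement while `ast.literal_eval` still parses the original string.

```python
import json
from ast import literal_eval

# Hypothetical attribute value whose text contains an apostrophe
raw = '''{'text':"St. John's",'slug':'st-johns'}'''

try:
    json.loads(raw.replace("'", '"'))
    json_ok = True
except json.JSONDecodeError:
    # The apostrophe became a stray double quote, so the JSON is invalid
    json_ok = False

# literal_eval reads the original Python-style literal directly
slug = literal_eval(raw)['slug']
print(json_ok, slug)
```

For the data shown in the question both approaches behave the same; the difference only matters if the attribute values can contain quote characters.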


huangapple
  • Posted on July 4, 2023 at 23:48:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76614213.html