Find the attribute value of a tag using Beautiful Soup

Question

You can extract the slug value from the data-gs-ta-val attribute with Beautiful Soup in Python like this:

    from ast import literal_eval

    from bs4 import BeautifulSoup

    html = '''
    Your HTML content here
    '''

    soup = BeautifulSoup(html, 'html.parser')
    elements = soup.find_all('li', class_='gs_ta_choice')

    slug_values = []
    for element in elements:
        # The attribute holds the text of a Python dict literal; parse it
        # with literal_eval rather than eval, which would run arbitrary code.
        data_gs_ta_val = literal_eval(element['data-gs-ta-val'])
        slug_values.append(data_gs_ta_val.get('slug', ''))

    print(slug_values)

This code extracts the slug from the data-gs-ta-val attribute of each li element and collects the values in the slug_values list.
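Since the attribute value is turned into a dictionary by evaluating text scraped from a page, it may help to see how ast.literal_eval behaves on input that is not a plain literal. The following is only a sketch; safe_parse is a hypothetical helper name, not part of Beautiful Soup or of the code above.

```python
from ast import literal_eval


def safe_parse(raw):
    """Return the parsed literal, or None if raw is not a plain Python literal.

    safe_parse is a hypothetical helper used only for this illustration.
    """
    try:
        return literal_eval(raw)
    except (ValueError, SyntaxError):
        # literal_eval refuses function calls, names, and malformed text
        return None


# A value shaped like the ones in the question parses into a dict.
print(safe_parse("{'text':'Bangalore','slug':'bangaluru'}")["slug"])

# An expression that eval would execute is rejected instead of being run.
print(safe_parse("__import__('os').getcwd()"))
```

The first call prints the slug and the second prints None, because a function call is not a literal and is never executed.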

Given this HTML:

    <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Bangalore','value':'Bangalore','CID':'105','id':'105','P':'1','slug':'bangaluru'}" style="line-height: initial;"> Bangalore</li>
    <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Chennai','value':'Chennai','CID':'106','id':'106','P':'2','slug':'madras'}" style="line-height: initial;"> Chennai</li>
    <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Mumbai','value':'Mumbai','CID':'108','id':'108','P':'3','slug':'bombay'}" style="line-height: initial;"> Mumbai</li>

I want the slug value from the data-gs-ta-val attribute of each and every element, using Beautiful Soup in Python.

Answer 1

Score: 2


You don't say how you're getting the HTML fragment so I'll assume you have it as a string.

data-gs-ta-val is interesting because it looks like the associated datum is a string representation of a Python dictionary.

Therefore:

    from bs4 import BeautifulSoup as BS
    from ast import literal_eval

    html = """
    <!DOCTYPE html>
    <html>
    <body>
    <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Bangalore','value':'Bangalore','CID':'105','id':'105','P':'1','slug':'bangaluru'}" style="line-height: initial;"> Bangalore</li>
    <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Chennai','value':'Chennai','CID':'106','id':'106','P':'2','slug':'madras'}" style="line-height: initial;"> Chennai</li>
    <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Mumbai','value':'Mumbai','CID':'108','id':'108','P':'3','slug':'bombay'}" style="line-height: initial;"> Mumbai</li>
    </body>
    </html>
    """

    soup = BS(html, 'lxml')

    for li in soup.find_all('li', class_='gs_ta_choice'):
        d = literal_eval(li['data-gs-ta-val'])
        print(d.get('slug', 'No slug here'))

Output:

    bangaluru
    madras
    bombay
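On a real page, some li elements may lack the attribute or carry a value that is not a dictionary literal. A minimal sketch of a more defensive variant follows, assuming the same markup as in the question. extract_slugs is a hypothetical helper name, the shortened sample HTML is made up for illustration, and 'html.parser' is used here only so the sketch does not depend on lxml being installed.

```python
from ast import literal_eval

from bs4 import BeautifulSoup


def extract_slugs(html, default=None):
    """Collect the 'slug' entry of every li.gs_ta_choice that has a parsable value.

    extract_slugs is a hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    slugs = []
    # The CSS selector matches only elements that actually carry the attribute.
    for li in soup.select("li.gs_ta_choice[data-gs-ta-val]"):
        try:
            data = literal_eval(li["data-gs-ta-val"])
        except (ValueError, SyntaxError):
            continue  # skip values that are not Python literals
        if isinstance(data, dict):
            slugs.append(data.get("slug", default))
    return slugs


# Shortened, made-up sample in the shape of the question's HTML.
sample = """
<li class="gs_ta_choice" data-gs-ta-val="{'text':'Bangalore','slug':'bangaluru'}">Bangalore</li>
<li class="gs_ta_choice" data-gs-ta-val="{'text':'Chennai','slug':'madras'}">Chennai</li>
<li class="gs_ta_choice">No attribute here</li>
<li class="gs_ta_choice" data-gs-ta-val="not a dict">Broken</li>
"""

print(extract_slugs(sample))  # ['bangaluru', 'madras']
```

Skipping bad elements silently is a design choice that suits a quick scrape; logging or raising may be preferable when missing data should be noticed.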

Answer 2

Score: 1

    from bs4 import BeautifulSoup
    import json

    html_doc = """
    <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Bangalore','value':'Bangalore','CID':'105','id':'105','P':'1','slug':'bangaluru'}" style="line-height: initial;"> Bangalore</li>
    <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Chennai','value':'Chennai','CID':'106','id':'106','P':'2','slug':'madras'}" style="line-height: initial;"> Chennai</li>
    <li class="gs_ta_choice" data-value="Bangalore" data-gs-ta-val="{'text':'Mumbai','value':'Mumbai','CID':'108','id':'108','P':'3','slug':'bombay'}" style="line-height: initial;"> Mumbai</li>
    """

    soup = BeautifulSoup(html_doc, 'html.parser')

    for li in soup.find_all('li'):
        # Swap single quotes for double quotes so the value parses as JSON
        data = li.attrs['data-gs-ta-val'].replace("'", '"')
        data = json.loads(data)
        #print(data)
        print(data['slug'])

This gives what you want.
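One caveat with swapping quotes before calling json.loads: it works for the sample data, but a value that itself contains an apostrophe no longer forms valid JSON after the swap. The sketch below illustrates this with a made-up value ("St. John's" does not appear in the question) and a hypothetical helper, parse_attr, that tries ast.literal_eval first and falls back to JSON.

```python
import json
from ast import literal_eval


def parse_attr(raw):
    """Parse a dict-like attribute value (hypothetical helper for illustration).

    Tries a Python literal first, then falls back to JSON for values
    that use JSON-only tokens such as true or null.
    """
    try:
        return literal_eval(raw)
    except (ValueError, SyntaxError):
        return json.loads(raw)


# Made-up value containing an apostrophe inside a string.
tricky = "{'text': \"St. John's\", 'slug': 'st-johns'}"

try:
    json.loads(tricky.replace("'", '"'))
except json.JSONDecodeError:
    print("quote swapping produced invalid JSON")

print(parse_attr(tricky)["slug"])  # st-johns
print(parse_attr('{"slug": "madras", "active": true}'))  # {'slug': 'madras', 'active': True}
```

If the attribute is known to always hold JSON, calling json.loads directly stays the simpler choice.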

huangapple
  • Posted on July 4, 2023 at 23:48:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76614213.html