Extracting a piece of text from HTML
Question
I need to extract snippet one with a single CSS selector, without capturing snippet two, which contains a disabled element.
Snippet one:
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
Snippet two:
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
I am drawing a blank. I have tried the following:
html.css("button > div:first-child:not(:has(button > div + div.disabled))")
html.css("button > div:not(:has(div.disabled))")
html.css("button > div:not(.disabled)")
html.css("button > div[data-v-^]")
Here is a reproducible example:
from selectolax.parser import HTMLParser
html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""
html = HTMLParser(html)
Any help would be most appreciated. FYI, I used selectolax for parsing; however, for this example I do not mind a bs4-flavoured answer.
Answer 1
Score: 2
Try:
from bs4 import BeautifulSoup
html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""
soup = BeautifulSoup(html, 'html.parser')
button = soup.select('button:not(:has(.disabled))')
print(button)
Prints:
[<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>]
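The selector above returns the whole button. If only the inner text is wanted, one option is to extend the same selector with a child combinator. This is a minimal sketch, assuming bs4's select() (backed by soupsieve) is used; the names enabled_divs and texts are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only buttons with no .disabled descendant, then step down to their div children.
enabled_divs = soup.select("button:not(:has(.disabled)) > div")

# Collect the visible text of each matching div.
texts = [div.get_text(strip=True) for div in enabled_divs]
print(texts)
```

Filtering at the button level means every div inside a button that contains a disabled element is skipped, not just the disabled div itself.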
Answer 2
Score: 1
You can use a negative check on the data. Note that this uses BeautifulSoup.
from bs4 import BeautifulSoup
html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""
enabled_buttons =
print(enabled_buttons)
Output:
[<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>]
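The right-hand side of the enabled_buttons assignment is missing from this copy of the answer, so the exact expression is unknown. One way such a negative check could be written is sketched below. The list comprehension and the extra empty button in the sample are assumptions for illustration, not the answerer's code.

```python
from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, "html.parser")

# Negative check (assumed form): keep a button only if it has no
# descendant carrying the "disabled" class.
enabled_buttons = [
    button
    for button in soup.find_all("button")
    if button.find(class_="disabled") is None
]
print(enabled_buttons)
```

With this form of the check, a button with no children at all also counts as enabled, because it contains nothing marked disabled.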
Answer 3
Score: 0
For posterity, I managed to solve it using:
button > div:not(.disabled):not(:has(+ .disabled))
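For anyone who wants to check this selector against the sample markup, here is a small sketch. It assumes bs4 with soupsieve, which accepts a leading sibling combinator inside :has(), instead of selectolax; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, "html.parser")

# :not(.disabled) drops the disabled div itself;
# :not(:has(+ .disabled)) drops any div immediately followed by a disabled sibling.
selector = "button > div:not(.disabled):not(:has(+ .disabled))"
matches = soup.select(selector)
match_texts = [div.get_text(strip=True) for div in matches]
print(match_texts)
```

Unlike filtering at the button level, this selector only looks at the immediately following sibling, so it relies on the disabled div coming directly after the div to be excluded.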