Extracting a piece of text from HTML

Question

I need to be able to extract snippet one, without capturing snippet two, which is disabled, using a single CSS selector.

Snippet one:
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>

Snippet two:
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>

I am drawing a blank. I have tried the following:

html.css("button > div:first-child:not(:has(button > div + div.disabled))")
html.css("button > div:not(:has(div.disabled))")
html.css("button > div:not(.disabled)")
html.css("button > div[data-v-^]")

Here is a reproducible example:

from selectolax.parser import HTMLParser

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

# Parse the markup; the parsed tree replaces the raw string in `html`.
html = HTMLParser(html)

Any help would be most appreciated. FYI, I used selectolax for parsing, but for this example I do not mind a bs4-flavoured answer.

Answer 1

Score: 2

Try:

from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, 'html.parser')

# Select every button that has no descendant with the "disabled" class.
button = soup.select('button:not(:has(.disabled))')
print(button)

Prints:

[<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>]
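
If the goal is the inner text rather than the whole button, the same idea can probably be extended with a child combinator. This is only a minimal sketch, assuming a bs4 version whose `select` is backed by Soup Sieve (4.7 or later); the names `enabled_divs` and `labels` are illustrative and not from the answer above.

```python
from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only buttons with no .disabled descendant, then step into their child divs.
enabled_divs = soup.select("button:not(:has(.disabled)) > div")

# Collect the visible text of each matching div.
labels = [div.get_text(strip=True) for div in enabled_divs]
print(labels)
```

Filtering at the button level first means a single disabled child rules out every div in that button, which is what the question asks for.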

Answer 2

Score: 1

You can use a negative check on the data. I used BeautifulSoup here, though.

from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

enabled_buttons = 
print(enabled_buttons)

Output:

[<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>]
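
The expression assigned to `enabled_buttons` did not survive in the answer as posted, so the following is only a guess at what such a negative check might look like, not the answerer's code. It assumes a `soup` object built with `html.parser` and uses `find_all` and `find` from bs4. An empty button is added to the sample to show that a button with no children also passes the check.

```python
from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumed reconstruction of the "negative check": keep a button only if
# it has no descendant carrying the "disabled" class.
enabled_buttons = [
    button
    for button in soup.find_all("button")
    if button.find(class_="disabled") is None
]

# Show the text of each kept button (the empty button yields an empty string).
print([button.get_text(strip=True) for button in enabled_buttons])
```

Unlike a single CSS selector, this filters in Python, so it would also work with parsers or bs4 versions that lack `:has()` support.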

Answer 3

Score: 0

For posterity, I managed to solve it using:

button > div:not(.disabled):not(:has(+ .disabled))
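
To see what the extra `:not(:has(+ .disabled))` clause contributes, the selector can be compared with the simpler `button > div:not(.disabled)` attempt from the question. This sketch uses bs4 rather than selectolax, on the assumption that Soup Sieve's relative-selector support in `:has()` is available; the helper `texts` is hypothetical and exists only for this illustration.

```python
from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, "html.parser")


def texts(selector):
    """Return the stripped text of every element matched by the selector."""
    return [element.get_text(strip=True) for element in soup.select(selector)]


# Excluding only the .disabled div itself still leaves its sibling "Large".
naive = texts("button > div:not(.disabled)")

# Also excluding any div immediately followed by a .disabled sibling
# removes "Large" as well.
strict = texts("button > div:not(.disabled):not(:has(+ .disabled))")

print(naive)
print(strict)
```

The `+` combinator only looks at the immediately following sibling, so this selector relies on the disabled div coming directly after the div it disables, as it does in the sample markup.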

huangapple
  • Posted on July 14, 2023 at 04:00:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76682859.html