July 14, 2023, 04:00:27

Extracting a piece of text from HTML

Question

I need to be able to extract snippet one, without capturing snippet two, which is disabled, using a single CSS selector.

Snippet one:
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>

Snippet two:
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>

I am drawing a blank. I have tried the following:

html.css("button > div:first-child:not(:has(button > div + div.disabled))")

html.css("button > div:not(:has(div.disabled))")

html.css("button > div:not(.disabled)")

html.css("button > div[data-v-^]")

Here is a reproducible example:

from selectolax.parser import HTMLParser

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

html = HTMLParser(html)

Any help would be most appreciated. FYI, I used selectolax for parsing; however, for this example I do not mind an answer in the bs4 flavour.
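To see why the first three attempts fall short, it can help to run them and look at what each one matches. The sketch below uses Beautiful Soup (which the question allows) as a stand-in for selectolax, so it illustrates the CSS semantics rather than selectolax's exact behaviour. The names `attempts` and `results` are made up for this illustration. The fourth attempt, `div[data-v-^]`, is not valid selector syntax and is left out.

```python
from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, "html.parser")

# The first three selectors from the question, unchanged.
attempts = [
    "button > div:first-child:not(:has(button > div + div.disabled))",
    "button > div:not(:has(div.disabled))",
    "button > div:not(.disabled)",
]

# Map each selector to the text of the <div> elements it matches.
results = {}
for selector in attempts:
    results[selector] = [div.get_text() for div in soup.select(selector)]
    print(selector, "->", results[selector])
```

In each case `:has()` is evaluated relative to the `div` it is attached to, so it only looks at that `div`'s own descendants. None of the `div` elements contain other elements, so the `:has()` test never excludes anything and "Large" is still matched. The condition has to be placed on the `button`, or has to look at the `div`'s following sibling.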

Answer 1

Score: 2

Try:

from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, 'html.parser')

button = soup.select('button:not(:has(.disabled))')
print(button)

Prints:

[<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>]
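The selector in this answer returns the whole `button`, while the question asks for the inner `div`. One possible extension, offered as a sketch rather than as part of the original answer, is to append a child combinator so the match lands on the `div` and then read its text. The names `enabled_divs` and `enabled_texts` are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only buttons with no .disabled descendant, then step down to their <div> children.
enabled_divs = soup.select("button:not(:has(.disabled)) > div")

# Extract the visible text of each matching <div>.
enabled_texts = [div.get_text(strip=True) for div in enabled_divs]
print(enabled_texts)
```

Here `:has(.disabled)` is attached to the `button`, so it inspects the button's descendants, which is where the `.disabled` element actually lives.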

Answer 2

Score: 1

You can use a negative check on the data. Note that this uses BeautifulSoup rather than selectolax.

from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

enabled_buttons = 
print(enabled_buttons)

Output:

[<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>]
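The right-hand side of the `enabled_buttons` assignment is missing from the answer as posted, so the exact expression is unknown. One plausible reading of a "negative check", shown here purely as an assumption, is a list comprehension that keeps every `button` in which no element with class `disabled` can be found. The extra empty `button` in the sample and the name `labels` are added for illustration.

```python
from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumed reconstruction: keep a button only when it has no .disabled descendant.
enabled_buttons = [
    button
    for button in soup.find_all("button")
    if button.find(class_="disabled") is None
]

# Text of each kept button; the empty button yields an empty string.
labels = [button.get_text(strip=True) for button in enabled_buttons]
print(labels)
```

A check of this kind also keeps buttons that have no children at all, so an additional filter on empty text may be needed if such buttons can occur.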

Answer 3

Score: 0

For posterity, I managed to solve it using:

button > div:not(.disabled):not(:has(+ .disabled))
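This selector works on the `div` level: `:not(.disabled)` drops the disabled `div` itself, and `:not(:has(+ .disabled))` drops any `div` whose immediately following sibling is disabled. The sketch below checks it with Beautiful Soup, whose selector engine accepts a leading combinator inside `:has()`. The answer does not say which parser it was run against, so treat this as a check of the selector logic rather than a reproduction of the author's setup. The names `selector`, `matches`, and `texts` are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Small</div></button>
<button data-v-4e0029d1=""><div data-v-4e0029d1="">Large</div><div class="disabled" data-v-4e0029d1="">disabled</div></button>
"""

soup = BeautifulSoup(html, "html.parser")

# Exclude disabled divs, and divs directly followed by a disabled sibling.
selector = "button > div:not(.disabled):not(:has(+ .disabled))"
matches = soup.select(selector)

texts = [div.get_text() for div in matches]
print(texts)
```

Because `+` only inspects the next sibling element, a disabled `div` placed further along or before the label would not be caught. The general sibling combinator `~` covers later siblings, and putting the condition on the `button` covers both orders.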
