February 10, 2017 00:47:27 · go

GoLang - GoQuery HTML Insertion Fails

Question

I want to extract elementB and stop before elementC and elementD, i.e., not extract the text of elementC or elementD. However, the only approach I know extracts the text of the entire div: using Contents().Not I can exclude elementC, but elementD is still captured.

Here is the code I am currently using:

GoLang:

capturedText := s.Find("div").Contents().Not(".label").Text()

This ignores elementC but not elementD, which is a bare text node without any enclosing tag.

HTML:

<li><span><h2>elementA</h2></span><div>elementB<br><span class="label">elementC</span>elementD</div></li>

How do I capture only elementB from the <div>, excluding elementC and elementD?
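`Contents()` returns every child node of the div, text nodes included, and `Not(".label")` only drops the nodes that match the selector. It filters rather than stopping at a boundary, so elementD, which comes after the span, is kept. The sketch below models the div's children as a plain slice (the `node` type and both helpers are hypothetical, not goquery API) to contrast that filtering with taking only the nodes that precede the first `.label` element.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, simplified stand-in for a child node of the <div>.
type node struct {
	tag   string // "" for a bare text node, otherwise the element name
	class string // value of the class attribute, if any
	text  string // text content of the node
}

// children mirrors the child nodes of the <div> in the question's HTML.
var children = []node{
	{text: "elementB"},
	{tag: "br"},
	{tag: "span", class: "label", text: "elementC"},
	{text: "elementD"},
}

// filterOutLabel models Contents().Not(".label").Text(): it drops only the
// nodes with class "label" and concatenates the text of everything else.
func filterOutLabel(nodes []node) string {
	var b strings.Builder
	for _, n := range nodes {
		if n.class == "label" {
			continue // skip the span, but keep looking at later nodes
		}
		b.WriteString(n.text)
	}
	return b.String()
}

// textBeforeLabel stops at the first node with class "label", so anything
// after it (including the bare text node elementD) is never visited.
func textBeforeLabel(nodes []node) string {
	var b strings.Builder
	for _, n := range nodes {
		if n.class == "label" {
			break // boundary reached: ignore the rest of the div
		}
		b.WriteString(n.text)
	}
	return b.String()
}

func main() {
	fmt.Println(filterOutLabel(children))  // elementBelementD
	fmt.Println(textBeforeLabel(children)) // elementB
}
```

In goquery terms, the stop-at-the-boundary variant would iterate `s.Find("div").Contents()` with `EachWithBreak`, append `sel.Text()` for each node, and return `false` once `sel.Is(".label")` is true; treat that as an untested suggestion rather than a confirmed fix.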

Edit:

I have tried closing the div tag like so:

s.Find(".label").BeforeHtml(`</div>`)

and also tried:

s.Find(".label").BeforeHtml(`</div><div>`)

and then accessing the first div, disregarding the second div (which should now contain elementD), with:

jp, _ := s.Find("div").First().Html()

However, this does not work. It seems that a lone </div> cannot be inserted as an unmatched closing tag; the inserted HTML apparently has to be a complete <div>...</div> element. But that is NOT what I need: I need ONLY </div> or </div><div> so that the first div is closed at that point.

What is the appropriate way to fix this?

Answer 1

Score: 0

Since I can't edit the HTML by inserting a 'broken' (unmatched) node, I have opted for this workaround:

	s.Find(".label").BeforeHtml(`|_SEPARATOR_|`) // Insert a text marker just before the .label span
	preCleanNode := s.Find("div").Contents().Not(".label").Text() // Get the div's text, excluding the .label span
	cleanNode := strings.Split(preCleanNode, `|_SEPARATOR_|`) // Split the text at the marker (needs the "strings" import)
	outputString := cleanNode[0] // Keep only the text before the marker, i.e. elementB
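The split step itself is plain string handling. For the question's HTML, the `Text()` call above should return `elementB|_SEPARATOR_|elementD`, because the inserted marker becomes a text node between elementB and the excluded span. Assuming that string, `strings.Cut` (Go 1.18+) expresses the same "keep everything before the marker" step and also reports whether the marker was found, which `strings.Split` does not; `beforeSeparator` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// separator is the marker text inserted before the .label span.
const separator = "|_SEPARATOR_|"

// beforeSeparator returns the text preceding the first separator. If the
// separator is missing (e.g. no .label span was found, so nothing was
// inserted), it returns the whole input and found == false.
func beforeSeparator(text string) (string, bool) {
	before, _, found := strings.Cut(text, separator)
	return before, found
}

func main() {
	// Assumed output of s.Find("div").Contents().Not(".label").Text()
	// after the separator has been inserted before the .label span.
	captured := "elementB|_SEPARATOR_|elementD"

	out, found := beforeSeparator(captured)
	fmt.Println(out, found) // elementB true
}
```

One caveat of the marker approach: the chosen separator must never occur in the page's own text, otherwise the split happens too early.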
