Jsoup - How to detect strictly adjacent elements - check if element has been removed
Question
I need to detect strictly adjacent elements with jsoup. The approach described in https://stackoverflow.com/questions/21761478/how-to-detect-strictly-adjacent-siblings is what I am after, but I need a working example for jsoup in Java.
Input
<div id="container">
<span class="highlighted">Paragraph 1</span>
<span class="highlighted">Paragraph 2</span>
This is just loose text.
<p class="highlighted">Paragraph 3</p>
</div>
What I'm trying to accomplish is to build a single element that contains the text of all adjacent, similar sibling elements.
private String removeSimilarTags(String htmlContent) {
    org.jsoup.nodes.Document doc = Jsoup.parse(htmlContent);
    Elements highlightedSpanElements = doc.select("span.highlighted"); // select all spans with the highlight class
    for (Element span : highlightedSpanElements) {
        Element beforeEl = span.previousElementSibling(); // null when span is the first child element
        // I need another check here to verify whether the element has already been removed
        if (span != null) {
            beforeEl.after("<span class='" + HIGHLIGHT + "'>" + mergeAdjacentSpans(span) + "</span>");
        }
    }
    return doc.outerHtml();
}
private String mergeAdjacentSpans(Element span) {
    Element nextEl = span.nextElementSibling();
    String text = span.text();
    if (nextEl != null && nextEl.tagName().equalsIgnoreCase(SPAN_TAG)
            && nextEl.classNames().contains(HIGHLIGHT)) {
        // the next element is also a highlighted span
        text = text.concat(" " + mergeAdjacentSpans(nextEl));
    }
    span.remove();
    return text;
}
I would also like some insight into how to verify that an element has already been removed. I cannot find a clear answer online.
Thank you!
Answer 1
Score: 2
To detect whether elements are strictly adjacent, you need to know the difference between Node and Element in jsoup: https://stackoverflow.com/questions/47881838/difference-between-jsoup-element-and-jsoup-node#:~:text=A%20node%20is%20the%20generic,Node . In my case I used Node, because nextSibling() returns whatever comes next, whether that is plain text or an actual element, so it is not limited to tagged elements.
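// requires: import org.jsoup.nodes.Element; import org.jsoup.nodes.Node;
private boolean isNexSiblingAdjacent(Element span) {
    Node informationAfterNode = span.nextSibling();        // next node of any kind: text, comment or element
    Element nextTaggedElement = span.nextElementSibling(); // next element, skipping text and comments
    if (informationAfterNode == null || nextTaggedElement == null) {
        return false; // nothing follows the span, so there is nothing to be adjacent to
    }
    String after = informationAfterNode.outerHtml();
    return after.trim().isEmpty() || after.equalsIgnoreCase(nextTaggedElement.outerHtml());
}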
The first condition verifies that the next node contains only whitespace. You can also check whether it starts with <!-- and ends with -->, which means it is a comment; neither whitespace nor a comment breaks adjacency. The last condition checks whether the HTML of the next node is the same as the HTML of the next element, which means nothing lies between the two.
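As a rough sketch of the same idea, the whitespace and comment checks can be done on node types instead of on HTML strings. It also covers the second part of the question: an element detached with remove() no longer has a parent, so parent() == null can serve as the "already removed" test. The class name AdjacentSpanMerger and the helpers isIgnorable, nextAdjacentHighlight and mergeAdjacent are made up for this illustration. Unlike the question's code, the sketch appends the text to the first span instead of building a new one.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

class AdjacentSpanMerger {

    // Hypothetical helper: whitespace-only text and comments do not break adjacency.
    static boolean isIgnorable(Node node) {
        return node instanceof Comment
                || (node instanceof TextNode && ((TextNode) node).isBlank());
    }

    // Hypothetical helper: returns the next highlighted span when only ignorable
    // nodes lie between it and the given span, otherwise null.
    static Element nextAdjacentHighlight(Element span) {
        Node next = span.nextSibling();
        while (next != null && isIgnorable(next)) {
            next = next.nextSibling();
        }
        if (next instanceof Element && ((Element) next).is("span.highlighted")) {
            return (Element) next;
        }
        return null;
    }

    // Merges every run of strictly adjacent highlighted spans into the first span of the run.
    static String mergeAdjacent(String html) {
        Document doc = Jsoup.parse(html);
        for (Element span : doc.select("span.highlighted")) {
            if (span.parent() == null) {
                continue; // detached by an earlier iteration, i.e. already merged
            }
            Element next;
            while ((next = nextAdjacentHighlight(span)) != null) {
                span.appendText(" " + next.text());
                next.remove(); // detaches the node; its parent() is null afterwards
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div id=\"container\">"
                + "<span class=\"highlighted\">Paragraph 1</span>\n"
                + "<span class=\"highlighted\">Paragraph 2</span>\n"
                + "This is just loose text.\n"
                + "<p class=\"highlighted\">Paragraph 3</p>"
                + "</div>";
        System.out.println(mergeAdjacent(html));
    }
}
```

With the input from the question, the two spans end up in a single span, while the loose text and the p element are left alone. The whitespace and comment nodes that sat between merged spans stay in the document; remove them inside the loop as well if that matters for your output.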