Jsoup - How to detect strictly adjacent elements - check if element has been removed

Question

I need to detect strictly adjacent elements with Jsoup. I would like to follow the approach from https://stackoverflow.com/questions/21761478/how-to-detect-strictly-adjacent-siblings, but I need a working example for Jsoup in Java.

Input

<div id="container">
    <span class="highlighted">Paragraph 1</span>
    <span class="highlighted">Paragraph 2</span>
    This is just loose text.
    <p class="highlighted">Paragraph 3</p>
</div>

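What I'm trying to accomplish is to build a single element containing the text of all similar sibling elements that are strictly adjacent. For the input above, the two highlighted spans should become one span with the text "Paragraph 1 Paragraph 2".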

// HIGHLIGHT and SPAN_TAG are constants defined elsewhere (presumably "highlighted" and "span").
private String removeSimilarTags(String htmlContent) {
    org.jsoup.nodes.Document doc = Jsoup.parse(htmlContent);

    Elements highlightedSpanElements = doc.select("span.highlighted"); // Select all spans with the highlighted class
    for (Element span : highlightedSpanElements) {
        Element beforeEl = span.previousElementSibling();
        if (span != null) { // I need another check here to verify whether the element has already been removed
            beforeEl.after("<span class='" + HIGHLIGHT + "'>" + mergeAdjacentSpans(span) + "</span>");
        }
    }
    return doc.outerHtml();
}

private String mergeAdjacentSpans(Element span) {
    Element nextEl = span.nextElementSibling();

    String text = span.text();
    if (nextEl != null && nextEl.tagName().equalsIgnoreCase(SPAN_TAG)
            && nextEl.classNames().contains(HIGHLIGHT)) {
        // The next element is also a highlighted span, so merge it recursively
        text = text.concat(" " + mergeAdjacentSpans(nextEl));
    }
    span.remove();
    return text;
}

I would also like some insight into how to verify that an element has already been removed. I cannot find a clear answer online.

Thank you guys!
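
On the removal check: as far as I can tell, Jsoup's remove() detaches a node from its parent, so an element that has been removed reports parent() == null. The sketch below relies on that behaviour. The class name RemovedCheck and the helpers isRemoved and isInDocument are hypothetical names made up for this example; they are not part of Jsoup. The second helper uses ownerDocument(), which should also catch the case where an ancestor was removed rather than the element itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RemovedCheck {

    /** True when the element itself has been detached (it no longer has a parent). */
    static boolean isRemoved(Element el) {
        return el.parent() == null;
    }

    /**
     * True when the element is still reachable from a Document.
     * Assumption: an element that was created but never attached also returns false here.
     */
    static boolean isInDocument(Element el) {
        return el.ownerDocument() != null;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(
                "<div><span class=\"highlighted\">A</span><span class=\"highlighted\">B</span></div>");
        Element first = doc.select("span.highlighted").get(0);
        Element second = doc.select("span.highlighted").get(1);

        second.remove(); // detach the second span from the tree

        System.out.println("first removed?  " + isRemoved(first));
        System.out.println("second removed? " + isRemoved(second));
    }
}
```

In the loop from the question, a guard such as if (span.parent() == null) continue; would skip spans that an earlier merge has already removed, because the Elements list returned by select() still holds references to them.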

Answer 1

Score: 2

To detect whether elements are strictly adjacent, you need to know the difference between Node and Element in Jsoup (see https://stackoverflow.com/questions/47881838/difference-between-jsoup-element-and-jsoup-node#:~:text=A%20node%20is%20the%20generic,Node). In my case I used Node, because nextSibling() returns whatever comes next, whether that is loose text or an actual element, while nextElementSibling() skips text and returns only the next tagged element.

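// Returns true when nothing but whitespace separates the span from its next element sibling.
private boolean isNextSiblingAdjacent(Element span) {
  Node informationAfterNode = span.nextSibling();
  Element nextTaggedElement = span.nextElementSibling();
  if (informationAfterNode == null || nextTaggedElement == null) {
    return false; // the span is the last node, or no element follows it
  }
  String htmlAfter = informationAfterNode.outerHtml();
  return htmlAfter.trim().isEmpty()
      || htmlAfter.equalsIgnoreCase(nextTaggedElement.outerHtml());
}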

The first condition checks that the next node contains only whitespace. You can also check whether it starts with <!-- and ends with --> to treat a comment the same way, since neither whitespace nor a comment stops the elements from being adjacent. The last condition checks that the HTML of the next node matches the HTML of the next element, which means the next node is that element.
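
The conditions above can also be sketched with type checks instead of HTML string comparison. This is a minimal sketch, not a definitive implementation: it skips blank TextNode and Comment siblings and then tests whether the next node is a highlighted span. The names AdjacentSpanMerger, nextAdjacentSpan and mergeAdjacentSpans are hypothetical, and the span tag and highlighted class are taken from the question. Unlike the question's code, it appends the merged text to the first span of each run rather than creating a new span.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class AdjacentSpanMerger {

    /** Returns the next highlighted span if only whitespace or comments lie in between, else null. */
    static Element nextAdjacentSpan(Element span) {
        Node node = span.nextSibling();
        // Blank text and comments do not break adjacency, so step over them.
        while (node != null
                && ((node instanceof TextNode && ((TextNode) node).isBlank())
                    || node instanceof Comment)) {
            node = node.nextSibling();
        }
        if (node instanceof Element) {
            Element el = (Element) node;
            if (el.tagName().equals("span") && el.hasClass("highlighted")) {
                return el;
            }
        }
        return null; // loose text, a different element, or nothing follows
    }

    /** Merges each run of strictly adjacent highlighted spans into the first span of the run. */
    static String mergeAdjacentSpans(String html) {
        Document doc = Jsoup.parse(html);
        for (Element span : doc.select("span.highlighted")) {
            if (span.parent() == null) {
                continue; // already merged into an earlier span and removed
            }
            Element next;
            while ((next = nextAdjacentSpan(span)) != null) {
                span.appendText(" " + next.text());
                next.remove();
            }
        }
        return doc.body().html();
    }
}
```

Calling AdjacentSpanMerger.mergeAdjacentSpans(html) on the input from the question should leave a single highlighted span containing "Paragraph 1 Paragraph 2", while the loose text and the p element stay in place. Iterating over the result of select() while removing nodes works here because the returned list is a snapshot and is not affected by changes to the tree.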

huangapple
  • Posted on September 22, 2020, 15:37:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64005116.html