Parse HTML using jsoup - need a specific pattern
Question
I am trying to get the text between tags and save it into variables. For example, given the HTML string in the code below, I want to save the value `return`, which sits between the `<em>` tags. I also need the rest of the text inside the `<p>` tag. So the `<em>` value should be `return`, and the `<p>` value should contain only: an item, cancel an order, print a receipt, track your purchases or reorder items.
If some text comes before the `<em>` tag, that text should go into a separate variable as well. Basically, if a `<p>` contains multiple tags, it should be split into pieces that are saved into different variables. If I knew how to get the text that is not inside the inner tags, I could retrieve the rest.
Here is what I have written so far. It returns only "return", the text inside the `<em>` tags. Here `ep` is basically `doc.select("p")`: I select the `<p>` tags and then iterate. I am not sure I am doing this the right way, so any other approaches are highly appreciated.
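String text = "<p><em>return </em>an item, cancel an order, print a receipt, track your purchases or reorder items.</p>";
Document doc = Jsoup.parse(text);
Elements ep = doc.select("p"); // the <p> elements, as described above
Elements italic_tags = ep.select("em");
for (Element em : italic_tags) {
    if (em.tagName().equals("em")) {
        // prints only "return"; the text outside <em> is never reached
        System.out.println(em.select("em").text());
    }
}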
Answer 1
Score: 0
If you need each piece of text separately, both the bare text and the text enclosed by tags, iterate over the child `Node`s of the paragraph instead of selecting `Element`s. I modified your HTML to include more tags so the example is more complete:
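// imports needed by this snippet
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

String text = "<p><em>return </em>an item, <em>cancel</em> an order, <em>print</em> a receipt, <em>track</em> your purchases or reorder items.</p>";
Document doc = Jsoup.parse(text);
Element ep = doc.selectFirst("p");
List<Node> childNodes = ep.childNodes();
for (Node node : childNodes) {
    if (node instanceof TextNode) {
        // a bare text node: just display it
        System.out.println(node);
    } else {
        // another element (<em> here): display its first child,
        // which in this example is its text node; an element
        // with no children would throw here
        System.out.println(node.childNode(0));
    }
}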
output:
return
an item,
cancel
an order,
print
a receipt,
track
your purchases or reorder items.
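Since the question asks for the pieces to end up in separate variables rather than printed, the same node walk can be sketched as a small helper that collects them into a list. This is only an illustration: the class name `ParagraphSplitter` and the method `splitParagraph` are hypothetical names, not part of jsoup, and the sketch assumes only the first `<p>` matters and that whitespace-only pieces can be dropped.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class ParagraphSplitter {

    // Hypothetical helper: returns the text of each direct child of the
    // first <p>, in document order, trimmed, skipping empty pieces.
    static List<String> splitParagraph(String html) {
        Document doc = Jsoup.parse(html);
        Element p = doc.selectFirst("p");
        List<String> parts = new ArrayList<>();
        if (p == null) {
            return parts; // no <p> in the input
        }
        for (Node node : p.childNodes()) {
            String piece;
            if (node instanceof TextNode) {
                // bare text directly inside <p>
                piece = ((TextNode) node).text();
            } else if (node instanceof Element) {
                // an inner tag such as <em>: take all of its text
                piece = ((Element) node).text();
            } else {
                continue; // comments and other node types are ignored
            }
            piece = piece.trim();
            if (!piece.isEmpty()) {
                parts.add(piece);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String html = "<p><em>return </em>an item, cancel an order, print a receipt, "
                + "track your purchases or reorder items.</p>";
        List<String> parts = splitParagraph(html);
        String emText = parts.get(0); // text from the <em> tag
        String rest = parts.get(1);   // text outside the <em> tag
        System.out.println(emText);
        System.out.println(rest);
    }
}
```

If only the text outside the inner tags is needed, `Element.ownText()` on the `<p>` element is another option, since it returns the text owned directly by that element without the text of its children.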
</details>