Parse text/html to application/json with RestTemplate in Java

Question

Here is my code:

    HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.APPLICATION_FORM_URLENCODED);
    MultiValueMap<String, String> map = new LinkedMultiValueMap<>();
    map.add("xx", "xx");
    HttpEntity<MultiValueMap<String, String>> request = new HttpEntity<>(map, headers);
    ResponseEntity<String> response = new RestTemplate().postForEntity(url, request, String.class);

Here is my response:

     <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
    "http://www.w3.org/TR/html4/loose.dtd">
     <html>
     <body onLoad="xx">
     <form action='xxx' method="post" name="aspForm" >
     <input type="hidden" name="responseMessage" value='Successfully Registered'/>
     <input type="hidden" name="url" value='xxxx'/>
     <input type="hidden" name="status" value='SUCCESS'/>
    </form>
    </body>
    </html>

How can I convert the name/value pairs in this HTML response to JSON?


Answer 1

Score: 2

This can be achieved using Jsoup and Jackson's ObjectMapper:

import com.fasterxml.jackson.databind.ObjectMapper;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// html is the response body as a String, i.e. the body of the ResponseEntity<String> from the question
Document doc = Jsoup.parse(html);

// Look up each hidden input by its name attribute and read its value attribute
String responseMessage = doc.body()
        .getElementsByAttributeValue("name", "responseMessage")
        .first()
        .attr("value");

String status = doc.body()
        .getElementsByAttributeValue("name", "status")
        .first()
        .attr("value");

String url = doc.body()
        .getElementsByAttributeValue("name", "url")
        .first()
        .attr("value");

// Response is a plain bean with responseMessage, status and url properties
Response response = new Response();
response.setResponseMessage(responseMessage);
response.setStatus(status);
response.setUrl(url);

ObjectMapper mapper = new ObjectMapper();

// writeValueAsString throws the checked JsonProcessingException
String json = mapper.writeValueAsString(response);

System.out.println(json);

Output:

{"responseMessage":"Successfully Registered","status":"SUCCESS","url":"xxxx"}

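Update:

If the HTML string needs to be converted without manual web scraping, that is also possible (though I suppose it works only for XHTML, because the parser will break on markup that is not XML-compliant).

POM dependency: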

<dependency>
    <groupId>org.eclipse.persistence</groupId>
    <artifactId>org.eclipse.persistence.moxy</artifactId>
    <version>2.5.2</version>
    <type>jar</type>
</dependency>

Bean definition (getters/setters omitted):

@XmlRootElement(name = "html")
@XmlAccessorType(XmlAccessType.FIELD)
public class Response {

    @XmlPath("body/form/input[@name='url']/@value")
    private String url;

    @XmlPath("body/form/input[@name='status']/@value")
    private String status;

    @XmlPath("body/form/input[@name='responseMessage']/@value")
    private String responseMessage;

}
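
The @XmlPath expressions above are XPath-like paths into the form. As a rough way to check such paths against a sample response without MOXy, the sketch below evaluates the equivalent XPath with the JDK's built-in XML parser. The class name XPathFormCheck, the helper valueOf and the SAMPLE constant are made up for illustration, and the sample omits the DOCTYPE so the parser has no external DTD to resolve (an assumption; a real response carrying the DOCTYPE may need DTD handling configured).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer
class XPathFormCheck {

    // Sample modeled on the response in the question, without the DOCTYPE
    static final String SAMPLE =
        "<html><body onLoad=\"xx\">"
        + "<form action='xxx' method=\"post\" name=\"aspForm\">"
        + "<input type=\"hidden\" name=\"responseMessage\" value='Successfully Registered'/>"
        + "<input type=\"hidden\" name=\"url\" value='xxxx'/>"
        + "<input type=\"hidden\" name=\"status\" value='SUCCESS'/>"
        + "</form></body></html>";

    // Returns the value attribute of the input with the given name,
    // or an empty string when no such input exists.
    // inputName is assumed to be a trusted constant, since it is
    // concatenated into the XPath expression.
    static String valueOf(String xhtml, String inputName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(
                    "/html/body/form/input[@name='" + inputName + "']/@value", doc);
        } catch (Exception e) {
            // Markup that is not well-formed XML ends up here
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valueOf(SAMPLE, "status"));          // prints SUCCESS
        System.out.println(valueOf(SAMPLE, "responseMessage")); // prints Successfully Registered
    }
}
```

If the markup is not well-formed XML (for example an unclosed input tag), valueOf throws, which is the limitation the update mentions.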

Create the message converter:

private static HttpMessageConverter<Object> createXmlHttpMessageConverter() throws JAXBException {
    MarshallingHttpMessageConverter xmlConverter = new MarshallingHttpMessageConverter();
    // Several media types are listed here; keep only the ones you need
    xmlConverter.setSupportedMediaTypes(Arrays.asList(
            MediaType.APPLICATION_XML, MediaType.TEXT_HTML, MediaType.TEXT_PLAIN, MediaType.TEXT_XML
    ));
    Jaxb2Marshaller jaxb2Marshaller = new Jaxb2Marshaller();
    jaxb2Marshaller.setClassesToBeBound(Response.class);
    // Without this, JAXB complains about the DOCTYPE at the start of the document
    jaxb2Marshaller.setSupportDtd(true);
    xmlConverter.setMarshaller(jaxb2Marshaller);
    xmlConverter.setUnmarshaller(jaxb2Marshaller);
    return xmlConverter;
}

RestTemplate initialization:

RestTemplate rest = new RestTemplate();
rest.getMessageConverters().add(0, createXmlHttpMessageConverter());

You also need to set MOXy as the JAXB provider. For this code I used:

System.setProperty(JAXBContext.JAXB_CONTEXT_FACTORY, "org.eclipse.persistence.jaxb.JAXBContextFactory");

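but it can also be done in other ways, for example with a jaxb.properties file in the same package as the bean.

All of this lets you make the call: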

Response response = rest.postForObject(url, request, Response.class);

From the Response instance, it should be trivial to produce JSON using Jackson.
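
If the field names are not known in advance, a generic variant could collect every named input into a map and serialize that instead of a fixed bean. The sketch below is only an illustration using JDK classes: FormInputsToJson, inputs, quote, toJson and SAMPLE are hypothetical names, the markup is assumed to be well-formed XML without a DOCTYPE, and the hand-written JSON escaping covers only backslashes and double quotes. In real code the map could be passed to Jackson's ObjectMapper.writeValueAsString instead.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer
class FormInputsToJson {

    // Sample modeled on the response in the question, without the DOCTYPE
    static final String SAMPLE =
        "<html><body><form action='xxx' method='post' name='aspForm'>"
        + "<input type='hidden' name='responseMessage' value='Successfully Registered'/>"
        + "<input type='hidden' name='url' value='xxxx'/>"
        + "<input type='hidden' name='status' value='SUCCESS'/>"
        + "</form></body></html>";

    // Collects name -> value for every <input> that has a name attribute,
    // keeping document order
    static Map<String, String> inputs(String xhtml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getElementsByTagName("input");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element input = (Element) nodes.item(i);
                if (input.hasAttribute("name")) {
                    result.put(input.getAttribute("name"), input.getAttribute("value"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    // Minimal escaping (backslash and double quote only); not a full JSON encoder
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders the map as a flat JSON object of string properties
    static String toJson(Map<String, String> map) {
        return map.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        // prints {"responseMessage":"Successfully Registered","url":"xxxx","status":"SUCCESS"}
        System.out.println(toJson(inputs(SAMPLE)));
    }
}
```

The keys come out in document order here, so the order differs from the bean-based output shown earlier; JSON object member order is not significant.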


huangapple
  • Posted on October 8, 2020, 20:22:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64262423.html