Parse text/html to application/json with RestTemplate in Java
Question
Here is my code:
HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_FORM_URLENCODED);
MultiValueMap<String, String> map = new LinkedMultiValueMap<>();
map.add("xx", "xx");
HttpEntity<MultiValueMap<String, String>> request = new HttpEntity<>(map, headers);
ResponseEntity<String> response = new RestTemplate().postForEntity(url, request, String.class);
Here is my response:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<body onLoad="xx">
<form action='xxx' method="post" name="aspForm" >
<input type="hidden" name="responseMessage" value='Successfully Registered'/>
<input type="hidden" name="url" value='xxxx'/>
<input type="hidden" name="status" value='SUCCESS'/>
</form>
</body>
</html>
How can I convert the name/value pairs in this HTML response to JSON?
Answer 1
Score: 2
This can be achieved using Jsoup and Jackson's ObjectMapper:
import com.fasterxml.jackson.databind.ObjectMapper;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
// html is the response body from the question, e.g. response.getBody()
Document doc = Jsoup.parse(html);
// Each hidden input is looked up by its name attribute; attr("value") reads its value
String responseMessage = doc.body()
        .getElementsByAttributeValue("name", "responseMessage")
        .first()
        .attr("value");
String status = doc.body()
        .getElementsByAttributeValue("name", "status")
        .first()
        .attr("value");
String url = doc.body()
        .getElementsByAttributeValue("name", "url")
        .first()
        .attr("value");
// Response is a plain bean with getters and setters for these three fields
Response response = new Response();
response.setResponseMessage(responseMessage);
response.setStatus(status);
response.setUrl(url);
// writeValueAsString throws the checked JsonProcessingException
ObjectMapper mapper = new ObjectMapper();
String json = mapper.writeValueAsString(response);
System.out.println(json);
Output:
{"responseMessage":"Successfully Registered","status":"SUCCESS","url":"xxxx"}
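The three lookups above follow the same pattern, so they can be generalized to "collect every input into a map, then serialize the map". The sketch below is not the Jsoup/Jackson code from this answer: to stay dependency-free it swaps in the JDK's XML parser and a hand-written JSON writer, which only works because the sample response happens to be well-formed XML. The class and method names (HiddenInputsToJson, extractInputs, toJson) are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original answer.
public class HiddenInputsToJson {

    // Collects name -> value for every <input> element, in document order.
    public static Map<String, String> extractInputs(String html) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve the external DTD named in the DOCTYPE to an empty source,
        // so the parser never tries to fetch loose.dtd over the network.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(html)));

        Map<String, String> pairs = new LinkedHashMap<>();
        NodeList inputs = doc.getElementsByTagName("input");
        for (int i = 0; i < inputs.getLength(); i++) {
            Element input = (Element) inputs.item(i);
            pairs.put(input.getAttribute("name"), input.getAttribute("value"));
        }
        return pairs;
    }

    // Minimal JSON object writer for string values. It escapes only backslashes
    // and double quotes; real code should use Jackson's ObjectMapper instead.
    public static String toJson(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

For the response in the question, toJson(extractInputs(html)) gives the same three pairs as the output above, in document order (responseMessage, url, status). With Jackson on the classpath, the map can be passed straight to mapper.writeValueAsString instead of toJson.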
Update:
If you need to convert the HTML string without manual scraping, that is also possible (though I suppose it will work for XHTML only, because the parser will break on markup that is not XML-compliant).
POM dependency:
<dependency>
    <groupId>org.eclipse.persistence</groupId>
    <artifactId>org.eclipse.persistence.moxy</artifactId>
    <version>2.5.2</version>
    <type>jar</type>
</dependency>
Bean definition (getter/setter skipped):
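import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;
import org.eclipse.persistence.oxm.annotations.XmlPath;

@XmlRootElement(name = "html")
@XmlAccessorType(XmlAccessType.FIELD)
public class Response {
    // Each path is relative to the <html> root and selects one input's value attribute
    @XmlPath("body/form/input[@name='url']/@value")
    private String url;
    @XmlPath("body/form/input[@name='status']/@value")
    private String status;
    @XmlPath("body/form/input[@name='responseMessage']/@value")
    private String responseMessage;
}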
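To see what each @XmlPath expression selects before wiring up MOXy, the same paths can be evaluated with the JDK's standard XPath API. This is only a rough check under the assumption that the response is well-formed XML: MOXy supports a subset of XPath and does its own binding, so the sketch verifies the paths rather than reproducing MOXy. XmlPathCheck and select are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for checking the paths used in the bean above.
public class XmlPathCheck {

    // Evaluates a path relative to the root <html> element and returns
    // the selected node's string value (empty string if nothing matches).
    public static String select(String html, String relativePath) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Substitute an empty source for the external DTD referenced by the DOCTYPE.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(html)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // @XmlPath values are relative to the bean's root element, so the
        // context node here is <html>, not the document node.
        return xpath.evaluate(relativePath, doc.getDocumentElement());
    }
}
```

For example, select(html, "body/form/input[@name='status']/@value") returns SUCCESS for the response in the question.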
Create message converter:
private static HttpMessageConverter<Object> createXmlHttpMessageConverter() throws JAXBException {
    MarshallingHttpMessageConverter xmlConverter = new MarshallingHttpMessageConverter();
    // I added a lot of media types here; keep only the ones you need
    xmlConverter.setSupportedMediaTypes(Arrays.asList(
            MediaType.APPLICATION_XML, MediaType.TEXT_HTML, MediaType.TEXT_PLAIN, MediaType.TEXT_XML
    ));
    Jaxb2Marshaller jaxb2Marshaller = new Jaxb2Marshaller();
    jaxb2Marshaller.setClassesToBeBound(Response.class);
    // Without this, JAXB complains about the DOCTYPE at the start of the response
    jaxb2Marshaller.setSupportDtd(true);
    xmlConverter.setMarshaller(jaxb2Marshaller);
    xmlConverter.setUnmarshaller(jaxb2Marshaller);
    return xmlConverter;
}
REST template initialization:
RestTemplate rest = new RestTemplate();
rest.getMessageConverters().add(0, createXmlHttpMessageConverter());
You'll also need to set MOXy as the JAXB provider. For this code I used:
System.setProperty(JAXBContext.JAXB_CONTEXT_FACTORY, "org.eclipse.persistence.jaxb.JAXBContextFactory");
but it can also be done in other ways, for example with a jaxb.properties file in the same package as the bean.
All of this allows you to make the call (postForEntity returns a ResponseEntity, so unwrap it with getBody()):
Response response = rest.postForEntity(url, request, Response.class).getBody();
From the Response instance it should be trivial to produce JSON using Jackson.