HtmlUnit: parsing HTML in version 2.70
Question
I have been trying to upgrade HtmlUnit (https://www.htmlunit.org/) from version 2.27 to version 2.70, and I noticed that the static method HtmlParser.parseHtml() no longer exists.
As I understand it, I now have to instantiate an HtmlUnitNekoHtmlParser, something like this:
HTMLParser htmlParser = new HtmlUnitNekoHtmlParser();
HtmlPage htmlPage = new HtmlPage(tmpResponse, tmpWebWindow);
htmlParser.parse(tmpResponse, htmlPage, true, true);
However, this leads to an error:
No script object associated with the Page. class: 'com.gargoylesoftware.htmlunit.html.HtmlPage'
According to the Javadoc here:
https://javadoc.io/doc/net.sourceforge.htmlunit/htmlunit/latest/index.html
the two booleans indicate whether the XHTML parser should be used and whether the page was created by JavaScript.
I have tried the remaining combinations, and every one of them still fails with the same message, No script object associated with the Page:
htmlParser.parse(tmpResponse, htmlPage, false, true)
htmlParser.parse(tmpResponse, htmlPage, false, false)
htmlParser.parse(tmpResponse, htmlPage, true, false)
What is the correct way to replace the old HtmlParser.parseHtml() calls in this new version of HtmlUnit?
Answer 1
Score: 1
Oh, 2.27 to 2.70 is a huge step.
Option 1: if you simply want to parse string content (see https://htmlunit.sourceforge.io/faq.html#HowToParseHtmlString), you can do it like this:
// browserVersion (e.g. BrowserVersion.CHROME) and htmlCode (a String) are supplied by you
try (WebClient webClient = new WebClient(browserVersion)) {
    final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(htmlCode);
    // work with the html page
}
Option 2: the hard way (in general, you have to do what the implementation of option 1 does internally):
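// webClient is an existing WebClient; webResponse is the WebResponse holding the HTML to parse
final HTMLParser htmlParser = webClient.getPageCreator().getHtmlParser();
final WebWindow webWindow = webClient.getCurrentWindow();
final HtmlPage page = new HtmlPage(webResponse, webWindow);
// attach the page to the window before parsing (the step missing from the question's code)
webWindow.setEnclosedPage(page);
// xhtml = false, createdByJavascript = false
htmlParser.parse(webResponse, page, false, false);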
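For a raw HTML string, the Option 2 steps can be wrapped into one self-contained class. This is a sketch, assuming HtmlUnit 2.70 is on the classpath (packages under com.gargoylesoftware.htmlunit) and using StringWebResponse to turn the string into the WebResponse the parser expects. The class name ParseHtmlString, the helper parse(...) and the example.com URL are hypothetical, chosen only for illustration.

```java
import com.gargoylesoftware.htmlunit.StringWebResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebWindow;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.parser.HTMLParser;

import java.io.IOException;
import java.net.URL;

public class ParseHtmlString {

    // Hypothetical helper: the "hard way" from Option 2, applied to an HTML string.
    static HtmlPage parse(final WebClient webClient, final String htmlCode, final URL url)
            throws IOException {
        // Wrap the string in a WebResponse; the URL only serves as the page's address
        final StringWebResponse webResponse = new StringWebResponse(htmlCode, url);
        // Use the parser configured for this client instead of creating one by hand
        final HTMLParser htmlParser = webClient.getPageCreator().getHtmlParser();
        final WebWindow webWindow = webClient.getCurrentWindow();
        final HtmlPage page = new HtmlPage(webResponse, webWindow);
        // Attach the page to the window before parsing
        webWindow.setEnclosedPage(page);
        // Plain HTML (not XHTML), not created by JavaScript
        htmlParser.parse(webResponse, page, false, false);
        return page;
    }

    public static void main(final String[] args) throws IOException {
        final String html = "<html><head><title>Hello</title></head>"
                + "<body><p id='msg'>Hi</p></body></html>";
        try (WebClient webClient = new WebClient()) {
            final HtmlPage page = parse(webClient, html, new URL("http://example.com/dummy.html"));
            System.out.println(page.getTitleText());
        }
    }
}
```

Unless you need control over the WebResponse or the parser flags, Option 1 (loadHtmlCodeIntoCurrentWindow) is the shorter route to the same result.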
Hope that helps