HtmlUnit 2.70: how to parse HTML

Question

I have been trying to upgrade HtmlUnit (https://www.htmlunit.org/) from version 2.27 to version 2.70. I noticed that the static method HtmlParser.parseHtml() no longer exists.

I understand that I now have to instantiate an HtmlUnitNekoHtmlParser, something like this:

HTMLParser htmlParser = new HtmlUnitNekoHtmlParser();
HtmlPage htmlPage = new HtmlPage(tmpResponse, tmpWebWindow);
htmlParser.parse(tmpResponse, htmlPage, true, true);

However, this leads to an error:

No script object associated with the Page. class: 'com.gargoylesoftware.htmlunit.html.HtmlPage'

According to the javadoc here:

https://javadoc.io/doc/net.sourceforge.htmlunit/htmlunit/latest/index.html

The two booleans indicate whether to use the XHtml parser and whether the script was created by JavaScript.

I have tried the following combinations:

htmlParser.parse(tmpResponse, htmlPage, false, true)
Same message: No script object associated with the Page

htmlParser.parse(tmpResponse, htmlPage, false, false)
Same message: No script object associated with the Page

htmlParser.parse(tmpResponse, htmlPage, true, false)
Same message: No script object associated with the Page

What is the correct way to replace the old HtmlParser.parseHtml() calls in this new version of HtmlUnit?

Answer 1

Score: 1

Oh, 2.27 to 2.70 is a huge step.

Option 1: you simply want to parse string content (see https://htmlunit.sourceforge.io/faq.html#HowToParseHtmlString)

You can do it like this...

try (WebClient webClient = new WebClient(browserVersion)) {
    final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(htmlCode);
    // work with the html page
}
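
For reference, here is how option 1 might look as a complete class. This is only a minimal sketch, assuming HtmlUnit 2.70 (the com.gargoylesoftware.htmlunit packages) is on the classpath; the class name ParseHtmlString, the helper titleOf, and the sample HTML are made up for illustration, and BrowserVersion.BEST_SUPPORTED stands in for whatever browserVersion you use.

```java
import java.io.IOException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ParseHtmlString {

    // Hypothetical helper: parses the HTML source and returns the <title> text.
    static String titleOf(String htmlCode) throws IOException {
        // WebClient is AutoCloseable, so try-with-resources releases its windows.
        try (WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
            // Loads the string into the client's current window and parses it.
            HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(htmlCode);
            // Read what you need while the client is still open.
            return page.getTitleText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample input, invented for this sketch.
        String html = "<html><head><title>Hello</title></head>"
                + "<body><p>Some content</p></body></html>";
        System.out.println(titleOf(html));
    }
}
```

Extracting the values inside the try block avoids working with a page whose WebClient has already been closed.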

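Option 2: the hard way (in general, you have to do what the implementation behind option 1 does). Note the webWindow.setEnclosedPage(page) call before parsing, which the code in the question lacks.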

final HTMLParser htmlParser = webClient.getPageCreator().getHtmlParser();
final WebWindow webWindow = webClient.getCurrentWindow();

final HtmlPage page = new HtmlPage(webResponse, webWindow);
webWindow.setEnclosedPage(page);

htmlParser.parse(webResponse, page, false, false);
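
The snippet above leaves open where webClient and webResponse come from. One possible way to fill that in, if your input is a plain string, is to wrap it in a StringWebResponse. The following is a sketch under that assumption, again for HtmlUnit 2.70; the class name ParseHtmlManually, the helper titleOf, and the placeholder URL are hypothetical and only serve the example.

```java
import java.io.IOException;
import java.net.URL;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.StringWebResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebWindow;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.parser.HTMLParser;

public class ParseHtmlManually {

    // Hypothetical helper: parses the HTML by hand and returns the <title> text.
    static String titleOf(String htmlCode) throws IOException {
        try (WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
            // Assumption: the content is a string; the URL is only a placeholder
            // that gives the response an originating address.
            StringWebResponse webResponse = new StringWebResponse(
                    htmlCode, new URL("http://localhost/placeholder.html"));

            HTMLParser htmlParser = webClient.getPageCreator().getHtmlParser();
            WebWindow webWindow = webClient.getCurrentWindow();

            HtmlPage page = new HtmlPage(webResponse, webWindow);
            // Attach the page to the window before parsing, as in the answer.
            webWindow.setEnclosedPage(page);

            htmlParser.parse(webResponse, page, false, false);
            return page.getTitleText();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(titleOf(
                "<html><head><title>Manual</title></head><body></body></html>"));
    }
}
```

Unless you need control over the response or the window, option 1 is the shorter route to the same result.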

Hope that helps.

Posted by huangapple on July 7, 2023 at 03:30:59
Source: https://go.coder-hub.com/76632022.html