July 7, 2023, 03:30:59

htmlunit parse Html version 2.70

Question

I have been trying to upgrade HtmlUnit (https://www.htmlunit.org/) from version 2.27 to version 2.70. I noticed that the static method HtmlParser.parseHtml() no longer exists.
I understand that I now have to instantiate an HtmlUnitNekoHtmlParser. Something like this:

HTMLParser htmlParser = new HtmlUnitNekoHtmlParser();
HtmlPage htmlPage = new HtmlPage(tmpResponse, tmpWebWindow);
htmlParser.parse(tmpResponse, htmlPage, true, true);

However, this leads to an error:

No script object associated with the Page. class: 'com.gargoylesoftware.htmlunit.html.HtmlPage'

According to the javadoc here:

https://javadoc.io/doc/net.sourceforge.htmlunit/htmlunit/latest/index.html

The booleans indicate whether we need to use the XHtml parser and if the script was created by javascript.

I have tried the following combinations:

htmlParser.parse(tmpResponse, htmlPage, false, true);
Still the same message: No script object associated with the Page
htmlParser.parse(tmpResponse, htmlPage, false, false);
Same message: No script object associated with the Page
htmlParser.parse(tmpResponse, htmlPage, true, false);
Same message: No script object associated with the Page

What would be the correct way to replace the old HtmlParser.parseHtml() statements in this new version of htmlunit?

Answer 1

Score: 1

Oh, 2.27 to 2.70 is a huge step.

Option 1: you simply want to parse string content (see https://htmlunit.sourceforge.io/faq.html#HowToParseHtmlString)

You can do it like this...

try (WebClient webClient = new WebClient(browserVersion)) {
    final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(htmlCode);
    // work with the html page
}

Option 2: the hard way (in general, you have to do what the implementation of Option 1 does)

final HTMLParser htmlParser = webClient.getPageCreator().getHtmlParser();
final WebWindow webWindow = webClient.getCurrentWindow();
final HtmlPage page = new HtmlPage(webResponse, webWindow);
webWindow.setEnclosedPage(page);
htmlParser.parse(webResponse, page, false, false);
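
The Option 2 snippet leaves webClient and webResponse undefined, so here is a minimal, self-contained sketch of how the pieces might fit together. It assumes HtmlUnit 2.70 on the classpath (package com.gargoylesoftware.htmlunit; the 3.x releases use org.htmlunit instead) and wraps the HTML string in a StringWebResponse with a placeholder URL. The class name ParseHtmlSketch, the helper titleOf and the dummy URL are made up for illustration. Compared with the snippet in the question, the visible differences are that the parser and window come from the WebClient and that the page is attached to the window with setEnclosedPage before parsing.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.StringWebResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebWindow;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.parser.HTMLParser;

import java.io.IOException;
import java.net.URL;

public class ParseHtmlSketch {

    // Hypothetical helper (not part of HtmlUnit): parses an HTML string
    // following the steps of Option 2 and returns the page title.
    static String titleOf(String htmlCode) throws IOException {
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Placeholder URL (an assumption); it only serves as the base URL of the response.
            URL dummyUrl = new URL("http://localhost/dummy.html");
            StringWebResponse webResponse = new StringWebResponse(htmlCode, dummyUrl);

            // Take the parser and the window from the WebClient instead of creating them by hand.
            HTMLParser htmlParser = webClient.getPageCreator().getHtmlParser();
            WebWindow webWindow = webClient.getCurrentWindow();

            // Attach the page to the window before parsing.
            HtmlPage page = new HtmlPage(webResponse, webWindow);
            webWindow.setEnclosedPage(page);

            // xhtml = false, createdByJavascript = false
            htmlParser.parse(webResponse, page, false, false);
            return page.getTitleText();
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>";
        System.out.println(titleOf(html));
    }
}
```

If only string content needs to be parsed, Option 1 is likely the simpler choice, since it performs these steps internally.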

Hope that helps.
