Parse HTML String for extracting a value

Question

I have a <script> tag inside the HTML document provided below:

    <script type="text/javascript">
      var ReportViewer1 = new ReportViewer('ReportViewer1', 'ReportViewer1_ReportToolbar', 'ReportViewer1_ReportArea_WaitControl', 'ReportViewer1_ReportArea_ReportCell', 'ReportViewer1_ReportArea_PreviewFrame', 'ReportViewer1_ParametersAreaCell', 'ReportViewer1_ReportArea_ErrorControl', 'ReportViewer1_ReportArea_ErrorLabel', 'ReportViewer1_CP', '/app/Telerik.ReportViewer.axd', 'e0f6bb5061864d63b59a18d8187eed21', 'Percent', '100', '', 'ReportViewer1_EditorPlaceholder', 'ReportViewer1_CalendarFrame', 'ReportViewer1_ReportArea_DocumentMapCell',
      {
        CurrentPageToolTip: 'STR_TELERIK_MSG_CUR_PAGE_TOOL_TIP',
        ExportButtonText: 'Export',
        ExportToolTip: 'Export',
        ExportSelectFormatText: 'Export to the selected format',
        FirstPageToolTip: 'First page',
        LabelOf: 'of',
        LastPageToolTip: 'Last Page',
        ProcessingReportMessage: 'Generating report...',
        NoPageToDisplay: 'No page to display.',
        NextPageToolTip: 'Next page',
        ParametersToolTip: 'Click to close parameters area|Click to open parameters area',
        DocumentMapToolTip: 'Hide document map|Show document map',
        PreviousPageToolTip: 'Previous page',
        TogglePageLayoutToolTip: 'Switch to interactive view|Switch to print preview',
        SessionHasExpiredError: 'Session has expired.',
        SessionHasExpiredMessage: 'Please, refresh the page.',
        PrintToolTip: 'Print',
        RefreshToolTip: 'Refresh',
        NavigateBackToolTip: 'Navigate back',
        NavigateForwardToolTip: 'Navigate forward',
        ReportParametersSelectAllText: '<select all>',
        ReportParametersSelectAValueText: '<select a value>',
        ReportParametersInvalidValueText: 'Invalid value.',
        ReportParametersNoValueText: 'Value required.',
        ReportParametersNullText: 'NULL',
        ReportParametersPreviewButtonText: 'Preview',
        ReportParametersFalseValueLabel: 'False',
        ReportParametersInputDataError: 'Missing or invalid parameter value. Please input valid data for all parameters.',
        ReportParametersTrueValueLabel: 'True',
        MissingReportSource: 'The source of the report definition has not been specified.',
        ZoomToPageWidth: 'Page Width',
        ZoomToWholePage: 'Full Page'
      }, 'ReportViewer1_ReportArea_ReportArea', 'ReportViewer1_ReportArea_SplitterCell', 'ReportViewer1_ReportArea_DocumentMapCell', true, true, 'PDF', 'ReportViewer1_RSID', true);
    </script>

I would like to extract the value e0f6bb5061864d63b59a18d8187eed21 from the body shown above. I wrote the following regex-based code for this purpose:

    final String BEFORE_INSTANCE_ID = "/app/Telerik.ReportViewer.axd";
    final String AFTER_INSTANCE_ID = "Percent";

    Pattern pattern = Pattern.compile("(" + BEFORE_INSTANCE_ID + ")(.*?)(" + AFTER_INSTANCE_ID + ")");
    Matcher matcher = pattern.matcher(body);

    String instanceId = null;

    while (matcher.find()) {
        String temp = matcher.group(0);
        instanceId = StringUtils.substringBetween(temp, BEFORE_INSTANCE_ID, AFTER_INSTANCE_ID).replaceAll("[,;'\\s]", "").trim();
    }

Is there a better and nicer way to code this?

Answer 1

Score: 2

Assuming str holds the given string, a simple regular expression should be enough to extract the value:

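    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Match a single-quoted, 32-character hex value that sits between two commas.
    Pattern pattern = Pattern.compile(",\\s*'([0-9a-f]{32})'\\s*,", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    String result = null;
    if (matcher.find()) {
        // Group 1 is the value without the surrounding quotes.
        result = matcher.group(1);
    }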
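
The regex above matches the first quoted 32-character hex value anywhere in the string. If other arguments could ever look like that, one possible refinement is to anchor on the literal argument that precedes the id. The sketch below is not part of the original answer. It assumes the id is always the argument right after '/app/Telerik.ReportViewer.axd' and that body contains plain single quotes rather than HTML entities. The class name InstanceIdExtractor and the method extractInstanceId are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InstanceIdExtractor {

    // Literal argument that precedes the instance id in the ReportViewer(...) call
    // (assumption: this ordering holds for every page being processed).
    private static final String HANDLER_PATH = "/app/Telerik.ReportViewer.axd";

    // Pattern.quote makes the dots in the path match literally. The capture group
    // takes the next single-quoted argument, which must be 32 hex characters.
    private static final Pattern INSTANCE_ID = Pattern.compile(
            "'" + Pattern.quote(HANDLER_PATH) + "'\\s*,\\s*'([0-9a-fA-F]{32})'");

    /** Returns the instance id, or an empty Optional if no match is found. */
    public static Optional<String> extractInstanceId(String body) {
        Matcher matcher = INSTANCE_ID.matcher(body);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String body = "new ReportViewer('ReportViewer1_CP', '/app/Telerik.ReportViewer.axd', "
                + "'e0f6bb5061864d63b59a18d8187eed21', 'Percent', '100');";
        System.out.println(extractInstanceId(body).orElse("not found"));
    }
}
```

Returning an Optional instead of null makes the "no match" case explicit for the caller. Because the capture group already holds the bare value, no StringUtils.substringBetween or replaceAll cleanup is needed afterwards.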

Posted by huangapple on March 15, 2020, 22:01:54. Source: https://go.coder-hub.com/60693705.html