XSL 1.0: How to split a string without slicing words

Question

I need to improve how long strings are split in XSL. The line size is 60 characters. At the moment a long string is simply cut every 60 characters, which breaks it into lines in an inelegant way.
I am trying to make the splitting respect spaces, so that words are not sliced in the middle.

Currently, the code looks like this:

```xml
<xsl:template name="split_text">
    <xsl:param name="sText"/>
    <xsl:param name="lineSize">60</xsl:param>

    <xsl:variable name="toDisplay" saxon:assignable="yes"/>
    <xsl:variable name="toProcess" saxon:assignable="yes" select="$sText"/>

    <!-- Cut off one full line at a time while more than one line remains -->
    <saxon:while test="string-length($toProcess) &gt; $lineSize">
        <saxon:assign name="toDisplay" select="substring($toProcess, 1, $lineSize)"/>
        <saxon:assign name="toProcess" select="substring($toProcess, $lineSize + 1)"/>
        <xsl:value-of select="$toDisplay"/><br/>
    </saxon:while>
    <!-- Print the last (short) line -->
    <xsl:value-of select="$toProcess"/>
</xsl:template>
```

It simply splits the text whenever it is longer than the line capacity.
I want to handle the case where the line capacity ends in the middle of a word. I have read about tokenizers, substring-before-last, etc., but I got exceptions in Java when I tried them. I am probably working with too old an XSL version, but upgrading it is not possible, so I have to use what I have.

I am wary of relying only on the last space character in each line, because the input can be a long character sequence without any spaces; in that case the best option is still the code I pasted above.
Is there a simple way to tokenize in XSL?

Should I tokenize the full string and keep appending tokens as long as their total length stays within the line capacity?
Or should I check whether the last character in a line is a space and, if it is not, do some additional processing?

I am quite confused; this is my first encounter with XSL.

ADDITIONAL EDIT:
I found the function saxon:tokenize, which looks interesting. Its description in the documentation sounds great; it is what I need. But can it be used with XSL 1.0 and my version of Saxon? Here is a paste from the Manifest:

```
Manifest-Version: 1.0
Main-Class: com.icl.saxon.StyleSheet
Created-By: 1.3.1_16 (Sun Microsystems Inc.)
```

If yes, how do I iterate over the result? I found various styles of iterating on the web, and I do not understand the differences, pros, and cons between them.
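
As a rough illustration of the iteration being asked about, here is a minimal sketch. It assumes a Saxon 6.5.x processor (which the `com.icl.saxon` main class in the manifest suggests), where the extension namespace is `http://icl.com/saxon` and `saxon:tokenize` returns a node-set with one node per whitespace-separated token. The sample string and the demo template are hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://icl.com/saxon"
    exclude-result-prefixes="saxon">

  <xsl:output method="text"/>

  <!-- Hypothetical demo: write each token of a sample string on its own line -->
  <xsl:template match="/">
    <!-- Assumption: saxon:tokenize splits on whitespace and yields a node-set -->
    <xsl:for-each select="saxon:tokenize('one two  three')">
      <!-- Inside the loop, "." is the current token -->
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the result is an ordinary node-set, `xsl:for-each` is the plain XSLT 1.0 way to walk it; no Saxon-specific loop element is needed just for the iteration.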

</details>


# Answer 1
**Score**: 0

Okay, I got it working, so I will share my solution in case somebody has a similar problem.

```xml
<xsl:template name="split_text">
    <xsl:param name="sText"/>
    <xsl:param name="lineSize">60</xsl:param>

    <!-- The line being built; saxon:assign requires saxon:assignable="yes" -->
    <xsl:variable name="remainder" saxon:assignable="yes" select="''"/>
    <xsl:variable name="textTokens" select="saxon:tokenize($sText)"/>

    <xsl:for-each select="$textTokens">
        <xsl:variable name="currentToken" select="string(.)"/>
        <xsl:choose>
            <!-- First word of a line: no leading space -->
            <xsl:when test="$remainder = ''">
                <saxon:assign name="remainder" select="$currentToken"/>
            </xsl:when>
            <!-- The word still fits: words are appended until the line is filled -->
            <xsl:when test="string-length($remainder) + 1 + string-length($currentToken) &lt;= $lineSize">
                <saxon:assign name="remainder" select="concat($remainder, ' ', $currentToken)"/>
            </xsl:when>
            <!-- The line is full: print it and start the next line with the current word -->
            <xsl:otherwise>
                <xsl:value-of select="$remainder"/><br/>
                <saxon:assign name="remainder" select="$currentToken"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:for-each>
    <!-- Print the last, partially filled line.
         Note: a single word longer than $lineSize still ends up on one overlong line. -->
    <xsl:value-of select="$remainder"/>
</xsl:template>
```

I used Saxon's tokenize function and iterate over the list of tokens, checking the line length on every iteration.
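
For comparison, here is a sketch of the same idea in plain XSLT 1.0, without any Saxon extensions. It is not the code from the answer above: the template names `wrap_text` and `last_space` and the demo input are made up for illustration. It breaks at the last space that fits within the line and, as the question suggests for input without spaces, falls back to a hard cut at `lineSize` characters.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical demo call with a short line size -->
  <xsl:template match="/">
    <xsl:call-template name="wrap_text">
      <xsl:with-param name="text" select="'the quick brown fox'"/>
      <xsl:with-param name="lineSize" select="10"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Emits the text line by line, separated by <br/>, recursing on the rest -->
  <xsl:template name="wrap_text">
    <xsl:param name="text"/>
    <xsl:param name="lineSize" select="60"/>
    <!-- Collapse whitespace so every break candidate is a single space -->
    <xsl:variable name="t" select="normalize-space($text)"/>
    <xsl:choose>
      <!-- Everything fits on one line -->
      <xsl:when test="string-length($t) &lt;= $lineSize">
        <xsl:value-of select="$t"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Look one character past the limit, so a space right after a full line counts -->
        <xsl:variable name="breakAt">
          <xsl:call-template name="last_space">
            <xsl:with-param name="s" select="substring($t, 1, $lineSize + 1)"/>
          </xsl:call-template>
        </xsl:variable>
        <xsl:choose>
          <!-- Break at the last space that fits -->
          <xsl:when test="$breakAt &gt; 0">
            <xsl:value-of select="substring($t, 1, $breakAt - 1)"/><br/>
            <xsl:call-template name="wrap_text">
              <xsl:with-param name="text" select="substring($t, $breakAt + 1)"/>
              <xsl:with-param name="lineSize" select="$lineSize"/>
            </xsl:call-template>
          </xsl:when>
          <!-- No space available: hard cut, as in the original code -->
          <xsl:otherwise>
            <xsl:value-of select="substring($t, 1, $lineSize)"/><br/>
            <xsl:call-template name="wrap_text">
              <xsl:with-param name="text" select="substring($t, $lineSize + 1)"/>
              <xsl:with-param name="lineSize" select="$lineSize"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Returns the 1-based position of the last space in $s, or 0 if there is none -->
  <xsl:template name="last_space">
    <xsl:param name="s"/>
    <xsl:param name="offset" select="0"/>
    <xsl:choose>
      <xsl:when test="contains($s, ' ')">
        <xsl:call-template name="last_space">
          <xsl:with-param name="s" select="substring-after($s, ' ')"/>
          <xsl:with-param name="offset"
              select="$offset + string-length(substring-before($s, ' ')) + 1"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$offset"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

With `lineSize` set to 10, the text "the quick brown fox" comes out as two lines, "the quick" and "brown fox", while a 16-character run without spaces is cut after its tenth character. The recursion depth grows with the number of output lines, which may matter for very long inputs on some processors.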

huangapple
  • Posted on 2020-09-19 04:16:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63962331.html