How do you remove double quotes from the image of a JavaCC token?

Question

In JavaCC, I'm accepting strings that match the following token definition:

   < STRING : "\"" ( "\\" ~[] | ~["\"","\\"] )* "\"" >

So the image ends up containing the string together with an extra set of double quotes.

For example, I'll input: "This is a sentence."

And the resulting value, stored in a String variable, is: ""This is a sentence.""

Is there a way in Java to remove the extra set of double quotes so that it only prints: "This is a sentence."?

Answer 1

Score: 2

If the input matched by your token is "Hello", then the value of the token's image field will be a 7-character string whose first and last characters are double-quote characters. They're not really extra; they were there in the input. Say you write:

void foo() : {
    Token t ; }
{
    t = <STRING>
    { System.out.println( t.image ) ; }
}

That'll print 7 characters and then a newline.

Now if you don't want those characters, well, @Bryan's answer will do it.

void foo() : {
    Token t ; }
{
    t = <STRING>
    { { String nakedImage = t.image.substring(1,t.image.length()-1) ;
        System.out.println( nakedImage ) ; } }
}

It should be noted that no quotes are actually removed. String objects in Java are immutable, meaning they can't be changed. What really happens is that a new String object is created and a reference to it is assigned to the nakedImage variable. The String object that t.image refers to remains the same.
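To make the immutability point concrete, here is a minimal sketch in plain Java. The class name `NakedImageDemo`, the helper `stripDelimiters`, and the string literal standing in for `t.image` are all assumptions made for illustration; no JavaCC-generated code is involved.

```java
// Hypothetical demo class; the literal below stands in for t.image.
class NakedImageDemo {
    // Returns a new String without the first and last characters.
    // The argument itself is never modified.
    static String stripDelimiters(String image) {
        return image.substring(1, image.length() - 1);
    }

    public static void main(String[] args) {
        String image = "\"Hello\"";               // what t.image would hold for the input "Hello"
        String nakedImage = stripDelimiters(image);

        System.out.println(image.length());       // 7
        System.out.println(nakedImage.length());  // 5
        System.out.println(image);                // the original still has both quotes
    }
}
```

The original string keeps its length of 7 after the call, which is why the result has to be stored in a separate variable.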

Now you still have the problem of dealing with the backslashes. If the input is "Hello\tWorld", then t.image will be 14 characters long and nakedImage will be 12 characters long. What I do at this point is run the string through a function that builds a new string with a single character wherever nakedImage has an escape sequence. So the result of that function on this example would be 11 characters long.

void foo() : {
    Token t ; }
{
    t = <STRING>
    { { String nakedImage = t.image.substring(1,t.image.length()-1) ;
        String unescapedImage = unescape( nakedImage ) ;
        System.out.println( unescapedImage ) ; } }
}

Here's such a function, based on one I wrote for a Java compiler.

private static String unescape( String str ) {
    StringBuilder result = new StringBuilder() ;
    for( int i=0, len = str.length() ; i<len ; ) {
        char ch = str.charAt(i) ;
        // Set ch and increment i ;
        if( ch == '\\' )  {
            // The token definition guarantees a character after each backslash.
            ch = str.charAt(i+1) ;
            switch( ch ) {
                case 'b' : ch = '\b' ; i += 2 ; break ;
                case 't' : ch = '\t' ; i += 2 ; break ;
                case 'n' : ch = '\n' ; i += 2 ; break ;
                case 'f' : ch = '\f' ; i += 2 ; break ;
                case 'r' : ch = '\r' ; i += 2 ; break ;
                case '"' : case '\'' : case '\\' : i+= 2 ; break ;
                default: 
                    /*TODO Deal with errors. */
                    // Still advance past the escape so the loop terminates.
                    i += 2 ; } }
        else {
            i += 1 ; }
        result.append( ch ) ; }
    return result.toString() ;
}
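The character counts quoted for "Hello\tWorld" can be checked with a small standalone program. This is only a sketch under stated assumptions: the class name `UnescapeDemo` is made up, the `unescape` here is a simplified variant of the function above rather than the author's exact code, and throwing `IllegalArgumentException` on an unknown escape is one possible choice for the error case the answer leaves open.

```java
// Hypothetical standalone check of the 14 / 12 / 11 character counts.
class UnescapeDemo {
    // Replaces each backslash escape in str with the character it denotes.
    static String unescape(String str) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < str.length()) {
            char ch = str.charAt(i);
            if (ch == '\\' && i + 1 < str.length()) {
                char next = str.charAt(i + 1);
                switch (next) {
                    case 'b': ch = '\b'; break;
                    case 't': ch = '\t'; break;
                    case 'n': ch = '\n'; break;
                    case 'f': ch = '\f'; break;
                    case 'r': ch = '\r'; break;
                    case '"': case '\'': case '\\': ch = next; break;
                    default:
                        // Assumption: reject unknown escapes instead of guessing.
                        throw new IllegalArgumentException("Unknown escape: \\" + next);
                }
                i += 2;  // consume the backslash and the escaped character
            } else {
                i += 1;  // ordinary character (or a lone trailing backslash)
            }
            result.append(ch);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Java literal for the 14 input characters "Hello\tWorld", quotes included.
        String image = "\"Hello\\tWorld\"";
        String nakedImage = image.substring(1, image.length() - 1);
        String unescapedImage = unescape(nakedImage);

        System.out.println(image.length());          // 14
        System.out.println(nakedImage.length());     // 12
        System.out.println(unescapedImage.length()); // 11
    }
}
```

The two-character sequence backslash plus `t` collapses into a single tab character, which accounts for the drop from 12 to 11.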

Answer 2

Score: 0

str = str.substring(1, str.length() - 1);

An alternative for JavaCC:

https://stackoverflow.com/questions/11878392/parsing-strings-with-javacc
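The one-line substring call assumes the string really does start and end with a double quote, which holds for any image matched by the STRING token. For strings from other sources, a guarded helper along these lines may be safer. The class and method names are hypothetical.

```java
// Hypothetical helper: strips one pair of surrounding double quotes if present.
class QuoteStripper {
    static String stripQuotes(String str) {
        boolean quoted = str.length() >= 2
                && str.charAt(0) == '"'
                && str.charAt(str.length() - 1) == '"';
        // Leave the input untouched when it is not wrapped in quotes.
        return quoted ? str.substring(1, str.length() - 1) : str;
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"This is a sentence.\"")); // This is a sentence.
        System.out.println(stripQuotes("no quotes"));               // no quotes
    }
}
```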

huangapple
  • Posted on October 7, 2020 at 03:10:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64232341.html