Why does the literal string """"""" (seven quotes) give an error?

Question

When processing client input we often use the strip() method. To remove leading and trailing characters from a specific set, we just pass all of them in the parameter.
The code

".yes' ".strip(". '")

obviously gives the string 'yes' as a result.
When I try to strip the set that contains both ' and ", the result depends on the order of these characters. The variant ".yes' ".strip(""" ."'""") works properly, while the variant with the " character at the end gives SyntaxError: unterminated string literal (detected at line 1).

So the question is: why does the literal string """"""" (seven quotes) give an error? It is just the same as '"'!

Update 1. I am told that I have the wrong mental model of literals. So let's look at the documentation:

Triple quoted: '''Three single quotes''', """Three double quotes"""

And the lexical grammar defines longstring and longstringitem. Click here to verify. So

  1. longstring i.e. """longstringitem"""
  2. longstringitem may be a single char.

So do we have to rewrite the documentation or the interpreter?

Update 2. I propose rewriting the logic of the interpreter, because my example starts with """ and ends with """ and has one character inside. It differs from ''' because there the pair '' has to use the same symbol ' in between. I am not using """ inside a """ """ pair. Do you see the difference?

Update 3. I've registered my question as a Python documentation issue. You can see it here.

Update 4. I've marked the answer as accepted because I promised to do so )). But the text from the docs really says:
This is wrong: '''
This is wrong: """
This is wrong: """""""""

And I agree. But there is no place in the documentation that says:
""""""" is wrong. That's all I wanted to say.

No formal rules about it.

Update 5. Look:

In[5] :""""TEXT" """
Out[5]: '"TEXT" '

In[6] :""""TEXT""""
  File "C:\Users\vasil\AppData\Local\Temp\ipykernel_551695884511.py", line 1
    """"TEXT""""
                ^
SyntaxError: EOL while scanning string literal

I.e. four quotes in a row ("""") are allowed at the start and prohibited at the end. Is this the correct situation?

Answer 1

Score: 5

This reflects the documented behaviour as per the Python language spec around lexical analysis of strings:

> In triple-quoted literals, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the literal. (A “quote” is the character used to open the literal, i.e. either ' or ".)

The crucial point here is that "three unescaped quotes in a row terminate the literal". So if you begin a literal with """, that literal ends as soon as another """ sequence is encountered: the parser doesn't look ahead of that to try to infer a different endpoint for the literal.

When the parser encounters """"""" (a run of seven double-quotes), therefore:

  1. The 1st, 2nd and 3rd characters tell the parser it's dealing with a literal delimited by triple double-quotes.
  2. The 4th, 5th and 6th characters constitute those "three unescaped quotes" so they terminate the literal.
  3. The 7th character is then " with no following " that it can be paired with, so that 7th character constitutes an unterminated literal. The parser fails with SyntaxError: unterminated string literal.
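
The three steps above can be checked from Python itself. This is only a minimal sketch, assuming CPython 3; the helper name try_literal is made up for this illustration, and the exact error message differs between Python versions, so it is not shown here.

```python
def try_literal(source):
    """Compile `source` as an expression and report what happened.

    Hypothetical helper, for illustration only.
    """
    try:
        value = eval(compile(source, "<demo>", "eval"))
    except SyntaxError as exc:
        return ("SyntaxError", exc.msg)
    return ("ok", value)


six = '"' * 6    # opening """ immediately followed by closing """
seven = '"' * 7  # the same, plus one stray quote that opens a new literal
eight = '"' * 8  # the same six, plus "" (a second, empty literal)

print(try_literal(six))    # ('ok', '')
print(try_literal(seven))  # a SyntaxError; the message depends on the version
print(try_literal(eight))  # ('ok', ''): two adjacent literals are concatenated
```

The eight-quote case shows the "pairing" mentioned in step 3: once the stray 7th quote gets a partner, the leftover becomes a valid empty literal and the whole expression compiles.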

It's worth mentioning that backslash-escaping can still be used to prevent the quote character being treated as a quote. For example:

s = """\""""
print(s)

prints:

"

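For the strip() use case from the question, there are several ways to spell a character set that contains both quote characters without ending a triple-quoted literal on an unescaped quote. A small sketch; the variable names are illustrative only and do not come from the question.

```python
# Four spellings of the same character set: space, dot, ' and "
chars_escaped = " .'\""        # ordinary literal, double quote escaped
chars_single = ' ."\''         # single-quoted literal, single quote escaped
chars_triple = """ .'\""""     # triple-quoted, final double quote escaped
chars_reordered = """ ."'"""   # triple-quoted, double quote moved off the end

# All four contain exactly the same characters.
assert (set(chars_escaped) == set(chars_single)
        == set(chars_triple) == set(chars_reordered))

raw = '".yes\' "'              # the text  ".yes' "  including both quotes
print(raw.strip(chars_triple))  # yes
```

Since strip() treats its argument as a set of characters, the order inside the literal does not matter, so moving the quote away from the end is usually the least noisy fix.
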
Answer 2

Score: 0

This behaviour is determined by one interesting thing in the Python parser. You can read it here.

Near line 114 you see:

# Tail end of """ string.
Double3 = r'[^"\\]*(?:(?:\\.|"(?!""))[^"\\]*)*"""'

This part of the code defines the regular expression for the tail of a triple-quoted literal. We can see that a separate " is allowed, BUT only if it is not followed by two more ". So logically the seven-quote literal could be allowed, but this regular expression prohibits it.
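
To see what this pattern does with the inputs from the question, it can be run directly with the re module. This is a sketch only: the pattern is copied from the line quoted above, while the helper name tail_end is an assumption made for this example. Note that the pattern describes the part of the literal that comes after the opening """.

```python
import re

# Pattern quoted above: the rest of a """ literal, up to and including
# the closing """.
Double3 = r'[^"\\]*(?:(?:\\.|"(?!""))[^"\\]*)*"""'


def tail_end(rest):
    """Return the text the pattern consumes after an opening triple quote.

    Hypothetical helper; returns None when the pattern does not match.
    """
    match = re.match(Double3, rest)
    return match.group(0) if match else None


# Seven quotes: after the opening """ four quotes remain.
print(tail_end('""""'))        # consumes only three of them, one is left over
# In[5] from the question: a lone quote inside the body is accepted.
print(tail_end('"TEXT" """'))  # consumes the whole remainder
# Escaped quote followed by the closing """.
print(tail_end('\\""""'))      # consumes the whole remainder
```

The negative lookahead (?!"") is what stops a quote from being taken as content when two more quotes follow it, so the first run of three always closes the literal.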

This is the correct answer to my question. But who cares....

huangapple
  • Posted on April 17, 2023 at 23:03:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76036574.html