Attempting to parse a line character by character using Python

Question

I have a line whose contents vary depending on the input. My current approach, which does not work, loops over it with range() so I can track the current position.

The line consists of the word "LR" followed by a "left" string and a "right" string, separated by a space. The problem is that I cannot simply split on every space, because the left and/or right string sometimes contains a space itself, which would break the actual left or right string into several pieces.

Three example inputs that demonstrate this:

LR "redirect\":\"\\" "\"" ->

This one could be split on spaces without a problem.

LR "name=\"uuid\" value=\"" "\""

This one breaks my regex.

LR "<span class=\"pointsNormal\">" "<" ->

This one, as you can see, has a space in the left string after 'span'.
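The question does not show the regex that failed, so the following is only a hedged sketch of an escape-aware pattern, assuming backslash is the only escape character and that both the left and right strings are wrapped in double quotes. The pattern matches an opening quote, then any run of either an escape pair (a backslash plus the next character) or a character that is neither a quote nor a backslash, then the closing quote. The helper name parse_lr is hypothetical and the annotations need Python 3.9+.

```python
import re

# A double-quoted string: opening quote, then any mix of escape pairs
# (backslash + next character) and ordinary characters, then a closing quote.
QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')


def parse_lr(line: str) -> tuple[str, str]:
    """Return the raw (still quoted and escaped) left and right strings.

    Hypothetical helper: assumes backslash is the only escape character
    and that the line contains exactly two double-quoted strings.
    """
    matches = QUOTED.findall(line)
    if len(matches) != 2:
        raise ValueError(f"expected 2 quoted strings, found {len(matches)}: {line!r}")
    return matches[0], matches[1]


examples = [
    r'LR "redirect\":\"\\" "\"" ->',
    r'LR "name=\"uuid\" value=\"" "\""',
    r'LR "<span class=\"pointsNormal\">" "<" ->',
]
for example in examples:
    left, right = parse_lr(example)
    print(left, "|", right)
```

Because each escape pair is consumed as a unit, an escaped backslash right before the closing quote (as in the first example) no longer hides the end of the string, and spaces inside the quotes are simply part of the match.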

    def parseLR(self, line) -> None:
        
        line = line.split("LR ")[1].split(" ->")[0]

        left = ""
        seen = 0
        encountered = False

        for x in range(len(line)):

            char = line[x]

            if encountered and seen % 2 == 0:
                break

            if char == '"' and line[x - 1] != '\\':
                seen += 1

            elif char == " ":
                encountered = True
            
            left += char
        
        print(left)

This is my current approach. I go through the line character by character. For each character, I check whether it is a double quote ("); if so, I increment the seen counter. If it is not, I check whether it is a space, and if it is, I set encountered to True. Then, regardless of that, I check whether seen is even, meaning there is an equal number of " characters so far, and whether a space has been encountered. If both are true, that is the end of the LEFT string. If you run it, you will see the problem that occurs. How can I properly parse the left and right strings from these lines?
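One likely cause of the problem, judging by the first example: the check line[x - 1] != '\\' treats any quote preceded by a backslash as escaped, so in "redirect\":\"\\" the final quote (which follows an escaped backslash and really does close the string) is never counted, and the left string runs past the separator. At x = 0, line[x - 1] also wraps around to the last character of the line. Below is a minimal sketch that keeps the character-by-character idea but tracks two flags instead of counting quotes: whether we are inside a quoted string, and whether the previous character was an escaping backslash. It assumes backslash is the only escape character and that the two strings are separated by a space outside quotes; split_lr is a hypothetical name, not from the question.

```python
def split_lr(line: str) -> tuple[str, str]:
    """Split an LR line into its raw left and right strings.

    Hypothetical helper: walks the text one character at a time, tracking
    whether we are inside a double-quoted string and whether the previous
    character was an escaping backslash.
    """
    # Drop the "LR " prefix and the trailing " ->", as the original code does.
    body = line.split("LR ", 1)[1].split(" ->")[0]

    parts: list[str] = []
    current = ""
    in_quotes = False
    escaped = False

    for char in body:
        if escaped:
            # The previous backslash escapes this character, whatever it is.
            current += char
            escaped = False
        elif char == "\\" and in_quotes:
            # Start of an escape pair such as \" or \\.
            current += char
            escaped = True
        elif char == '"':
            # An unescaped quote opens or closes a string.
            current += char
            in_quotes = not in_quotes
        elif char == " " and not in_quotes:
            # A space outside quotes separates the left and right strings.
            if current:
                parts.append(current)
                current = ""
        else:
            current += char

    if current:
        parts.append(current)
    if len(parts) != 2:
        raise ValueError(f"expected 2 strings, found {len(parts)}: {body!r}")
    return parts[0], parts[1]


print(split_lr(r'LR "redirect\":\"\\" "\"" ->'))
```

Both returned strings keep their surrounding quotes and escapes, which matches what the original loop was collecting in left.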

Answer 1

Score: 1


The code below should split any such string on a space that is preceded by an unescaped double quote. Comments are included for clarity.

file = r'''
LR "name=\"uuid\" value=\"" "\"" ->
LR "redirect\":\"\\" "\"" ->
LR "[{'userLevel': '" "'" ->
LR "<span class=\"pointsNormal\">" "<" ->
'''

def splitdata(line: str) -> tuple:
    for i, c in enumerate(line):
        # create a cache of the last 3 characters, padding if necessary
        cache = line[max(0, i-3):i].rjust(3, " ")
        
        # if this character is not a space preceded by a double quote, skip it
        if not (c == ' ' and cache[-1] == '"'): continue
        
        # if the quote is not escaped (or its backslash is itself escaped),
        # the split point between left and right has been found
        if cache[-2] != "\\" or cache == '\\\\"': break
    
    # return left and right
    return line[:i], line[i+1:]


for line in file.split('\n'):
    if line:
        # it's better to do this here
        # so splitdata doesn't become specific to your file
        line = line.split('LR ')[1].split(' ->')[0]
        left, right = splitdata(line)
        print(left, right)
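As an alternative that the answer does not mention, Python's standard shlex module already implements shell-style quoting. In its default POSIX mode, a backslash inside double quotes escapes a following double quote or backslash, so shlex.split can tokenize these lines directly. Unlike splitdata above, it returns the strings with the surrounding quotes and escapes removed, so this sketch only fits if you want the unescaped values, and it assumes your format's quoting rules match POSIX shell double quotes. The helper name parse_lr_shlex is hypothetical.

```python
import shlex


def parse_lr_shlex(line: str) -> tuple[str, str]:
    """Return the unquoted, unescaped left and right strings of an LR line.

    Hypothetical helper built on shlex.split.
    """
    # In POSIX mode (the default), a backslash inside double quotes escapes
    # a following double quote or backslash; spaces inside quotes are kept.
    tokens = shlex.split(line)  # e.g. ['LR', left, right, '->']
    if len(tokens) < 3 or tokens[0] != "LR":
        raise ValueError(f"not an LR line: {line!r}")
    return tokens[1], tokens[2]


for example in [r'LR "redirect\":\"\\" "\"" ->', r'LR "<span class=\"pointsNormal\">" "<" ->']:
    print(parse_lr_shlex(example))
```

The trade-off is that shlex decides what an escape means for you; if the raw, still-escaped text is needed later, a hand-written splitter such as the one in the answer keeps more control.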

  • Posted by huangapple on July 3, 2023, 10:19:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76601513.html