Question

I have a line whose shape depends on the input. My current method, which is not working, loops over it with range() so that I can track the current position.
The line consists of the word "LR", then a "left" string and a "right" string separated by a space. The problem is that you cannot simply split on every space, because the left and/or right string sometimes contains a space itself, which splits that string more than once.

Three example inputs that demonstrate this:

LR "redirect\":\"\\" "\"" ->

This one can be separated by splitting on the space without any problem.

LR "name=\"uuid\" value=\"" "\""

This one fails on regex.

LR "<span class=\"pointsNormal\">" "<" ->

As you can see, this one has a space in the left string, after 'span'.

    def parseLR(self, line) -> None:

        line = line.split("LR ")[1].split(" ->")[0]

        left = ""
        seen = 0
        encountered = False

        for x in range(len(line)):

            char = line[x]

            if encountered and seen % 2 == 0:
                break

            if char == '"' and line[x - 1] != '\\':
                seen += 1

            elif char == " ":
                encountered = True

            left += char

        print(left)

This is my current approach. I go character by character. For each character, I check whether it is a "; if so, I increment the seen counter. If not, I check whether it is a space; if so, I set encountered to True. Then, regardless of that, I check whether seen is even (meaning the quotes seen so far are balanced) and whether a space has been encountered. If both hold, that is the end of the left string. If you run it, you will see the problem that occurs. How can I properly parse the left and right strings from these lines?
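
To make the failure concrete, here is a minimal sketch that lifts the same loop into a standalone function and runs it on the first example. The name `left_by_quote_count` is hypothetical (it is not in the code above); the logic is meant to be identical apart from returning the result instead of printing it. One case where it appears to misfire is a closing quote that follows an escaped backslash (`\\"`): the check against the previous character treats that quote as escaped, so the count stays odd and the loop runs on into the right string.

```python
def left_by_quote_count(payload: str) -> str:
    """Same logic as parseLR above, but returns the left string."""
    left = ""
    seen = 0
    encountered = False

    for x in range(len(payload)):
        char = payload[x]

        # Stop once a space has been seen and the quote count is even.
        if encountered and seen % 2 == 0:
            break

        # A quote only counts if the previous character is not a backslash.
        # This is where `\\"` (escaped backslash, then a real quote) is missed.
        if char == '"' and payload[x - 1] != "\\":
            seen += 1
        elif char == " ":
            encountered = True

        left += char

    return left


payload = r'"redirect\":\"\\" "\""'
result = left_by_quote_count(payload)
print(result)  # prints: "redirect\":\"\\" "
```

The intended left string is `"redirect\":\"\\"`, but the result also picks up the separating space and the opening quote of the right string.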

Answer 1

Score: 1

The code below should split any string on a space that is preceded by an unescaped double quote. Commented for clarity.

file = r'''
LR "name=\"uuid\" value=\"" "\"" ->
LR "redirect\":\"\\" "\"" ->
LR "[{'userLevel': '" "'" ->
LR "<span class=\"pointsNormal\">" "<" ->
'''

def splitdata(line: str) -> tuple:
    for i, c in enumerate(line):
        # create a cache of the last 3 characters, pad if necessary
        cache = line[max(0, i-3):i].rjust(3, " ")

        # if this character is not a space preceded by a double quote, skip
        if not (c == ' ' and cache[-1] == '"'): continue

        # if the quote is not escaped (or follows an escaped backslash),
        # the space between left and right has been found
        if cache[-2] != "\\" or cache == '\\\\"': break

    # return left and right
    return line[:i], line[i+1:]


for line in file.split('\n'):
    if line:
        # it's better to do this here
        # so splitdata doesn't become specific to your file
        line = line.split('LR ')[1].split(' ->')[0]
        left, right = splitdata(line)
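
For comparison, here is a minimal sketch of a different technique: rather than looking back a fixed three characters, walk each quoted string from its opening quote and skip whatever character follows a backslash. It assumes C-style escaping, where a backslash escapes exactly the next character; the sample lines suggest this but do not confirm it. `read_quoted`, `split_lr`, `sample` and `pairs` are hypothetical names, not part of the code above.

```python
def read_quoted(text: str, start: int) -> int:
    """Return the index just past the quoted string that opens at text[start]."""
    if start >= len(text) or text[start] != '"':
        raise ValueError(f"expected an opening quote at index {start}")
    i = start + 1
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif text[i] == '"':
            return i + 1  # unescaped quote closes the string
        else:
            i += 1
    raise ValueError("unterminated quoted string")


def split_lr(payload: str) -> tuple:
    """Split '"left" "right"' into its two quoted parts (quotes kept)."""
    end_left = read_quoted(payload, 0)
    if payload[end_left:end_left + 1] != " ":
        raise ValueError("expected a space between left and right")
    end_right = read_quoted(payload, end_left + 1)
    return payload[:end_left], payload[end_left + 1:end_right]


sample = r'''
LR "name=\"uuid\" value=\"" "\"" ->
LR "redirect\":\"\\" "\"" ->
LR "[{'userLevel': '" "'" ->
LR "<span class=\"pointsNormal\">" "<" ->
'''

pairs = []
for line in sample.splitlines():
    if line:
        # strip the "LR " prefix and the " ->" suffix, as in the loop above
        payload = line.split("LR ")[1].split(" ->")[0]
        pairs.append(split_lr(payload))

for left, right in pairs:
    print(left, "|", right)
```

Because the scan tracks whether it is inside a quoted string, a left string that begins or ends with a space, or that contains several backslashes in a row, does not need a special case. Malformed input raises ValueError instead of returning a silently wrong split.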
