Attempting to parse a line character by character using Python
Question
I have a line that can look different depending on the input. My current approach, which is not working, is to loop over it with range() so that I know the current position.
The line consists of the word "LR", then a "left" string and a "right" string, separated by a space. The problem is that I cannot simply split on every space, because the left and/or right string may itself contain spaces, which would split that string into more than one piece.
Three example inputs that demonstrate this:
LR "redirect\":\"\\" "\"" ->
This one could be split on the space without any problem.
LR "name=\"uuid\" value=\"" "\""
This one fails when I use a regex.
LR "<span class=\"pointsNormal\">" "<" ->
As you can see, this one has a space inside the left string, after 'span'.
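All of these examples come down to one rule: each field is a double-quoted string in which a backslash escapes the next character. A regular expression can encode that rule directly, so a space inside a string is never a candidate for the split. This is a sketch that assumes that escaping convention and exactly one space between the two strings; `parse_lr`, `QUOTED` and `LR_PATTERN` are hypothetical names, not from the question.

```python
import re

# One double-quoted string (assumed convention): an opening quote, then any run of
# characters that are either not a quote/backslash, or a backslash followed by any
# character (an escape), then the closing quote.
QUOTED = r'"(?:[^"\\]|\\.)*"'

# "LR ", the left string, one space, the right string; anything after (e.g. " ->") is ignored.
LR_PATTERN = re.compile(rf'LR ({QUOTED}) ({QUOTED})')


def parse_lr(line: str) -> tuple[str, str]:
    """Return the left and right quoted strings, with quotes and escapes kept as-is."""
    match = LR_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not an LR line: {line!r}")
    return match.group(1), match.group(2)


lines = [
    r'LR "redirect\":\"\\" "\"" ->',
    r'LR "name=\"uuid\" value=\"" "\""',
    r'LR "<span class=\"pointsNormal\">" "<" ->',
]
for line in lines:
    print(parse_lr(line))
```

Because a backslash can only be consumed together with the character after it, `\\"` is read as an escaped backslash followed by a closing quote, while `\"` stays inside the string.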
```python
def parseLR(self, line) -> None:
    line = line.split("LR ")[1].split(" ->")[0]
    left = ""
    seen = 0
    encountered = False
    for x in range(len(line)):
        char = line[x]
        if encountered and seen % 2 == 0:
            break
        if char == '"' and line[x - 1] != '\\':
            seen += 1
        elif char == " ":
            encountered = True
        left += char
    print(left)
```
This is my current approach. I go through the line character by character. If the character is a " that is not preceded by a backslash, I increment the seen counter; otherwise, if it is a space, I set encountered to True. Then, on every character, I check whether seen is even (meaning an even number of " has been seen so far) and whether a space has been encountered; if so, that is the end of the LEFT string. If you run it, you will see the problem. How can I properly parse the left and right strings from these lines?
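The character-by-character idea can work if, instead of counting quotes and remembering whether any space was seen, the scanner tracks two pieces of state: whether it is currently inside a string, and whether the current character is escaped by a preceding backslash. The first space found outside a string is then the separator. A minimal sketch, assuming the same `LR ... ->` layout and that a backslash always escapes the next character; `split_lr` is a hypothetical standalone function, not the question's `parseLR` method.

```python
def split_lr(line: str) -> tuple[str, str]:
    """Split an LR line into its left and right quoted strings (quotes kept)."""
    # Same pre-processing as the question: drop the "LR " prefix and " ->" suffix.
    body = line.split("LR ", 1)[1].split(" ->")[0]

    in_string = False  # True between an opening and a closing double quote
    escaped = False    # True if the previous character was an escaping backslash
    for i, char in enumerate(body):
        if escaped:
            escaped = False            # this character is escaped: no special meaning
        elif char == "\\":
            escaped = True             # the next character is escaped
        elif char == '"':
            in_string = not in_string  # an unescaped quote opens or closes a string
        elif char == " " and not in_string:
            return body[:i], body[i + 1:]  # first space outside a string separates L and R
    raise ValueError(f"no separating space found in {line!r}")


left, right = split_lr(r'LR "<span class=\"pointsNormal\">" "<" ->')
print(left)   # "<span class=\"pointsNormal\">"
print(right)  # "<"
```

Since a backslash always consumes the next character, `\\"` (an escaped backslash followed by a closing quote) is handled correctly; checking only whether the previous character is a backslash gets exactly that case wrong.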
Answer 1
Score: 1
The code below should split a string at the first space that is preceded by an unescaped double quote. It is commented for clarity.
```python
file = r'''
LR "name=\"uuid\" value=\"" "\"" ->
LR "redirect\":\"\\" "\"" ->
LR "[{'userLevel': '" "'" ->
LR "<span class=\"pointsNormal\">" "<" ->
'''

def splitdata(line: str) -> tuple:
    for i, c in enumerate(line):
        # create a cache of the last 3 characters, pad if necessary
        cache = line[max(0, i-3):i].rjust(3, " ")
        # if this character is not a space preceded by a double quote, skip
        if not (c == ' ' and cache[-1] == '"'): continue
        # if the quote is not escaped, the split between L and R has been found
        if cache[-2] != "\\" or cache == '\\\\"': break
    else:
        # no unescaped quote followed by a space: the line cannot be split
        raise ValueError(f"cannot split {line!r}")
    # return L and R
    return line[:i], line[i+1:]

for line in file.split('\n'):
    if line:
        # it's better to do this here
        # so splitdata doesn't become specific to your file
        line = line.split('LR ')[1].split(' ->')[0]
        left, right = splitdata(line)
        print(left, right)
```
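The 3-character cache covers a quote preceded by one escaped backslash (`\\"`), but not longer runs such as `\\\"`, where the quote itself is escaped again. A more general test for the same rule is that a quote is escaped exactly when it is preceded by an odd number of consecutive backslashes. A sketch of the answer's rule with that test swapped in; `is_escaped` and `splitdata_parity` are hypothetical names. Like the original, it still assumes no string begins with a space, since an opening quote followed by a space would also match.

```python
def is_escaped(text: str, index: int) -> bool:
    """True if text[index] is preceded by an odd number of consecutive backslashes."""
    count = 0
    j = index - 1
    while j >= 0 and text[j] == "\\":
        count += 1
        j -= 1
    return count % 2 == 1


def splitdata_parity(line: str) -> tuple[str, str]:
    """Split at the first space that directly follows an unescaped double quote."""
    for i in range(1, len(line)):
        if line[i] == " " and line[i - 1] == '"' and not is_escaped(line, i - 1):
            return line[:i], line[i + 1:]
    raise ValueError(f"no separating space found in {line!r}")


# The quote after three backslashes is escaped, so the first space is not the split.
print(splitdata_parity(r'"a\\\" b" "c"'))
```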