Using Python, how can I delete the first occurrence of a character between two strings in a file?
Question
For example, if I had a file like this:
OPEN(
,a
,b
) CLOSE
OPEN(
c
) CLOSE
OPEN(
,d
,e
,f
) CLOSE
I would like to remove the first occurrence of the character ',' following each 'OPEN(' but only if it occurs before the next ') CLOSE' such that the resulting file would look like:
OPEN(
a
,b
) CLOSE
OPEN(
c
) CLOSE
OPEN(
d
,e
,f
) CLOSE
Any thoughts on how I should approach this? I have tried using regex, but I don't know how to specify conditions. Could some combination of awk and sed be used?
Attempted regex solution:
pattern = r'WITH \(([^)]+),([^)]+)\) AS'
replacement = r'WITH () AS'
sql_content_modified = re.sub(pattern, replacement, sql_content)
Ended up solving it with something like:
import re

# Read the SQL file
with open(f'{filename}', 'r') as file:
    content = file.read()
content_modified = content.replace('(\n,', '(')
# Note: re.sub below is applied to `content`, so the result of the replace above is discarded
content_modified = re.sub('--<([a-z]*.*[A-Z]*)>', '', content)  # removes the --<*> lines
# Write the modified content back to the file
with open(f'{filename}', 'w') as file:
    file.write(content_modified)
remove_empty_lines_from_file(filename)  # the author's own helper, not shown here
# now do the same but replace "WITH (\n," with "WITH (\n" ...
with open(f'{filename}', 'r') as file:
    content = file.read()
content_modified = content.replace('(\n,', '(\n')
with open(f'{filename}', 'w') as file:
    file.write(content_modified)
Answer 1
Score: 4
In GNU awk, with your shown samples only, please try the following awk code. I set RS to (^|\n)OPEN\\(\n,*[^\n]*\n[^)]*\\) *CLOSE as per the requirement, and then work on RT (the text that matched RS) to get the required output.
Here is the Online Demo of the regex used, for understanding purposes. Note that in awk we need to double-escape, while on the regex site a single escape works, which is why the regex looks a little different here.
awk -v RS='(^|\n)OPEN\\(\n,*[^\n]*\n[^)]*\\) *CLOSE' '
RT{
  sub(/^\n/,"",RT)
  sub(/,/,"",RT)
  print RT
}
' Input_file
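Since the question asks about Python, the same block-wise idea can be sketched there: match each whole OPEN( ... ) CLOSE block, then delete only the first comma inside the matched text. This is a minimal sketch, not a translation of the awk regex; the names BLOCK, drop_first_comma and sample are made up for illustration, and the block pattern is an assumption based on the sample data.

```python
import re

# One whole block: from "OPEN(" up to the nearest following ") CLOSE".
# DOTALL lets "." cross line breaks; ".*?" stops at the first ") CLOSE".
BLOCK = re.compile(r"OPEN\(.*?\) CLOSE", re.DOTALL)


def drop_first_comma(text: str) -> str:
    """Remove the first comma inside each OPEN( ... ) CLOSE block."""
    return BLOCK.sub(lambda m: m.group(0).replace(",", "", 1), text)


# Sample data from the question, held in a string instead of a file.
sample = "OPEN(\n,a\n,b\n) CLOSE\nOPEN(\nc\n) CLOSE\nOPEN(\n,d\n,e\n,f\n) CLOSE\n"
print(drop_first_comma(sample), end="")
```

Text outside a complete block is left untouched, so an OPEN( with no ) CLOSE after it keeps its comma. One limitation of this sketch: an unclosed OPEN( followed later by a complete block is treated as a single block starting at the first OPEN(.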
Answer 2
Score: 2
Not very efficient, but it does work (assuming a file named "test.txt" in the same directory as the .py file).
with open("test.txt") as f:
    lines = f.readlines()

newLines = []
flag = False  # True when the previous line was "OPEN("
for index, line in enumerate(lines):
    if flag:
        # Strip the leading comma unless the next line is the closing line.
        # As written, a block with a single item therefore keeps its comma.
        if index + 1 < len(lines) and lines[index + 1] != ') CLOSE\n':
            line = line.lstrip(',')
        flag = False
    if line == "OPEN(\n":
        flag = True
    newLines.append(line)

print(newLines)
with open("test.txt", 'w') as f:
    for line in newLines:
        f.write(line)
Answer 3
Score: 2
This might work for you (GNU sed):
sed '/OPEN/{:a;N;/CLOSE/!ba;s/,//}' file
Match the string OPEN, then gather up further lines until one matches CLOSE. Then substitute the first , with nothing.
N.B. If there is no closing string, the remainder of the file will be read and then printed once a read at end-of-file is encountered, i.e. when the N command is executed at the $ (last line) address.
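For readers who want the same gather-then-substitute flow in Python, here is a rough line-by-line sketch. It only approximates the sed command above (for instance, it checks the OPEN line itself for CLOSE, which sed's N does not), and the function name strip_first_comma is made up for illustration.

```python
from typing import Iterable, Iterator


def strip_first_comma(lines: Iterable[str]) -> Iterator[str]:
    """Buffer lines from an OPEN line to a CLOSE line, then drop the first comma."""
    buffer = []  # lines collected since the last unmatched OPEN
    for line in lines:
        if buffer or "OPEN" in line:
            buffer.append(line)
            if "CLOSE" in line:
                # Block is complete: remove only the first comma in it.
                yield "".join(buffer).replace(",", "", 1)
                buffer = []
        else:
            # Lines outside any block pass through unchanged.
            yield line
    # No closing line was found: emit what was collected, unchanged.
    yield from buffer


sample = "OPEN(\n,a\n,b\n) CLOSE\nOPEN(\nc\n) CLOSE\nOPEN(\n,d\n,e\n,f\n) CLOSE\n"
print("".join(strip_first_comma(sample.splitlines(keepends=True))), end="")
```

Because it is a generator, it can be fed an open file object directly and the results written out with writelines, without reading the whole file into one string.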
Answer 4
Score: 1
Simple Python script with comments for explanation.
delete_key.txt - input data file
delete_key_formatted.txt - output file with the formatted data
def format_data():
    char_remove = ","
    all_chars = []  # final list
    with open("delete_key.txt") as f:
        cleaned = f.readlines()
    is_open = False
    chars_in_between = []  # temp to store each iteration (open to close or open to open)
    for clean in cleaned:
        if "OPEN" in clean:
            if is_open:  # if we have reached open without close
                all_chars.extend(chars_in_between)
                chars_in_between = []
            is_open = True
        # (Optional) We can add an (if is_open:) condition to skip all the lines that come without open
        chars_in_between.append(clean)  # collecting all lines
        if "CLOSE" in clean and is_open:  # if we reached close after hitting open
            is_open = False
            # remove the first occurrence of the character ',' following each 'OPEN('
            # but only if it occurs before the next ') CLOSE'
            if chars_in_between[1].startswith(char_remove):
                chars_in_between[1] = chars_in_between[1][1:]
            all_chars.extend(chars_in_between)  # appending the current iteration to the final list
            chars_in_between = []  # reset the temp current iteration
    if len(chars_in_between) > 0:  # if we reached end of file without close
        all_chars.extend(chars_in_between)
    with open("delete_key_formatted.txt", "w") as f:
        f.writelines(all_chars)


if __name__ == "__main__":
    format_data()
Answer 5
Score: 1
You might use a regular expression with zero-length assertions in the following way. Let the content of file.txt be:
OPEN(
,a
,b
) CLOSE
OPEN(
c
) CLOSE
OPEN(
,d
,e
,f
) CLOSE
Then running
import re
with open('file.txt', 'r') as f:
    text = f.read()
with open('file.txt', 'w') as f:
    f.write(re.sub(r'(?<=\(\n),(?=.*CLOSE)', '', text, flags=re.DOTALL))
changes the content of file.txt to:
OPEN(
a
,b
) CLOSE
OPEN(
c
) CLOSE
OPEN(
d
,e
,f
) CLOSE
Explanation: read the content of the whole file into text, then remove every , that comes after a bracket followed by a newline (this is the positive lookbehind) and before zero or more characters followed by CLOSE (this is the positive lookahead), where "any character" includes newlines because of re.DOTALL. Then write the result to the file with the same name.
Consult the re docs if you want to know more about the zero-length assertions available.
(tested in Python 3.10.6)
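One point worth illustrating: with re.DOTALL, the lookahead .*CLOSE is satisfied by any CLOSE later in the text, not only the one that closes the current block. The sketch below compares the pattern from this answer with a hypothetical tighter variant (an assumption for illustration, not part of the answer) that requires ") CLOSE" to appear before the next "OPEN(". The names loose, strict and sample are illustrative.

```python
import re

# The first OPEN( is never closed; only the second block is complete.
sample = "OPEN(\n,a\nOPEN(\n,b\n) CLOSE\n"

# Pattern from the answer: any later CLOSE satisfies the lookahead.
loose = re.compile(r"(?<=\(\n),(?=.*CLOSE)", re.DOTALL)

# Hypothetical tighter variant: scan forward character by character,
# refusing to step onto another "OPEN(", until ") CLOSE" is found.
strict = re.compile(r"(?<=OPEN\(\n),(?=(?:(?!OPEN\().)*?\) CLOSE)", re.DOTALL)

loose_result = loose.sub("", sample)
strict_result = strict.sub("", sample)
print(repr(loose_result))   # comma removed in both blocks
print(repr(strict_result))  # comma kept in the unclosed block
```

On the sample file from the question, where every block is closed, both patterns give the same result, so the difference only matters for input with unbalanced OPEN( lines.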
Answer 6
Score: 0
To preface, I may be misunderstanding the following condition.
> "... but only if it occurs before the next ') CLOSE' ..."
Nevertheless, you can use the following to match your expected result.
(?s)(?<=OPEN\(.),
Output
OPEN(
a
,b
) CLOSE
OPEN(
c
) CLOSE
OPEN(
d
,e
,f
) CLOSE
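The answer gives only the pattern, so here is a minimal sketch of applying it with Python's re.sub, assuming the file content is already held in a string. The variable names are illustrative. Python accepts this lookbehind because OPEN\(. has a fixed width, and the leading (?s) makes . match the newline after OPEN(.

```python
import re

# (?s) lets "." match the newline that follows "OPEN(".
pattern = re.compile(r"(?s)(?<=OPEN\(.),")

# Sample data from the question, held in a string instead of a file.
sample = "OPEN(\n,a\n,b\n) CLOSE\nOPEN(\nc\n) CLOSE\nOPEN(\n,d\n,e\n,f\n) CLOSE\n"
result = pattern.sub("", sample)
print(result, end="")
```

This pattern does not look for ) CLOSE at all, so a comma right after an unclosed OPEN( is removed as well.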