Using python I would like to know how to delete the first occurrence of a character between two strings in a file

Question

For example if I had some file:

OPEN(
,a
,b
) CLOSE
OPEN(
c
) CLOSE
OPEN(
,d
,e
,f
) CLOSE

I would like to remove the first occurrence of the character ',' following each 'OPEN(' but only if it occurs before the next ') CLOSE' such that the resulting file would look like:

OPEN(
a
,b
) CLOSE
OPEN(
c
) CLOSE
OPEN(
d
,e
,f
) CLOSE

Any thoughts on how I should approach this? I have tried using regex, but I don't know how to specify conditions. Could some combination of awk and sed be used?

Attempted regex solution:

pattern = r'WITH \(([^)]+),([^)]+)\) AS'
replacement = r'WITH () AS'
sql_content_modified = re.sub(pattern, replacement, sql_content)

Ended up solving with something like:

import re

# Read the SQL file
with open(filename, 'r') as file:
    content = file.read()
content_modified = content.replace('(\n,', '(')
# NOTE: the next line starts again from `content`, so the replace() result above is discarded
content_modified = re.sub('--<([a-z]*.*[A-Z]*)>', '', content)  # removes the --<*> lines
# Write the modified content back to the file
with open(filename, 'w') as file:
    file.write(content_modified)
remove_empty_lines_from_file(filename)  # helper from the asker's code, not shown here

# now do the same but replace "WITH (\n," with "WITH (\n" ...
with open(filename, 'r') as file:
    content = file.read()
content_modified = content.replace('(\n,', '(\n')
with open(filename, 'w') as file:
    file.write(content_modified)

Answer 1

Score: 4

With GNU awk, and based only on the samples shown, please try the following awk code. It sets RS to (^|\n)OPEN\\(\n,*[^\n]*\n[^)]*\\) *CLOSE, as the requirement calls for, and then works on RT to produce the required output.

There is an online demo of the regex used, for understanding purposes. Note that in awk we need to double-escape, whereas on the regex site a single escape works, which is why the regex looks a little different there.

awk -v RS='(^|\n)OPEN\\(\n,*[^\n]*\n[^)]*\\) *CLOSE' '
RT{
  sub(/^\n/,"",RT)
  sub(/,/,"",RT)
  print RT
}
' Input_file
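
Since the question asks for Python, the same idea of matching a whole OPEN( ... ) CLOSE block and editing it can be sketched with re.sub and a callback. This is a minimal sketch, not a port of the awk program: the pattern and the function name drop_first_comma are assumptions made for illustration, and unlike the awk version (which prints only the matched records) it leaves text outside the blocks untouched.

```python
import re

# Assumed block shape: "OPEN(" at the start of a line, then anything (lazily)
# up to the nearest ")" followed by optional spaces and "CLOSE".
BLOCK = re.compile(r"^OPEN\(\n.*?\) *CLOSE", flags=re.DOTALL | re.MULTILINE)


def drop_first_comma(text: str) -> str:
    """Remove the first comma inside each OPEN( ... ) CLOSE block."""
    # The callback receives one whole block and removes only its first comma.
    return BLOCK.sub(lambda m: m.group(0).replace(",", "", 1), text)


sample = "OPEN(\n,a\n,b\n) CLOSE\nOPEN(\nc\n) CLOSE\nOPEN(\n,d\n,e\n,f\n) CLOSE\n"
print(drop_first_comma(sample), end="")
```

An OPEN( that is never followed by ) CLOSE does not match the pattern, so its commas are left alone.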

Answer 2

Score: 2

Not very efficient, but it does work. (Assuming a file named "test.txt" in the same directory as the .py file.)

with open("test.txt") as f:
    lines = f.readlines()

newLines = []
flag = False  # True when the previous line was "OPEN("
for index, line in enumerate(lines):
    if flag:
        # Guard against running past the last line before peeking at the next one
        if index + 1 < len(lines) and lines[index + 1] != ') CLOSE\n':
            line = line.strip(',')
        flag = False
    if line == "OPEN(\n":
        flag = True
    newLines.append(line)
print(newLines)
with open("test.txt", 'w') as f:
    for line in newLines:
        f.write(line)

Answer 3

Score: 2

This might work for you (GNU sed):

sed '/OPEN/{:a;N;/CLOSE/!ba;s/,//}' file

Match the string OPEN then gather up further lines until one matching CLOSE. Then substitute the first , with nothing.

N.B. If there is no closing string, the remainder of the file is read and then printed once end-of-file is encountered, i.e. when the N command is executed on the last line (the $ address).
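
The gather-then-substitute loop described above can be approximated in Python. This is a rough sketch under stated assumptions rather than an exact translation of the sed script: the function name remove_first_comma_per_block is hypothetical, and CLOSE is only checked on the lines that follow the OPEN line.

```python
def remove_first_comma_per_block(lines):
    """Buffer lines from an OPEN line to the next CLOSE line, then drop one comma."""
    out = []
    buffer = None  # None means we are outside an OPEN ... CLOSE block
    for line in lines:
        if buffer is None:
            if "OPEN" in line:
                buffer = [line]  # start gathering, like the first N in sed
            else:
                out.append(line)
            continue
        buffer.append(line)
        if "CLOSE" in line:
            # Equivalent of s/,//: remove only the first comma in the gathered block
            out.append("".join(buffer).replace(",", "", 1))
            buffer = None
    if buffer is not None:
        # No closing string was found: emit the gathered lines unchanged
        out.append("".join(buffer))
    return "".join(out)


sample = "OPEN(\n,a\n,b\n) CLOSE\nOPEN(\nc\n) CLOSE\nOPEN(\n,d\n,e\n,f\n) CLOSE\n"
print(remove_first_comma_per_block(sample.splitlines(keepends=True)), end="")
```

The input is a list of lines with their newlines kept, for example the result of f.readlines().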

Answer 4

Score: 1

Simple Python script with comments for explanation.

delete_key.txt - input data file

delete_key_formatted.txt - output file with the formatted data

def format_data():
    char_remove = ","
    all_chars = []  # final list
    with open("delete_key.txt") as f:
        cleaned = f.readlines()
        is_open = False
        chars_in_between = []  # temp to store each iteration (open to close or open to open)
        for clean in cleaned:
            if "OPEN" in clean:
                if is_open:  # if we have reached open without close
                    all_chars.extend(chars_in_between)
                    chars_in_between = []
                is_open = True

            # (Optional) We can add an (if is_open:) condition to skip all the lines that come without open
            chars_in_between.append(clean)  # collecting all lines

            if "CLOSE" in clean and is_open:  # if we reached close after hitting open
                is_open = False
                # remove the first occurrence of the character ',' following each 'OPEN('
                # but only if it occurs before the next ') CLOSE'
                if len(chars_in_between) > 1 and chars_in_between[1].startswith(char_remove):
                    chars_in_between[1] = chars_in_between[1][1:]
                all_chars.extend(chars_in_between)  # appending the current iteration to the final list
                chars_in_between = []  # reset the temp current iteration
        if len(chars_in_between) > 0:  # if we reached end of file without close
            all_chars.extend(chars_in_between)

    with open("delete_key_formatted.txt", "w") as f:
        f.writelines(all_chars)


if __name__ == "__main__":
    format_data()

Answer 5

Score: 1

You can use a regular expression with zero-length assertions in the following way. Let the content of file.txt be

OPEN(
,a
,b
) CLOSE
OPEN(
c
) CLOSE
OPEN(
,d
,e
,f
) CLOSE

then

import re
with open('file.txt', 'r') as f:
    text = f.read()
with open('file.txt', 'w') as f:
    f.write(re.sub(r'(?<=\(\n),(?=.*CLOSE)', '', text, flags=re.DOTALL))

changes the content of file.txt to

OPEN(
a
,b
) CLOSE
OPEN(
c
) CLOSE
OPEN(
d
,e
,f
) CLOSE

Explanation: read the content of the whole file into text, then replace with an empty string every , that comes right after a bracket and newline (a positive lookbehind) and before zero or more characters followed by CLOSE (a positive lookahead), where "any character" includes newlines because of re.DOTALL; then write the result to the file with the same name.
Consult the re docs if you want to know more about the zero-length assertions available.

(tested in Python 3.10.6)
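
To try the pattern without touching a file, it can be wrapped in a small function that works on strings. This is only a sketch; the name remove_comma_after_paren is made up for illustration. It also makes one property of the lookahead visible: with re.DOTALL, .*CLOSE is satisfied by a CLOSE anywhere later in the text, not necessarily the one that closes the current block.

```python
import re


def remove_comma_after_paren(text: str) -> str:
    """Apply the lookbehind/lookahead substitution from the answer to a string."""
    return re.sub(r"(?<=\(\n),(?=.*CLOSE)", "", text, flags=re.DOTALL)


sample = "OPEN(\n,a\n,b\n) CLOSE\nOPEN(\nc\n) CLOSE\nOPEN(\n,d\n,e\n,f\n) CLOSE\n"
print(remove_comma_after_paren(sample), end="")

# No CLOSE anywhere after the comma: the lookahead fails and the comma stays.
print(repr(remove_comma_after_paren("OPEN(\n,x\n")))

# An unclosed block followed by a closed one: both commas are removed,
# because a CLOSE exists somewhere later in the text.
print(repr(remove_comma_after_paren("OPEN(\n,x\nOPEN(\n,y\n) CLOSE\n")))
```

If unclosed blocks can occur in the input, a block-by-block approach may be a safer fit than this single substitution.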

Answer 6

Score: 0

To preface, I do not fully understand the following condition.

> "... but only if it occurs before the next ') CLOSE' ..."

Nevertheless, you can use the following to match your expected result.

(?s)(?<=OPEN\(.),

Output

OPEN(
a
,b
) CLOSE
OPEN(
c
) CLOSE
OPEN(
d
,e
,f
) CLOSE
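
The answer gives only the pattern, so here is a minimal sketch of how it might be applied with Python's re module. The function name strip_leading_comma is an assumption for illustration. Note that the pattern never looks for CLOSE, and that with (?s) the dot matches any single character after OPEN(, not only a newline.

```python
import re

# (?s) lets "." match a newline; the lookbehind has a fixed width of 6 characters,
# which Python's re module requires.
PATTERN = re.compile(r"(?s)(?<=OPEN\(.),")


def strip_leading_comma(text: str) -> str:
    """Remove a comma that sits exactly one character after 'OPEN('."""
    return PATTERN.sub("", text)


sample = "OPEN(\n,a\n,b\n) CLOSE\nOPEN(\nc\n) CLOSE\nOPEN(\n,d\n,e\n,f\n) CLOSE\n"
print(strip_leading_comma(sample), end="")

# The comma is removed even when no CLOSE follows.
print(repr(strip_leading_comma("OPEN(\n,a\n")))
```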

huangapple
  • Posted on June 9, 2023, 03:41:00
  • When reposting, please be sure to keep the link to this article: https://go.coder-hub.com/76435215.html