Searching text in a file, replacing text and writing to new file in subdir, getting a doubling of replacement text when iterating

Question

When I search text in a single file and write out just that one file, it behaves as expected. It creates a new file in the subdirectory "output" that contains the existing text "This", with the text "AndThat" added on the next line.
However, when I iterate through all the files in a directory, I get the new text twice. I don't understand why. Here is the code:

import os
import shutil
import pathlib


def replace_text_in_multiple_files(input_path, output_path):
    search_text = "This"
    new_text = "This\nAndThat"

    shutil.rmtree(output_path)
    os.mkdir(output_path)

    for subdir, dirs, files in os.walk(input_path):
        for file in files:
            input_file_path = subdir + os.sep + file
            output_file_path = output_path + os.sep + file
            if input_file_path.endswith(".txt"):
                s = pathlib.Path(input_file_path).read_text()
                s = s.replace(search_text, new_text)

                with open(output_file_path, "w") as f:
                    f.write(s)

def replace_text_in_a_single_files(input_file_path, output_file_path):
    search_text = "This"
    new_text = "This\nAndThat"

    s = pathlib.Path(input_file_path).read_text()
    s = s.replace(search_text, new_text)

    with open(output_file_path, "w") as f:
        f.write(s)

replace_text_in_multiple_files("D:\\Test\\", "D:\\Test\\output\\")
#replace_text_in_a_single_files("D:\\Test\\File1.txt", "D:\\Test\\output\\File1.txt")

In the directory 'D:\Test' I have 3 text files. Each of the text files contains the following text:

This
is 
a
test

If I run 'replace_text_in_a_single_files', it opens File1.txt, searches for the text, replaces it with the same text plus 'AndThat', and writes the result to a new file in the output subdirectory, which gives:

This
AndThat
is 
a
test

However, when I run replace_text_in_multiple_files, which does the same thing to a bunch of files instead of just one, each new file gets the replacement text twice:

This
AndThat
AndThat
is 
a
test

So, it's like it's executing the replacement code twice. But why? And why only when it's iterating?
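The doubled output is exactly what two passes of the same replacement produce. A minimal check, assuming the file content from the question (simplified here, without the trailing space after "is"):

```python
search_text = "This"
new_text = "This\nAndThat"

# File content from the question (trailing space after "is" dropped).
original = "This\nis\na\ntest"

# First pass: what the single-file function writes.
once = original.replace(search_text, new_text)

# Second pass: the same replacement applied to already-processed text.
# "This" is still present, so another "AndThat" line is inserted after it.
twice = once.replace(search_text, new_text)

print(repr(once))
print(repr(twice))
```

So the real question is why the multi-file version ends up processing some text a second time.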

I expected it to produce just the following text in each of the files:

This
AndThat
is 
a
test

Answer 1

Score: 0

You're iterating over the input files as well as your own output files:

import os
import shutil
import pathlib


def replace_text_in_multiple_files(input_path, output_path):
    search_text = "This"
    new_text = "This\nAndThat"

    shutil.rmtree(output_path)
    os.mkdir(output_path)

    for subdir, dirs, files in os.walk(input_path):
        for file in files:
            print(subdir, file)
            input_file_path = subdir + os.sep + file
            output_file_path = output_path + os.sep + file
            if input_file_path.endswith(".txt"):
                s = pathlib.Path(input_file_path).read_text()
                s = s.replace(search_text, new_text)

                with open(output_file_path, "w") as f:
                    f.write(s)

def replace_text_in_a_single_files(input_file_path, output_file_path):
    search_text = "This"
    new_text = "This\nAndThat"

    s = pathlib.Path(input_file_path).read_text()
    s = s.replace(search_text, new_text)

    with open(output_file_path, "w") as f:
        f.write(s)

replace_text_in_multiple_files("./Test", "./Test/output/")

Output:
./Test File3.txt
./Test File2.txt
./Test File1.txt
./Test/output File3.txt
./Test/output File2.txt
./Test/output File1.txt
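The walk order above can be reproduced without touching D:\Test. This sketch assumes a throwaway temporary directory, the question's three file names, and placeholder output content. It records what os.walk yields while files are written into output/ during the walk. Because os.walk (top-down) lists each subdirectory only when it reaches it, the files written while processing the top level are already there when it descends into output/.

```python
import os
import tempfile


def walk_order_demo():
    """Record every (relative directory, file name) pair os.walk yields
    while output files are written during the walk."""
    visited = []
    with tempfile.TemporaryDirectory() as root:
        # Three input files, mirroring the question (names assumed).
        for name in ("File1.txt", "File2.txt", "File3.txt"):
            with open(os.path.join(root, name), "w") as f:
                f.write("This\nis\na\ntest\n")

        # The output folder exists before the walk starts, as in the question.
        output = os.path.join(root, "output")
        os.mkdir(output)

        for subdir, dirs, files in os.walk(root):
            for file in files:
                visited.append((os.path.relpath(subdir, root), file))
                # Write outputs only while at the top level; output/ is
                # listed later, so these files are visible when it gets there.
                if subdir == root:
                    with open(os.path.join(output, file), "w") as f:
                        f.write("processed\n")
    return visited


for entry in walk_order_demo():
    print(entry)
```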

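Your script writes each output file as soon as it "sees" the matching file in the input folder. Because the output folder sits inside the input folder, os.walk then descends into it and "discovers" those freshly written files. They also end in .txt, so your loop reads each one, runs the replacement again (inserting a second "AndThat" line after "This"), and overwrites the output file with the doubled text. The single-file function never walks the directory, so it applies the replacement only once.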
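One possible fix, sketched under the assumption that the output folder should stay inside the input folder, is to remove it from `dirs` in place so the top-down os.walk never descends into it. The helper `_norm`, the `ignore_errors=True` on `shutil.rmtree`, and the switch to `os.path.join` are illustrative additions, not part of the original code.

```python
import os
import pathlib
import shutil


def _norm(path):
    # Hypothetical helper: normalize a path so "./Test/output/" and
    # "./Test/output" compare equal (normcase also folds case on Windows).
    return os.path.normcase(os.path.abspath(path))


def replace_text_in_multiple_files(input_path, output_path):
    search_text = "This"
    new_text = "This\nAndThat"

    # ignore_errors=True: don't fail on the first run, when output_path
    # does not exist yet.
    shutil.rmtree(output_path, ignore_errors=True)
    os.makedirs(output_path)
    output_dir = _norm(output_path)

    for subdir, dirs, files in os.walk(input_path):
        # Prune the output folder in place. A top-down os.walk then skips it,
        # so the files written below are never read back and re-processed.
        dirs[:] = [d for d in dirs if _norm(os.path.join(subdir, d)) != output_dir]

        for file in files:
            if not file.endswith(".txt"):
                continue
            input_file_path = os.path.join(subdir, file)
            output_file_path = os.path.join(output_path, file)
            s = pathlib.Path(input_file_path).read_text()
            s = s.replace(search_text, new_text)
            with open(output_file_path, "w") as f:
                f.write(s)
```

Other ways to avoid the problem are to choose an output folder outside the input tree, or, if subfolders don't need processing, to use a non-recursive listing such as `pathlib.Path(input_path).glob("*.txt")`.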

huangapple
  • Posted on 2023-05-15 04:02:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76249453.html