Python code not correctly segregating documents into folders based on keywords

Question

I am trying to sort documents into different folders based on whether certain keywords (keyword1 and keyword2) occur in the document's text. I am using regex for this.

Case 1: If keyword1 occurs, create a folder named keyword1 and store the document in it.
Case 2: If keyword2 occurs, create a folder named keyword2 and store the document in it.
Case 3: If neither keyword occurs, create a folder named unknown and store the document in it.

The logic works for the first two cases but not for the last one. Even when neither keyword appears, the documents end up in the keyword2 folder.

Below is my Python implementation:

import os
import re
import shutil

keyword1 = "keyword1"
keyword2 = "keyword2"
for k, text_list in text_dict.items():
    file_name1 = k.split('.')[0]
    match = re.search(keyword1, text_list, flags=re.DOTALL | re.IGNORECASE)

    if match:
        print(f"The keyword '{keyword1}' is present in the text.--->", k)
        os.makedirs('keyword1', exist_ok=True)
        shutil.copytree(os.path.join('imgs', file_name1), os.path.join('keyword1', file_name1), dirs_exist_ok=True)

    elif not match:
        print(f"The keyword '{keyword2}' is present in the text.--->", k)
        os.makedirs('keyword2', exist_ok=True)
        shutil.copytree(os.path.join('imgs', file_name1), os.path.join('keyword2', file_name1), dirs_exist_ok=True)

    else:
        print(f"The keywords '{keyword1}' and '{keyword2}' are not present in the text.--->", k)
        os.makedirs('unknown', exist_ok=True)
        shutil.copytree(os.path.join('imgs', file_name1), os.path.join('unknown', file_name1), dirs_exist_ok=True)

text_dict is a dictionary where the keys are the filenames and the values are the text in each file. The loop iterates through the dictionary and searches for the keywords in the values. If a keyword is found, i.e. match is truthy, it creates a folder with that name and stores the file in it.

The issue is in the final else: if neither keyword is found, an unknown folder should be created and those files stored there. Instead, they are being stored in the keyword2 folder.

Any help is appreciated!

Answer 1

Score: 0

Your current logic cannot work because, as pointed out in the comments, there are only two possible cases: match is either truthy (a match object) or falsy (None). The if and elif not match branches already cover both, so the final else branch is never reached.
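To see why the last branch never runs, here is a minimal sketch that mirrors the if / elif not / else shape of the question's code. The `branch_taken` helper and the sample strings are hypothetical, written only for this illustration.

```python
import re

def branch_taken(text):
    """Mirror the question's if / elif not / else structure and report which branch runs."""
    match = re.search("keyword1", text, flags=re.IGNORECASE)
    if match:
        return "if"
    elif not match:
        return "elif"
    else:
        # Unreachable: `match` is either a match object (truthy) or None (falsy).
        return "else"

for sample in ["has keyword1", "has keyword2", "has neither"]:
    print(sample, "->", branch_taken(sample))
```

Every text without keyword1 lands in the elif branch, whether or not it contains keyword2, which is why those files end up in the keyword2 folder.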

What you can do is run two separate regex searches, one for keyword1 and one for keyword2, and then branch on the combination of the two results.

Try this:

import os
import re
import shutil

keyword1 = "keyword1"
keyword2 = "keyword2"
for k, text_list in text_dict.items():
    file_name1 = k.split('.')[0]
    # Search for each keyword separately so the two results can be combined.
    match1 = re.search(keyword1, text_list, flags=re.DOTALL | re.IGNORECASE)
    match2 = re.search(keyword2, text_list, flags=re.DOTALL | re.IGNORECASE)

    if match1 and not match2:
        print(f"The keyword '{keyword1}' is present in the text.--->", k)
        os.makedirs('keyword1', exist_ok=True)
        shutil.copytree(os.path.join('imgs', file_name1), os.path.join('keyword1', file_name1), dirs_exist_ok=True)

    elif match2 and not match1:
        print(f"The keyword '{keyword2}' is present in the text.--->", k)
        os.makedirs('keyword2', exist_ok=True)
        shutil.copytree(os.path.join('imgs', file_name1), os.path.join('keyword2', file_name1), dirs_exist_ok=True)

    elif not match1 and not match2:
        print(f"The keywords '{keyword1}' and '{keyword2}' are not present in the text.--->", k)
        os.makedirs('unknown', exist_ok=True)
        shutil.copytree(os.path.join('imgs', file_name1), os.path.join('unknown', file_name1), dirs_exist_ok=True)

    else:
        # Both keywords are present; decide how this case should be handled.
        print(f"Both '{keyword1}' and '{keyword2}' are present in the text.--->", k)
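
As a possible variation, the decision of which folder to use can be separated from the file copying, which makes the three cases easy to check on their own. This is only a sketch: the `classify` function, its parameters, and `sample_dict` are hypothetical names, not part of the original code. Unlike the code above, it picks the first keyword in the tuple when both are present, which is an assumption about the desired behaviour.

```python
import re

def classify(text, keywords=("keyword1", "keyword2"), default="unknown"):
    """Return the folder name for `text`: the first keyword found, else `default`."""
    for kw in keywords:
        # re.escape treats the keyword as literal text rather than a regex pattern.
        if re.search(re.escape(kw), text, flags=re.IGNORECASE):
            return kw
    return default

# Hypothetical sample data standing in for text_dict.
sample_dict = {
    "a.pdf": "This mentions Keyword1 once.",
    "b.pdf": "Only keyword2 here.",
    "c.pdf": "Nothing relevant.",
}
folders = {name: classify(text) for name, text in sample_dict.items()}
print(folders)
```

With the folder name in hand, the loop needs a single os.makedirs call and a single shutil.copytree call per file instead of one pair per branch. Note that the dirs_exist_ok argument of shutil.copytree requires Python 3.8 or later.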

Cheers!

huangapple
  • Posted on May 22, 2023 at 17:05:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76304557.html