Python code not correctly segregating documents into folders based on keywords
Question
I am trying to sort documents into different folders based on whether certain keywords (keyword1 and keyword2) occur in the document's text. I am using regex for this.
Case 1: If keyword1 occurs, create a folder named keyword1 and store the document in it.
Case 2: If keyword2 occurs, create a folder named keyword2 and store the document in it.
Case 3: If neither keyword occurs, create a folder named unknown and store the document in it.
The logic works for the first two cases but not for the last one. Even when neither keyword appears, the documents are stored in the keyword2 folder.
Below is my Python implementation:

    keyword = "keyword1"
    for k, text_list in text_dict.items():
        file_name1 = k.split('.')[0]
        match = re.search(r"keyword1", text_list, flags=re.DOTALL | re.IGNORECASE)
        if match:
            print(f"The keyword '{keyword}' is present in the text.--->", k)
            os.makedirs('keyword1', exist_ok=True)
            shutil.copytree(os.path.join('imgs', file_name1), os.path.join('keyword1', file_name1), dirs_exist_ok=True)
        elif not match:
            print(f"The keyword '{keyword2}' is present in the text.--->", k)
            os.makedirs('keyword2', exist_ok=True)
            shutil.copytree(os.path.join('imgs', file_name1), os.path.join('keyword2', file_name1), dirs_exist_ok=True)
        else:
            print(f"The keywords '{keyword1}' and {keyword2} are not present in the text.--->", k)
            os.makedirs('unknown', exist_ok=True)
            shutil.copytree(os.path.join('imgs', file_name1), os.path.join('unknown', file_name1), dirs_exist_ok=True)

text_dict is a dictionary where the keys are filenames and the values are the text in each file. The loop iterates over the dictionary and searches each value for the keywords. If a keyword is found, i.e. match is truthy, it creates a folder with that name and stores the file in it.
The issue is in the final else: if neither keyword is found, an unknown folder should be created and those files stored there. Instead, the files end up in the keyword2 folder.
Any help is appreciated!
Answer 1
Score: 0
Your current logic cannot work because, as pointed out in the comments, there are only two possible cases: match is either truthy or falsy. The if match and elif not match branches cover both, so the final else branch is never reached. The code also never searches for keyword2 at all.
Instead, run a separate regex search for each of keyword1 and keyword2, then check the combinations explicitly.
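To see why the last branch is dead code, here is a minimal sketch (the helper name branch_taken is made up for illustration). re.search returns either a Match object, which is truthy, or None, which is falsy, so one of the first two branches always fires:

```python
import re


def branch_taken(match):
    """Report which branch an if / elif not / else chain selects."""
    if match:
        return "if"
    elif not match:
        return "elif"
    else:
        # Unreachable: match is always either truthy or falsy.
        return "else"


texts = ["has keyword1", "has keyword2", "has neither"]
results = {branch_taken(re.search("keyword1", t)) for t in texts}
print(sorted(results))  # ['elif', 'if']
```

The text containing only keyword2 and the text containing neither keyword both land in the elif branch, which matches the behaviour described in the question.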
Try this:

    import os
    import re
    import shutil

    keyword1 = "keyword1"
    keyword2 = "keyword2"
    # text_dict maps file name -> extracted text, as in the question
    for k, text_list in text_dict.items():
        file_name1 = k.split('.')[0]
        # Search for each keyword separately
        match1 = re.search(keyword1, text_list, flags=re.DOTALL | re.IGNORECASE)
        match2 = re.search(keyword2, text_list, flags=re.DOTALL | re.IGNORECASE)
        if match1 and not match2:
            print(f"The keyword '{keyword1}' is present in the text.--->", k)
            os.makedirs('keyword1', exist_ok=True)
            shutil.copytree(os.path.join('imgs', file_name1), os.path.join('keyword1', file_name1), dirs_exist_ok=True)
        elif not match1 and match2:
            print(f"The keyword '{keyword2}' is present in the text.--->", k)
            os.makedirs('keyword2', exist_ok=True)
            shutil.copytree(os.path.join('imgs', file_name1), os.path.join('keyword2', file_name1), dirs_exist_ok=True)
        elif not match1 and not match2:
            print(f"The keywords '{keyword1}' and '{keyword2}' are not present in the text.--->", k)
            os.makedirs('unknown', exist_ok=True)
            shutil.copytree(os.path.join('imgs', file_name1), os.path.join('unknown', file_name1), dirs_exist_ok=True)
        else:
            # Both keywords are present; nothing is copied here, handle as needed
            print("Both keywords are present in the text.--->", k)

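If the three copy blocks feel repetitive, one possible variant is to separate the decision from the copying. This is only a sketch: the function names classify and sort_documents and the dest_root parameter are hypothetical, and it assumes a document containing both keywords should go to the first keyword's folder, which the question does not specify.

```python
import os
import re
import shutil


def classify(text, keywords=("keyword1", "keyword2")):
    """Return the first keyword found in text (case-insensitive), else 'unknown'."""
    for kw in keywords:
        if re.search(re.escape(kw), text, flags=re.IGNORECASE):
            return kw
    return "unknown"


def sort_documents(text_dict, src_root="imgs", dest_root="."):
    """Copy each document's folder from src_root into the folder chosen by classify."""
    for file_name, text in text_dict.items():
        stem = file_name.split(".")[0]  # same stem rule as the loop above
        folder = classify(text)
        os.makedirs(os.path.join(dest_root, folder), exist_ok=True)
        shutil.copytree(
            os.path.join(src_root, stem),
            os.path.join(dest_root, folder, stem),
            dirs_exist_ok=True,  # requires Python 3.8+
        )
```

Calling sort_documents(text_dict) with the defaults reads from imgs and writes the keyword1, keyword2 and unknown folders into the current directory, as in the original loop.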
Cheers!