英文:
if the name exists in the Excel, check if a folder for that name exists, if true copy the Folder
问题
这是您提供的Python代码的翻译部分:
我已经在这个问题上工作了几天,但没有运气。我有一些文件夹在另一个文件夹中,我需要检查这些文件夹的名称是否出现在Excel表格中。如果是的话,我想将文件复制到另一个文件夹中。这似乎是一个容易的任务,但我对Python还比较陌生,所以我感到迷茫。
以下是Excel文件中名称列表的示例:
[mike nicolai Tesla, Thomas Edison, Edie Morphy, Josef mike Tesla Johanssen]
目录的示例,这是一个总体架构 R:\this folder\Folder\Subfolder\lastName, first Name
R:\this folder\docs\GERMANY\Tesla Johanssen, Josef mike
R:\this folder\docs\GERMANY\Morphy, Edie
R:\this folder\docs\France\nicolai Tesla, Josef mike
我已经创建了以下两个函数,以便根据需要复制或提取名称。由于中间名称,我必须检查函数是否返回的不仅仅是名字的第一个和最后一个部分。为此,我有两个函数 *sort_Names(_name)* 和 *extract_Names_From_Folders(folder_path)*
```python
import os
from glob import glob
import shutil
def extract_Names_From_Folders(folder_path):
folder_Name=folder_path.split(sep="\\")
folder_Name=folder_Name[-1::]
if len(''.join(map(str, folder_Name)).split(' '))>2:
_lName=''.join(map(str, folder_Name)).split(" ")[0]
_mName=''.join(map(str, folder_Name)).split(" ")[1]
_fName=''.join(map(str, folder_Name)).split(" ")[2]
return _fName,_mName,_lName
elif len(''.join(map(str, folder_Name)).split(' '))>1:
_lName=''.join(map(str, folder_Name)).split(" ")[0]
_fName=''.join(map(str, folder_Name)).split(" ")[1]
return _fName,_lName
elif len(''.join(map(str, folder_Name)).split(' '))>0:
_fName=''.join(map(str, folder_Name)).split(" ")[0]
return _fName
def sort_Names(_name):
if len(''.join(map(str, _name)).split(" "))==4:
excel_fName2=''.join(map(str, _name)).split(" ")[3]
excel_fName1=''.join(map(str, _name)).split(" ")[2]
excel_mName= ''.join(map(str, _name)).split(" ")[0]
excel_lName=''.join(map(str, _name)).split(" ")[1]
return excel_fName1,excel_fName2,excel_mName,excel_lName
elif len(''.join(map(str, _name)).split(" "))==3:
excel_fName=''.join(map(str, _name)).split(" ")[2]
excel_mName= ''.join(map(str, _name)).split(" ")[0]
excel_lName=''.join(map(str, _name)).split(" ")[1]
return excel_fName,excel_mName,excel_lName
elif len(''.join(map(str, _name)).split(" "))==2:
excel_fName=''.join(map(str, _name)).split(" ")[1]
excel_lName=''.join(map(str, _name)).split(" ")[0]
return excel_fName,excel_lName
elif len(''.join(map(str, _name)).split(" "))==1:
excel_fName=''.join(map(str, _name)).split(" ")[0]
return excel_fName
else: return [0]
def copy_the_file(folder_name,_name,df):
index = _name.index(folder_name)
df.at[_name, "User Request Form Created?\n[Yes/No]"] = "Yes"
# Copy the subfolder to the Proof folder
proof_folder_path = r"r:\blabla\Proof"
shutil.copytree(folder_name, os.path.join(proof_folder_path, _name), dirs_exist_ok=True)
以下是主要部分。此代码当前不起作用,我不知道为什么。它变得如此复杂,如此之快。我相当确定可以以更简单和更Pythonic的方式完成。但我不知道如何做。
import pandas as pd
from glob import glob
# 设置文件路径
excel_file_path = r"k:\blabla\path.xlsx"
folder_path = r"r:\blabla\path\*\*&"
df = pd.read_excel(excel_file_path, header=1)
# 从“Name”列中获取名称列表
names = df["Name"].tolist()
for folder_name in glob(folder_path, recursive=True):
# 检查函数是否返回3个或2个变量。函数返回一个元组
mName=''
tuple_=extract_Names_From_Folders(folder_name)
if len(tuple_) == 3:
fName = tuple_[0] if tuple_ else None
mName= tuple_[1] if len(tuple_) > 1 else None
lName= tuple_[2] if len(tuple_) > 2 else None
elif len(tuple_) == 2:
fName = tuple_[0] if tuple_ else None
lName= tuple_[1] if len(tuple_) > 1 else None
elif len(tuple_) == 1:
fName = tuple_[0] if tuple_ else None
for _name in names:
if len(_name)>2:
tuple_=sort_Names(_name)
if len(tuple_) == 4:
excel_fName2=tuple_[1] if tuple_ else None
excel_fName1=tuple_[0] if tuple_ else None
excel_mName= tuple_[2] if tuple_ else None
excel_lName=tuple_[3] if tuple_ else None
print(excel_fName1+' '+excel_fName2,excel_mName+' '+excel_lName)
# 检查子文件夹是否与Excel文件中的名称匹配
if excel_fName1+' '+excel_fName2 == fName and excel_mName+" "+excel_lName == lName:
copy_the_file(folder_name,_name,df)
if len(tuple_) == 3:
excel_fName=tuple_[0] if tuple_ else None
excel_mName= tuple_[1] if tuple_ else None
excel_lName=tuple_[2] if tuple_ else None
print(excel_f
<details>
<summary>英文:</summary>
I've been working on this for days but no luck. I have some folder in another folder and I need to check if the Name of these folders appear on a Excel sheet. If so I want to copy the files to another folder. It seems easy job to be, but I'm kind of new to python so I'm lost.
An Example of the list of names in excel file:
[mike nicolai Tesla, Thomas Edison, Edie Morphy, Josef mike Tesla Johanssen]
an example of the directories, this is a over all schema R:\this folder\Folder\Subfolder\lastName, first Name
R:\this folder\docs\GERMANY\Tesla Johanssen, Josef mike
R:\this folder\docs\GERMANY\Morphy, Edie
R:\this folder\docs\France\nicolai Tesla, Josef mike
I have made these Functions, in order to Copy or extract the names accordingly. Due to the middle names, I had to check if the Function returns more than just First and Last Name. I have 2 functions for that purpose *sort_Names(_name)* & *extract_Names_From_Folders(folder_path)*
import os
from glob import glob
import shutil
def extract_Names_From_Folders(folder_path):
folder_Name=folder_path.split(sep="\\")
folder_Name=folder_Name[-1::]
if len(''.join(map(str, folder_Name)).split(' '))>2:
_lName=''.join(map(str, folder_Name)).split(" ")[0]
_mName=''.join(map(str, folder_Name)).split(" ")[1]
_fName=''.join(map(str, folder_Name)).split(" ")[2]
return _fName,_mName,_lName
elif len(''.join(map(str, folder_Name)).split(' '))>1:
_lName=''.join(map(str, folder_Name)).split(" ")[0]
_fName=''.join(map(str, folder_Name)).split(" ")[1]
return _fName,_lName
elif len(''.join(map(str, folder_Name)).split(' '))>0:
_fName=''.join(map(str, folder_Name)).split(" ")[0]
return _fName
def sort_Names(_name):
if len(''.join(map(str, _name)).split(" "))==4:
excel_fName2=''.join(map(str, _name)).split(" ")[3]
excel_fName1=''.join(map(str, _name)).split(" ")[2]
excel_mName= ''.join(map(str, _name)).split(" ")[0]
excel_lName=''.join(map(str, _name)).split(" ")[1]
return excel_fName1,excel_fName2,excel_mName,excel_lName
elif len(''.join(map(str, _name)).split(" "))==3:
excel_fName=''.join(map(str, _name)).split(" ")[2]
excel_mName= ''.join(map(str, _name)).split(" ")[0]
excel_lName=''.join(map(str, _name)).split(" ")[1]
return excel_fName,excel_mName,excel_lName
elif len(''.join(map(str, _name)).split(" "))==2:
excel_fName=''.join(map(str, _name)).split(" ")[1]
excel_lName=''.join(map(str, _name)).split(" ")[0]
return excel_fName,excel_lName
elif len(''.join(map(str, _name)).split(" "))==1:
excel_fName=''.join(map(str, _name)).split(" ")[0]
return excel_fName
else: return [0]
def copy_the_file(folder_name,_name,df):
index = _name.index(folder_name)
df.at[_name, "User Request Form Created?\n[Yes/No]"] = "Yes"
# Copy the subfolder to the Proof folder
proof_folder_path = r"r:\blabla\Proof"
shutil.copytree(folder_name, os.path.join(proof_folder_path, _name), dirs_exist_ok=True)
Here's the main part. This code doesn't currently work I have no idea *why*. It got complex so fast. I'm pretty sure it can be done in easier and more pythonic way. But have no clue how.
import pandas as pd
from glob import glob
# Set up file paths
excel_file_path = r"k:\blabla\path.xlsx"
folder_path = r"r:\blabla\path\*\*"
df = pd.read_excel(excel_file_path, header=1)
# Get the list of names from the "Name" column
names = df["Name"].tolist()
for folder_name in glob(folder_path, recursive = True):
#Check if function returns 3 or 2 variables. A function returns a Tuple
mName=''
tuple_=extract_Names_From_Folders(folder_name)
if len(tuple_) == 3:
fName = tuple_[0] if tuple_ else None
mName= tuple_[1] if len(tuple_) > 1 else None
lName= tuple_[2] if len(tuple_) > 2 else None
elif len(tuple_) == 2:
fName = tuple_[0] if tuple_ else None
lName= tuple_[1] if len(tuple_) > 1 else None
elif len(tuple_) == 1:
fName = tuple_[0] if tuple_ else None
#convert list to string then split by comma
# print(''.join(map(str, folder_Name)).split(',')[0])
for _name in names:
if len(_name)>2:
tuple_=sort_Names(_name)
if len(tuple_) == 4:
excel_fName2=tuple_[1] if tuple_ else None
excel_fName1=tuple_[0] if tuple_ else None
excel_mName= tuple_[2] if tuple_ else None
excel_lName=tuple_[3] if tuple_ else None
print(excel_fName1+' '+excel_fName2,excel_mName+' '+excel_lName)
# Check if the subfolder matches a name in the Excel file
if excel_fName1+' '+excel_fName2 == fName and excel_mName+" "+excel_lName == lName:
copy_the_file(folder_name,_name,df)
if len(tuple_) == 3:
excel_fName=tuple_[0] if tuple_ else None
excel_mName= tuple_[1] if tuple_ else None
excel_lName=tuple_[2] if tuple_ else None
print(excel_fName+' '+excel_mName,excel_lName)
if excel_fName == fName and excel_mName+" "+excel_lName == lName:
copy_the_file(folder_name,_name,df)
elif len(tuple_) == 2:
excel_fName = tuple_[0] if tuple_ else None
excel_lName = tuple_[1] if len(tuple_) > 1 else None
print(excel_fName +' '+excel_lName)
if excel_fName==fName and excel_lName == lName:
copy_the_file(folder_name,_name,df)
elif len(tuple_) == 1:
fName = tuple_[0] if tuple_ else None
</details>
# 答案1
**得分**: 1
你可以使用 `difflib` 模块来找到相似的名称。通过在 `.get_close_matches` 方法中匹配,你可以尝试找到最接近的值。对于你提供的数据,`cutoff=.5` 是一个合适的值。如果这不起作用,你可以对数据框中的单词进行排序,将它们转换为相同的大小写,以使比较更准确。如果这仍然不起作用,你可以使用第三方库来比较字符串。
<details>
<summary>英文:</summary>
You can use the `difflib` module to find similar names. By matching in the `.get_close_matches` method, you can try to find the closest value. For the data you gave, `cutoff=.5` came up. If it doesn't work out, you can sort the words in the dataframe lines, convert them to the same case, so that the comparisons are more accurate. If this fails, you can use third-party libraries to compare strings.
import pandas as pd
import glob
import shutil
import difflib
excel_file_path = r'test.xlsx'
folder_path = r'F:\this folder\docs'
proof_folder_path = r'F:\this folder\Proof'
df_excel = pd.read_excel(excel_file_path, names=['name'], dtype='string')
pathes = glob.glob(rf'{folder_path}**')
We create lists, one with a partial path to the folders, the second with their names.
part_path = [part for path in pathes for part in path.split(sep=r"\docs")[-1::]]
folder_name = [name for path in pathes for name in path.split(sep="\")[-1::]]
We collect everything in another dataframe.
dict_folder = {'name': folder_name, 'path': pathes, 'part_path': part_path}
df_folder = pd.DataFrame(dict_folder, dtype='string')
We start a cycle of checking folder names and overwriting the necessary ones.
for name in df_excel['name']:
# You may have to match cutoff=
, so for data that is 0.4 selected and unnecessary, 0.6 did not select any.
find_it = difflib.get_close_matches(name, df_folder['name'], n=1, cutoff=.5)
# If the name matches, overwrite the subfolder in a new folder.
if find_it:
df_path = df_folder.loc[df_folder['name'] == find_it[0]]
shutil.copytree(df_path['path'].values[0],
f"{proof_folder_path}{df_path['part_path'].values[0]}",
dirs_exist_ok=True)
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论