if the name exists in the Excel, check if a folder for that name exists, if true copy the Folder.

huangapple go评论57阅读模式
英文:

if the name exists in the Excel, check if a folder for that name exists, if true copy the Folder

问题

这是您提供的Python代码的翻译部分:

我已经在这个问题上工作了几天但没有运气我有一些文件夹在另一个文件夹中我需要检查这些文件夹的名称是否出现在Excel表格中如果是的话我想将文件复制到另一个文件夹中这似乎是一个容易的任务但我对Python还比较陌生所以我感到迷茫

以下是Excel文件中名称列表的示例
[mike nicolai Tesla, Thomas Edison, Edie Morphy, Josef mike Tesla Johanssen]

目录的示例这是一个总体架构 R:\this folder\Folder\Subfolder\lastName, first Name
R:\this folder\docs\GERMANY\Tesla Johanssen, Josef mike
R:\this folder\docs\GERMANY\Morphy, Edie
R:\this folder\docs\France\nicolai Tesla, Josef mike

我已经创建了以下两个函数以便根据需要复制或提取名称由于中间名称我必须检查函数是否返回的不仅仅是名字的第一个和最后一个部分为此我有两个函数 *sort_Names(_name)**extract_Names_From_Folders(folder_path)*

```python
import os
from glob import glob
import shutil

def extract_Names_From_Folders(folder_path):
    folder_Name=folder_path.split(sep="\\")
    folder_Name=folder_Name[-1::]
    if len(''.join(map(str, folder_Name)).split(' '))>2:
        _lName=''.join(map(str, folder_Name)).split(" ")[0]
        _mName=''.join(map(str, folder_Name)).split(" ")[1]
        _fName=''.join(map(str, folder_Name)).split(" ")[2]
        return _fName,_mName,_lName
    elif len(''.join(map(str, folder_Name)).split(' '))>1:
        _lName=''.join(map(str, folder_Name)).split(" ")[0]
        _fName=''.join(map(str, folder_Name)).split(" ")[1]
        return _fName,_lName
    elif len(''.join(map(str, folder_Name)).split(' '))>0:
        _fName=''.join(map(str, folder_Name)).split(" ")[0]
        return _fName

def sort_Names(_name):
    if len(''.join(map(str, _name)).split(" "))==4:
        excel_fName2=''.join(map(str, _name)).split(" ")[3]
        excel_fName1=''.join(map(str, _name)).split(" ")[2]
        excel_mName= ''.join(map(str, _name)).split(" ")[0]
        excel_lName=''.join(map(str, _name)).split(" ")[1]
        return excel_fName1,excel_fName2,excel_mName,excel_lName
    elif len(''.join(map(str, _name)).split(" "))==3:
        excel_fName=''.join(map(str, _name)).split(" ")[2]
        excel_mName= ''.join(map(str, _name)).split(" ")[0]
        excel_lName=''.join(map(str, _name)).split(" ")[1]
        return excel_fName,excel_mName,excel_lName
    elif len(''.join(map(str, _name)).split(" "))==2:
        excel_fName=''.join(map(str, _name)).split(" ")[1]
        excel_lName=''.join(map(str, _name)).split(" ")[0]
        return excel_fName,excel_lName
    elif len(''.join(map(str, _name)).split(" "))==1:
        excel_fName=''.join(map(str, _name)).split(" ")[0]
        return excel_fName
    else: return [0]

def copy_the_file(folder_name,_name,df):
    index = _name.index(folder_name)
    df.at[_name, "User Request Form Created?\n[Yes/No]"] = "Yes"
    # Copy the subfolder to the Proof folder
    proof_folder_path = r"r:\blabla\Proof"
    shutil.copytree(folder_name, os.path.join(proof_folder_path, _name), dirs_exist_ok=True)

以下是主要部分。此代码当前不起作用,我不知道为什么。它变得如此复杂,如此之快。我相当确定可以以更简单和更Pythonic的方式完成。但我不知道如何做。

import pandas as pd
from glob import glob

# 设置文件路径
excel_file_path = r"k:\blabla\path.xlsx"
folder_path = r"r:\blabla\path\*\*&"
df = pd.read_excel(excel_file_path, header=1)

# 从“Name”列中获取名称列表
names = df["Name"].tolist()
for folder_name in glob(folder_path, recursive=True):
    # 检查函数是否返回3个或2个变量。函数返回一个元组
    mName=''
    tuple_=extract_Names_From_Folders(folder_name)
    if len(tuple_) == 3:
        fName = tuple_[0] if tuple_ else None
        mName= tuple_[1] if len(tuple_) > 1 else None
        lName= tuple_[2] if len(tuple_) > 2 else None
    elif len(tuple_) == 2:
        fName = tuple_[0] if tuple_ else None
        lName= tuple_[1] if len(tuple_) > 1 else None
    elif len(tuple_) == 1:
        fName = tuple_[0] if tuple_ else None

    for _name in names:
        if len(_name)>2:
            tuple_=sort_Names(_name)
            if len(tuple_) == 4:
                excel_fName2=tuple_[1] if tuple_ else None
                excel_fName1=tuple_[0] if tuple_ else None
                excel_mName= tuple_[2] if tuple_ else None
                excel_lName=tuple_[3] if tuple_ else None
                print(excel_fName1+' '+excel_fName2,excel_mName+' '+excel_lName)
                # 检查子文件夹是否与Excel文件中的名称匹配
                if excel_fName1+' '+excel_fName2 == fName and excel_mName+" "+excel_lName == lName:
                    copy_the_file(folder_name,_name,df)
            if len(tuple_) == 3:
                excel_fName=tuple_[0] if tuple_ else None
                excel_mName= tuple_[1] if tuple_ else None
                excel_lName=tuple_[2] if tuple_ else None
                print(excel_f

<details>
<summary>英文:</summary>

I&#39;ve been working on this for days but no luck. I have some folder in another folder and I need to check if the Name of these folders appear on a Excel sheet. If so I want to copy the files to another folder. It seems easy job to be, but I&#39;m kind of new to python so I&#39;m lost. 

An Example of the list of names in excel file:

    [mike nicolai Tesla, Thomas Edison, Edie Morphy, Josef mike Tesla Johanssen]

an example of the directories, this is a over all schema R:\this folder\Folder\Subfolder\lastName, first Name

    R:\this folder\docs\GERMANY\Tesla Johanssen, Josef mike
    R:\this folder\docs\GERMANY\Morphy, Edie   
    R:\this folder\docs\France\nicolai Tesla, Josef mike

   


I have made these Functions, in order to Copy or extract the names accordingly. Due to the middle names, I had to check if the Function returns more than just First and Last Name. I have 2 functions for that purpose *sort_Names(_name)* &amp; *extract_Names_From_Folders(folder_path)*

    import os
    from glob import glob
    import shutil

    def extract_Names_From_Folders(folder_path):
        folder_Name=folder_path.split(sep=&quot;\\&quot;)
        folder_Name=folder_Name[-1::]
        if len(&#39;&#39;.join(map(str, folder_Name)).split(&#39; &#39;))&gt;2:
            _lName=&#39;&#39;.join(map(str, folder_Name)).split(&quot; &quot;)[0]
            _mName=&#39;&#39;.join(map(str, folder_Name)).split(&quot; &quot;)[1]
            _fName=&#39;&#39;.join(map(str, folder_Name)).split(&quot; &quot;)[2]
            return _fName,_mName,_lName
        elif len(&#39;&#39;.join(map(str, folder_Name)).split(&#39; &#39;))&gt;1:
            _lName=&#39;&#39;.join(map(str, folder_Name)).split(&quot; &quot;)[0]
            _fName=&#39;&#39;.join(map(str, folder_Name)).split(&quot; &quot;)[1]            
            return _fName,_lName
        elif len(&#39;&#39;.join(map(str, folder_Name)).split(&#39; &#39;))&gt;0:
            _fName=&#39;&#39;.join(map(str, folder_Name)).split(&quot; &quot;)[0]  
            return _fName
        
    def sort_Names(_name):
        if len(&#39;&#39;.join(map(str, _name)).split(&quot; &quot;))==4:  
            excel_fName2=&#39;&#39;.join(map(str, _name)).split(&quot; &quot;)[3]
            excel_fName1=&#39;&#39;.join(map(str, _name)).split(&quot; &quot;)[2]
            excel_mName= &#39;&#39;.join(map(str, _name)).split(&quot; &quot;)[0]
            excel_lName=&#39;&#39;.join(map(str, _name)).split(&quot; &quot;)[1]
            return excel_fName1,excel_fName2,excel_mName,excel_lName
        elif len(&#39;&#39;.join(map(str, _name)).split(&quot; &quot;))==3:  
            excel_fName=&#39;&#39;.join(map(str, _name)).split(&quot; &quot;)[2]
            excel_mName= &#39;&#39;.join(map(str, _name)).split(&quot; &quot;)[0]
            excel_lName=&#39;&#39;.join(map(str, _name)).split(&quot; &quot;)[1]
            return excel_fName,excel_mName,excel_lName
        elif len(&#39;&#39;.join(map(str, _name)).split(&quot; &quot;))==2:
            excel_fName=&#39;&#39;.join(map(str, _name)).split(&quot; &quot;)[1]        
            excel_lName=&#39;&#39;.join(map(str, _name)).split(&quot; &quot;)[0]
            return excel_fName,excel_lName
        elif len(&#39;&#39;.join(map(str, _name)).split(&quot; &quot;))==1:
            excel_fName=&#39;&#39;.join(map(str, _name)).split(&quot; &quot;)[0]        
            return excel_fName
        else: return [0]
    
    
    def copy_the_file(folder_name,_name,df):    
        index = _name.index(folder_name)        
        df.at[_name, &quot;User Request Form Created?\n[Yes/No]&quot;] = &quot;Yes&quot;
        # Copy the subfolder to the Proof folder
        proof_folder_path = r&quot;r:\blabla\Proof&quot;
        shutil.copytree(folder_name, os.path.join(proof_folder_path, _name), dirs_exist_ok=True)


Here&#39;s the main part. This code doesn&#39;t currently work I have no idea *why*. It got complex so fast. I&#39;m pretty sure it can be done in easier and more pythonic way. But have no clue how. 

    import pandas as pd
    from glob import glob
    
    # Set up file paths
    excel_file_path = r&quot;k:\blabla\path.xlsx&quot;
    folder_path = r&quot;r:\blabla\path\*\*&quot;
    df = pd.read_excel(excel_file_path, header=1)
    
    # Get the list of names from the &quot;Name&quot; column
    names = df[&quot;Name&quot;].tolist()
    for folder_name in glob(folder_path, recursive = True):
        #Check if function returns 3 or 2 variables. A function returns a Tuple
        mName=&#39;&#39;
        tuple_=extract_Names_From_Folders(folder_name)
        if len(tuple_) == 3:
            fName = tuple_[0] if tuple_ else None
            mName= tuple_[1] if len(tuple_) &gt; 1 else None
            lName= tuple_[2] if len(tuple_) &gt; 2 else None
        elif len(tuple_) == 2:
            fName = tuple_[0] if tuple_ else None        
            lName= tuple_[1] if len(tuple_) &gt; 1 else None
        elif len(tuple_) == 1:
            fName = tuple_[0] if tuple_ else None
    
        #convert list to string then split by comma
        # print(&#39;&#39;.join(map(str, folder_Name)).split(&#39;,&#39;)[0])
        
        for _name in names: 
            if len(_name)&gt;2:
                tuple_=sort_Names(_name)
                if len(tuple_) == 4:
                    excel_fName2=tuple_[1] if tuple_ else None
                    excel_fName1=tuple_[0] if tuple_ else None
                    excel_mName= tuple_[2] if tuple_ else None
                    excel_lName=tuple_[3] if tuple_ else None
                    print(excel_fName1+&#39; &#39;+excel_fName2,excel_mName+&#39; &#39;+excel_lName)      
      # Check if the subfolder matches a name in the Excel file      
                    if excel_fName1+&#39; &#39;+excel_fName2 == fName and excel_mName+&quot; &quot;+excel_lName == lName:       
                        copy_the_file(folder_name,_name,df)
                if len(tuple_) == 3:            
                    excel_fName=tuple_[0] if tuple_ else None
                    excel_mName= tuple_[1] if tuple_ else None
                    excel_lName=tuple_[2] if tuple_ else None
                    print(excel_fName+&#39; &#39;+excel_mName,excel_lName)
                    if excel_fName == fName and excel_mName+&quot; &quot;+excel_lName == lName:       
                        copy_the_file(folder_name,_name,df)
                elif len(tuple_) == 2:
                    excel_fName = tuple_[0] if tuple_ else None        
                    excel_lName = tuple_[1] if len(tuple_) &gt; 1 else None
                    print(excel_fName +&#39; &#39;+excel_lName)            
                    if excel_fName==fName and excel_lName == lName:
                        copy_the_file(folder_name,_name,df)
                elif len(tuple_) == 1:
                    fName = tuple_[0] if tuple_ else None


</details>


# 答案1
**得分**: 1

你可以使用 `difflib` 模块来找到相似的名称通过在 `.get_close_matches` 方法中匹配你可以尝试找到最接近的值对于你提供的数据,`cutoff=.5` 是一个合适的值如果这不起作用你可以对数据框中的单词进行排序将它们转换为相同的大小写以使比较更准确如果这仍然不起作用你可以使用第三方库来比较字符串

<details>
<summary>英文:</summary>

You can use the `difflib` module to find similar names. By matching in the `.get_close_matches` method, you can try to find the closest value. For the data you gave, `cutoff=.5` came up. If it doesn&#39;t work out, you can sort the words in the dataframe lines, convert them to the same case, so that the comparisons are more accurate. If this fails, you can use third-party libraries to compare strings.

import pandas as pd
import glob
import shutil
import difflib

excel_file_path = r'test.xlsx'
folder_path = r'F:\this folder\docs'
proof_folder_path = r'F:\this folder\Proof'

df_excel = pd.read_excel(excel_file_path, names=['name'], dtype='string')

pathes = glob.glob(rf'{folder_path}**')

We create lists, one with a partial path to the folders, the second with their names.

part_path = [part for path in pathes for part in path.split(sep=r"\docs")[-1::]]
folder_name = [name for path in pathes for name in path.split(sep="\")[-1::]]

We collect everything in another dataframe.

dict_folder = {'name': folder_name, 'path': pathes, 'part_path': part_path}
df_folder = pd.DataFrame(dict_folder, dtype='string')

We start a cycle of checking folder names and overwriting the necessary ones.

for name in df_excel['name']:
# You may have to match cutoff=, so for data that is 0.4 selected and unnecessary, 0.6 did not select any.
find_it = difflib.get_close_matches(name, df_folder['name'], n=1, cutoff=.5)

# If the name matches, overwrite the subfolder in a new folder.
if find_it:
df_path = df_folder.loc[df_folder[&#39;name&#39;] == find_it[0]]
shutil.copytree(df_path[&#39;path&#39;].values[0],
f&quot;{proof_folder_path}{df_path[&#39;part_path&#39;].values[0]}&quot;,
dirs_exist_ok=True)

</details>

huangapple
  • 本文由 发表于 2023年5月7日 22:45:21
  • 转载请务必保留本文链接:https://go.coder-hub.com/76194609.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定