尝试在Python中用新的“names”替换旧的“names”。

huangapple go评论69阅读模式
英文:

Trying to replace old "names" with new "names" in Python

问题

我有两个Excel文件。其中一个Excel文件中以随机顺序包含新旧名称,类似于以下内容:

列 A          列 B
apple-er3    horse-er45
alk-er43     erk-bf43

这样一直持续下去,大约有400行。

另一个文件按特定顺序包含旧名称:

列 A         列 B
*旧名称*     *旧名称*

我想要能够将其中所有旧名称翻译为新名称。

我考虑过创建一个字典,但是当需要翻译这么多名称时,我无法想出一个好的编写方法。我想要进入包含旧名称的Excel文件,逐列复制,然后将其粘贴到Python程序中,以便将其全部翻译为新名称。

英文:

I have two Excel files. One of the Excel files has the new and old names in it in a random order, something like this:

Colum A        Colum B
apple-er3      horse-er45
alk-er43       erk-bf43

and it continues for around 400 rows.

The other is filled with the old names in a specific order:

colum A         colum B
*old name*       *old name*

I want to be able to translate all the old names in it with the new.

I was thinking about making a dictionary but I can't figure out a good way of writing it when there are so many names to translate. I would like to go into the Excel with the old names, copy one colum at the time and then paste it into the Python program for it to translate it all to the new names.

答案1

得分: 0

给定一个类似以下的CSV文件:

apple-er3,horse-er45
alk-er43,erk-bf43

你可以轻松地构建一个将这两者之间进行翻译的字典:

import csv

with open('names.csv') as f:
    name_trans = dict(csv.reader(f))

dict() 会接受一对一对的元素(这就是一个2列CSV文件的csv.reader会提供的内容),并将其转换为一个字典,其中第一个元素是键,第二个元素是值,因此上面这两行代码就足以构建你所需的名称翻译字典。

如果CSV文件有一个需要跳过的标题行,或者有其他需要忽略的列,那些都是只需要再加上一两行代码的修改;基本策略是一样的。

一旦你有了 name_trans,你可以这样做:

new_name = name_trans[old_name]

来将任何 old_name 翻译成对应的 new_name

英文:

Given a CSV file that looks something like:

apple-er3,horse-er45
alk-er43,erk-bf43

you can build a dictionary that translates between the two very easily:

import csv

with open('names.csv') as f:
    name_trans = dict(csv.reader(f))

dict() will take an iterable of pairs (which is what a csv.reader of a 2-column CSV file will give you) and turn it into a dictionary where the first item is the key and the second item is the value, so the above two lines of code is all you need to build the name translation dictionary you want.

If the CSV has a header you need to skip, or if it has other columns you need to ignore, those are modifications that only represent another line or two of code; the basic strategy is the same.

Once you have name_trans you can do:

new_name = name_trans[old_name]

to translate any old_name to its corresponding new_name.

答案2

得分: 0

以下是翻译好的部分:

  • 我发现 pandas 对于从电子表格加载数据和保存数据非常方便。您可以使用 read_excel 从第一个文件中读取新旧名称并将它们转换成一个字典 (tRef_dict),如下所示。

<sub>[注意:如果您的文件是CSV格式,请使用 read_csv 而不是 read_excel。]</sub&gt

import pandas as pd

tRef_fp = 'map_old_new.xlsx' ## 包含新旧名称的文件路径
tRef_dict = pd.read_excel(tRef_fp).to_dict('split')
tRef_dict = {k:v for v,k,*_ in tRef_dict['data']} # 列A->新, 列B->旧
# tRef_dict = dict(tRef_dict['data']) # 只有2列, 列A->旧, 列B->新
  • 如果工作表没有标题行,使用 pd.read_excel(tRef_fp, header=None).to_dict('split')(否则,第一行将不包括在 tRef_dict['data'] 中)。
  • 对于您的示例数据,dict(tRef_dict['data']) 将返回
    {'apple-er3': 'horse-er45', 'alk-er43': 'erk-bf43'}
    # 'apple-er3' 将被翻译为 'horse-er45'
    
  • {k:v for v,k,*_ in tRef_dict['data']} 将返回
    {'horse-er45': 'apple-er3', 'erk-bf43': 'alk-er43'}
    # 'horse-er45' 将被翻译为 'apple-er3'
    

然后,您可以加载仅包含旧名称的文件,使用 .applymaptRef_dict 来更新名称,然后使用 .to_excel 保存已更新的名称<sup>(或 .to_csv 保存为CSV文件)</sup> :

old_fp = 'old.xlsx' ## 包含旧名称的文件路径
new_fp = 'new.xlsx' ## [可以与 old_fp 相同]

names_df = pd.read_excel(old_fp) #, header=None) ## 如果没有标题行,请指定
names_df = names_df.applymap(lambda x: tRef_dict.get(x,x)) ## [更新名称]
names_df.to_excel(new_fp, index=False) #, header=False) ## [保存已更新的名称]
英文:

I find pandas quite handy for loading data from [and saving data to] spreadsheets. You could use read_excel to read the old and new names from the fist file and convert them into a dictionary (tRef_dict) as below.

<sub>[NOTE: If your file is a CSV, use read_csv instead of read_excel.]</sub>

import pandas as pd

tRef_fp = &#39;map_old_new.xlsx&#39; ## path to file with new&amp;old names
tRef_dict = pd.read_excel(tRef_fp).to_dict(&#39;split&#39;)
tRef_dict = {k:v for v,k,*_ in tRef_dict[&#39;data&#39;]} # colA-&gt;new, colB-&gt;old
# tRef_dict = dict(tRef_dict[&#39;data&#39;]) # only 2 cols, colA-&gt;old, colB-&gt;new
  • Use pd.read_excel(tRef_fp, header=None).to_dict(&#39;split&#39;) if the sheet doesn't have a header row (otherwise, the first row would not be included in tRef_dict[&#39;data&#39;]).
  • With your example data, dict(tRef_dict[&#39;data&#39;]) would return
    {&#39;apple-er3&#39;: &#39;horse-er45&#39;, &#39;alk-er43&#39;: &#39;erk-bf43&#39;}
    # &#39;apple-er3&#39; will get translated to  &#39;horse-er45&#39;
    
  • and {k:v for v,k,*_ in tRef_dict[&#39;data&#39;]} would return
    {&#39;horse-er45&#39;: &#39;apple-er3&#39;, &#39;erk-bf43&#39;: &#39;alk-er43&#39;}
    # &#39;horse-er45&#39; will get translated to  &#39;apple-er3&#39;
    

Then, you can load the file with only old names, use .applymap and tRef_dict to update the names, and then save the updated names with .to_excel <sup>(or .to_csv to save to a CSV file)</sup> :

old_fp = &#39;old.xlsx&#39; ## path to file with old names
new_fp = &#39;new.xlsx&#39; ## [can be the same as old_fp]

names_df = pd.read_excel(old_fp) #, header=None) ## if there is no header row, specify
names_df = names_df.applymap(lambda x: tRef_dict.get(x,x)) ## [update names]
names_df.to_excel(new_fp, index=False) #, header=False) ## [save updated]

huangapple
  • 本文由 发表于 2023年3月7日 05:15:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/75655920.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定