英文:
Trying to replace old "names" with new "names" in Python
问题
我有两个Excel文件。其中一个Excel文件中以随机顺序包含新旧名称,类似于以下内容:
列 A 列 B
apple-er3 horse-er45
alk-er43 erk-bf43
这样一直持续下去,大约有400行。
另一个文件按特定顺序包含旧名称:
列 A 列 B
*旧名称* *旧名称*
我想要能够将其中所有旧名称翻译为新名称。
我考虑过创建一个字典,但是当需要翻译这么多名称时,我无法想出一个好的编写方法。我想要进入包含旧名称的Excel文件,逐列复制,然后将其粘贴到Python程序中,以便将其全部翻译为新名称。
英文:
I have two Excel files. One of the Excel files has the new and old names in it in a random order, something like this:
Colum A Colum B
apple-er3 horse-er45
alk-er43 erk-bf43
and it continues for around 400 rows.
The other is filled with the old names in a specific order:
colum A colum B
*old name* *old name*
I want to be able to translate all the old names in it with the new.
I was thinking about making a dictionary but I can't figure out a good way of writing it when there are so many names to translate. I would like to go into the Excel with the old names, copy one colum at the time and then paste it into the Python program for it to translate it all to the new names.
答案1
得分: 0
给定一个类似以下的CSV文件:
apple-er3,horse-er45
alk-er43,erk-bf43
你可以轻松地构建一个将这两者之间进行翻译的字典:
import csv
with open('names.csv') as f:
name_trans = dict(csv.reader(f))
dict()
会接受一对一对的元素(这就是一个2列CSV文件的csv.reader
会提供的内容),并将其转换为一个字典,其中第一个元素是键,第二个元素是值,因此上面这两行代码就足以构建你所需的名称翻译字典。
如果CSV文件有一个需要跳过的标题行,或者有其他需要忽略的列,那些都是只需要再加上一两行代码的修改;基本策略是一样的。
一旦你有了 name_trans
,你可以这样做:
new_name = name_trans[old_name]
来将任何 old_name
翻译成对应的 new_name
。
英文:
Given a CSV file that looks something like:
apple-er3,horse-er45
alk-er43,erk-bf43
you can build a dictionary that translates between the two very easily:
import csv
with open('names.csv') as f:
name_trans = dict(csv.reader(f))
dict()
will take an iterable of pairs (which is what a csv.reader
of a 2-column CSV file will give you) and turn it into a dictionary where the first item is the key and the second item is the value, so the above two lines of code is all you need to build the name translation dictionary you want.
If the CSV has a header you need to skip, or if it has other columns you need to ignore, those are modifications that only represent another line or two of code; the basic strategy is the same.
Once you have name_trans
you can do:
new_name = name_trans[old_name]
to translate any old_name
to its corresponding new_name
.
答案2
得分: 0
以下是翻译好的部分:
- 我发现
pandas
对于从电子表格加载数据和保存数据非常方便。您可以使用read_excel
从第一个文件中读取新旧名称并将它们转换成一个字典 (tRef_dict
),如下所示。
<sub>[注意:如果您的文件是CSV格式,请使用 read_csv
而不是 read_excel
。]</sub>
import pandas as pd
tRef_fp = 'map_old_new.xlsx' ## 包含新旧名称的文件路径
tRef_dict = pd.read_excel(tRef_fp).to_dict('split')
tRef_dict = {k:v for v,k,*_ in tRef_dict['data']} # 列A->新, 列B->旧
# tRef_dict = dict(tRef_dict['data']) # 只有2列, 列A->旧, 列B->新
- 如果工作表没有标题行,使用
pd.read_excel(tRef_fp, header=None).to_dict('split')
(否则,第一行将不包括在tRef_dict['data']
中)。 - 对于您的示例数据,
dict(tRef_dict['data'])
将返回{'apple-er3': 'horse-er45', 'alk-er43': 'erk-bf43'} # 'apple-er3' 将被翻译为 'horse-er45'
- 而
{k:v for v,k,*_ in tRef_dict['data']}
将返回{'horse-er45': 'apple-er3', 'erk-bf43': 'alk-er43'} # 'horse-er45' 将被翻译为 'apple-er3'
然后,您可以加载仅包含旧名称的文件,使用 .applymap
和 tRef_dict
来更新名称,然后使用 .to_excel
保存已更新的名称<sup>(或 .to_csv
保存为CSV文件)</sup> :
old_fp = 'old.xlsx' ## 包含旧名称的文件路径
new_fp = 'new.xlsx' ## [可以与 old_fp 相同]
names_df = pd.read_excel(old_fp) #, header=None) ## 如果没有标题行,请指定
names_df = names_df.applymap(lambda x: tRef_dict.get(x,x)) ## [更新名称]
names_df.to_excel(new_fp, index=False) #, header=False) ## [保存已更新的名称]
英文:
I find pandas
quite handy for loading data from [and saving data to] spreadsheets. You could use read_excel
to read the old and new names from the fist file and convert them into a dictionary (tRef_dict
) as below.
<sub>[NOTE: If your file is a CSV, use read_csv
instead of read_excel
.]</sub>
import pandas as pd
tRef_fp = 'map_old_new.xlsx' ## path to file with new&old names
tRef_dict = pd.read_excel(tRef_fp).to_dict('split')
tRef_dict = {k:v for v,k,*_ in tRef_dict['data']} # colA->new, colB->old
# tRef_dict = dict(tRef_dict['data']) # only 2 cols, colA->old, colB->new
- Use
pd.read_excel(tRef_fp, header=None).to_dict('split')
if the sheet doesn't have a header row (otherwise, the first row would not be included intRef_dict['data']
). - With your example data,
dict(tRef_dict['data'])
would return{'apple-er3': 'horse-er45', 'alk-er43': 'erk-bf43'} # 'apple-er3' will get translated to 'horse-er45'
- and
{k:v for v,k,*_ in tRef_dict['data']}
would return{'horse-er45': 'apple-er3', 'erk-bf43': 'alk-er43'} # 'horse-er45' will get translated to 'apple-er3'
Then, you can load the file with only old names, use .applymap
and tRef_dict
to update the names, and then save the updated names with .to_excel
<sup>(or .to_csv
to save to a CSV file)</sup> :
old_fp = 'old.xlsx' ## path to file with old names
new_fp = 'new.xlsx' ## [can be the same as old_fp]
names_df = pd.read_excel(old_fp) #, header=None) ## if there is no header row, specify
names_df = names_df.applymap(lambda x: tRef_dict.get(x,x)) ## [update names]
names_df.to_excel(new_fp, index=False) #, header=False) ## [save updated]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论