Undoing replacement with a dictionary in pandas dataframe
Question
I have a pandas dataframe like so:
import pandas as pd
x = pd.DataFrame({'col1':['one','two','three','four'],'col2':[5,6,7,8],'col3':[9,10,11,12]})
For my purposes (training an ML model), I need to replace the text with numbers, so I use DataFrame.replace() with a dictionary:
mydict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
x.replace({'col1':mydict}, inplace=True)
After that, I train the model and have it return a proposed candidate. Because the model has seen only the numbers, it returns the candidate with a number in that first column, something like this:
col1 | col2 | col3 |
---|---|---|
1 | 5 | 9 |
What I'd like to get is something like this:
col1 | col2 | col3 |
---|---|---|
one | 5 | 9 |
I've seen this question where they create an inverted dictionary to solve the problem, and this one about getting the values of a python dictionary. But I'd like to avoid having to create another dictionary, seeing as the values of the dictionary are as unique as the keys.
I get the feeling there should be some easy way of looking up the values as if they were the keys and doing the replacement like that, but I'm not sure.
Answer 1
Score: 2
If your dictionary is a bijection and no initial value in col1 is itself a value from the dictionary, then the only way is to reverse the dictionary:
x.replace({'col1': {v:k for k,v in mydict.items()}}, inplace=True)
Output:
    col1  col2  col3
0    one     5     9
1    two     6    10
2  three     7    11
3   four     8    12
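The same inversion can be applied to just the candidate the model returns. The following is a minimal sketch, assuming the candidate comes back as a one-row DataFrame; the names `candidate` and `inverse` are hypothetical and not from the question.

```python
import pandas as pd

mydict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}

# Hypothetical candidate row returned by the model, with col1 still encoded.
candidate = pd.DataFrame({'col1': [1], 'col2': [5], 'col3': [9]})

# Build the inverse mapping once; this is only valid because the values are unique.
inverse = {v: k for k, v in mydict.items()}

# Series.map looks each encoded value up in the inverse mapping.
candidate['col1'] = candidate['col1'].map(inverse)
print(candidate)
```

Note that `Series.map` turns any value missing from the mapping into NaN, whereas `replace` leaves such values unchanged, so which one fits depends on how unknown codes should be handled.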
If either of these conditions does not hold, the replacement cannot be undone unambiguously.
Example:
x = pd.DataFrame({'col1':['one','two','three','four', 4]})
#     col1
# 0    one
# 1    two
# 2  three
# 3   four
# 4      4
mydict = {'one': 1, 'two': 1, 'three': 3, 'four': 4}
x.replace({'col1':mydict}, inplace=True)
x.replace({'col1': {v:k for k,v in mydict.items()}}, inplace=True)
Output:
    col1
0    two  # incorrectly mapped to "two" due to non-unique values
1    two
2  three
3   four
4   four  # incorrectly mapped to "four" due to collision with the mapped value
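The two conditions named above can be checked before doing the replacement. This is only a sketch: `can_invert` is a hypothetical helper, not a pandas function, and it must be run on the original column, before any values have been replaced.

```python
import pandas as pd

def can_invert(column, mapping):
    """Return True if replacing with `mapping` can be undone unambiguously."""
    values = list(mapping.values())
    # Condition 1: the mapping is one-to-one (no two keys share a value).
    one_to_one = len(set(values)) == len(values)
    # Condition 2: no original entry of the column already equals a mapped value.
    no_collision = not column.isin(values).any()
    return one_to_one and no_collision

good = pd.Series(['one', 'two', 'three', 'four'])
bad = pd.Series(['one', 'two', 'three', 'four', 4])

print(can_invert(good, {'one': 1, 'two': 2, 'three': 3, 'four': 4}))  # True
print(can_invert(bad, {'one': 1, 'two': 1, 'three': 3, 'four': 4}))   # False
```

When the check fails, keeping an untouched copy of the original column is the safer way to get the text back.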