Undoing replacement with a dictionary in pandas dataframe

Question

I have a pandas dataframe like so:

import pandas as pd

x = pd.DataFrame({'col1': ['one', 'two', 'three', 'four'], 'col2': [5, 6, 7, 8], 'col3': [9, 10, 11, 12]})

For my purposes (training an ML model), I need to replace the text with numbers, so I use DataFrame.replace() with a dictionary:

mydict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
x.replace({'col1': mydict}, inplace=True)

After that, I train the model and have it return a proposed candidate. Having seen only the numbers, the model returns the candidate with a number in that first column, something like this:

col1 col2 col3
1 5 9

I'd like to get something like this instead:

col1 col2 col3
one 5 9

I've seen this question, where they create an inverted dictionary to solve the problem, and this one about getting the values of a Python dictionary. But I'd like to avoid creating another dictionary, since the values of my dictionary are as unique as the keys.

I get the feeling there should be some easy way of looking up the values as if they were the keys and doing the replacement like that, but I'm not sure.

Answer 1

Score: 2


If your dictionary is a bijection and col1 initially contains no value that is also a value in the dictionary, the replacement can be undone, and the only way to do it is to reverse the dictionary:

x.replace({'col1': {v: k for k, v in mydict.items()}}, inplace=True)

Output:

    col1  col2  col3
0    one     5     9
1    two     6    10
2  three     7    11
3   four     8    12
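
A minimal round-trip sketch, assuming the bijective mydict from the question. The names encoded, inverse, and decoded are illustrative only. It uses Series.map instead of replace, which is a swapped-in choice and not what the answer uses: map yields NaN for any value missing from the dictionary, so gaps become visible rather than passing through unchanged.

```python
import pandas as pd

x = pd.DataFrame({'col1': ['one', 'two', 'three', 'four'],
                  'col2': [5, 6, 7, 8],
                  'col3': [9, 10, 11, 12]})
mydict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}

# Forward step: text -> numbers, leaving the original frame untouched.
encoded = x.assign(col1=x['col1'].map(mydict))

# The inverse mapping is built once from the forward one.
inverse = {v: k for k, v in mydict.items()}

# Backward step: numbers -> text.
decoded = encoded.assign(col1=encoded['col1'].map(inverse))
```

Building inverse costs one pass over mydict, so keeping it next to the forward dictionary is usually cheaper than searching the values on every lookup.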

If these conditions do not hold, the replacement cannot be undone unambiguously.
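
The two conditions can be checked before encoding. The helper below is a hypothetical sketch (can_invert is not a pandas function) that tests whether the dictionary values are unique and whether the column already contains any of those values.

```python
import pandas as pd

def can_invert(series, mapping):
    """Return True if replacing with `mapping` can be undone unambiguously.

    Hypothetical helper, not part of pandas.
    """
    values = list(mapping.values())
    # Condition 1: no two keys share a value (the mapping is injective).
    values_unique = len(set(values)) == len(values)
    # Condition 2: the column holds no entry that equals a mapped value.
    no_collision = not (set(series) & set(values))
    return values_unique and no_collision

good = can_invert(pd.Series(['one', 'two', 'three', 'four']),
                  {'one': 1, 'two': 2, 'three': 3, 'four': 4})
bad = can_invert(pd.Series(['one', 'two', 'three', 'four', 4]),
                 {'one': 1, 'two': 1, 'three': 3, 'four': 4})
```

Here good is True for the question's data, while bad is False because 1 is used twice and 4 is already present in the column.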

Example:

x = pd.DataFrame({'col1':['one','two','three','four', 4]})
#     col1
# 0    one
# 1    two
# 2  three
# 3   four
# 4      4

mydict = {'one': 1, 'two': 1, 'three': 3, 'four': 4}
x.replace({'col1': mydict}, inplace=True)
x.replace({'col1': {v: k for k, v in mydict.items()}}, inplace=True)

Output:

    col1
0    two # incorrectly mapped to "two" due to non-unique values
1    two
2  three
3   four
4   four # incorrectly mapped to "four" due to collision with the mapped value
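
As an aside on the question's wish to avoid maintaining a second dictionary by hand: one alternative, not part of the answer above, is to let pandas produce the codes with pd.factorize and keep the returned labels for decoding. This is a sketch under the assumption that the model does not need the specific numbers 1 to 4, since factorize assigns codes starting at 0 in order of first appearance.

```python
import pandas as pd

x = pd.DataFrame({'col1': ['one', 'two', 'three', 'four'],
                  'col2': [5, 6, 7, 8],
                  'col3': [9, 10, 11, 12]})

# codes: integer array, one code per row; uniques: the label for each code.
codes, uniques = pd.factorize(x['col1'])
encoded = x.assign(col1=codes)

# Decoding is positional indexing into the labels, so no inverse dict is needed.
labels = uniques.to_numpy()[encoded['col1'].to_numpy()]
restored = encoded.assign(col1=labels)
```

The uniques object plays the role of the inverse mapping here, so it has to be kept around for as long as model outputs need decoding.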

huangapple
  • Published on June 26, 2023, 16:38:11
  • Please keep this link when reposting: https://go.coder-hub.com/76554958.html