Why does pandas fail to join on two columns of object dtype (one of them is converted from int to object)?
Question
The following merge strategy fails:
import pandas as pd

data1 = {'c1': ['J', 'A', 'B'],
         'key': [25, 30, 35]}
df1 = pd.DataFrame(data1)
data2 = {'c2': ['A', 'B', 'C'],
         'key': ["25", "30", "36"]}
df2 = pd.DataFrame(data2, dtype="O")
df1.key = df1.key.astype("O")
print(df1.merge(df2, on="key"))
output:
Empty DataFrame
Columns: [c1, key, c2]
Index: []
Why does pandas fail in this merge? If I convert the column to string dtype and then back to object, as follows, it works:
df1.key = df1.key.astype(str).astype("O")
Now the merge is okay and finds the matches. How should I understand this behavior?
Answer 1
Score: 2
Converting the Series to object doesn't turn the items into strings; it only sets the Series dtype to object. An object Series can contain anything (integers, floats, strings, lists, class instances…), so the items are still integers:
df1['key'] = df1['key'].astype('O')
print(df1['key'].tolist())
# [25, 30, 35]
print(type(df1['key'].iloc[0]))
# <class 'int'>
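One way to spot this kind of mismatch before merging is to look at the Python types of the elements rather than at the column dtype. The snippet below is a minimal sketch that rebuilds the frames from the question; the names `types1` and `types2` are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'c1': ['J', 'A', 'B'], 'key': [25, 30, 35]})
df2 = pd.DataFrame({'c2': ['A', 'B', 'C'], 'key': ["25", "30", "36"]}, dtype="O")
df1['key'] = df1['key'].astype('O')

# Both key columns now report the object dtype ...
print(df1['key'].dtype, df2['key'].dtype)

# ... but the elements stored in them have different Python types,
# so no value in df1['key'] compares equal to a value in df2['key'].
types1 = set(df1['key'].map(type))
types2 = set(df2['key'].map(type))
print(types1, types2)
```

If the two sets differ, the merge compares values such as `25` and `"25"`, which are never equal, and the result is empty.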
What is important is to convert the items to an identical type, for example strings:
df1['key'] = df1['key'].astype(str)
print(df1['key'].tolist())
# ['25', '30', '35']
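Putting it together, the merge from the question finds its matches once both key columns hold the same type. This is a sketch of two possible ways to do that, not the only ones; the integer variant assumes every key in `df2` parses as an integer, and `merged_str` and `merged_int` are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({'c1': ['J', 'A', 'B'], 'key': [25, 30, 35]})
df2 = pd.DataFrame({'c2': ['A', 'B', 'C'], 'key': ["25", "30", "36"]}, dtype="O")

# Option 1: compare the keys as strings (convert the left side).
merged_str = df1.assign(key=df1['key'].astype(str)).merge(df2, on='key')
print(merged_str)

# Option 2: compare the keys as integers (convert the right side).
# Assumption: every value in df2['key'] is a valid integer string.
merged_int = df1.merge(df2.assign(key=df2['key'].astype(int)), on='key')
print(merged_int)
```

Both variants use `assign` so the original frames are left unchanged; which side to convert depends on the type you want the key to have in the merged result.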