ValueError: invalid literal for int() with base 10: ' ' in Python
Question
When I run the code below, I get ValueError: invalid literal for int() with base 10: ' '. I guess it is a type-conversion problem, but I don't know how to fix it. Can you help me, please? This is my code:
#preprocessing
df['Memory'] = df['Memory'].astype(str).replace('.0', '', regex=True)
df["Memory"] = df["Memory"].str.replace('GB', '')
df["Memory"] = df["Memory"].str.replace('TB', '000')
new = df["Memory"].str.split("+", n = 1, expand = True)
df["first"]= new[0]
df["first"]=df["first"].str.strip()
df["second"]= new[1]
df["Layer1HDD"] = df["first"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer1SSD"] = df["first"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer1Hybrid"] = df["first"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer1Flash_Storage"] = df["first"].apply(lambda x: 1 if "Flash Storage" in x else 0)
df['first'] = df['first'].str.replace(r'D', '')
df["second"].fillna("0", inplace = True)
df["Layer2HDD"] = df["second"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer2SSD"] = df["second"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer2Hybrid"] = df["second"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer2Flash_Storage"] = df["second"].apply(lambda x: 1 if "Flash Storage" in x else 0)
df['second'] = df['second'].str.replace(r'D', '')
#binary encoding
df["Layer2HDD"] = df["second"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer2SSD"] = df["second"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer2Hybrid"] = df["second"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer2Flash_Storage"] = df["second"].apply(lambda x: 1 if "Flash Storage" in x else 0)
#only keep integers (digits)
df['second'] = df['second'].str.replace(r'D','')
#convert to numeric
df['second'] = df['second'].astype(int)
df['first'] = df['first'].astype(int)
df['second'] = df['second'].astype(int)
#finalize the columns by keeping value
df["HDD"]=(df["first"]*df["Layer1HDD"]+df["second"]*df["Layer2HDD"])
df["SSD"]=(df["first"]*df["Layer1SSD"]+df["second"]*df["Layer2SSD"])
df["Hybrid"]=(df["first"]*df["Layer1Hybrid"]+df["second"]*df["Layer2Hybrid"])
df["Flash_Storage"]=(df["first"]*df["Layer1Flash_Storage"]+df["second"]*df["Layer2Flash_Storage"])
#Drop the columns that are no longer needed
df.drop(columns=['first', 'second', 'Layer1HDD', 'Layer1SSD', 'Layer1Hybrid',
'Layer1Flash_Storage', 'Layer2HDD', 'Layer2SSD', 'Layer2Hybrid',
'Layer2Flash_Storage'],inplace=True)
This code raises the error in the title, and unfortunately my knowledge of Python is limited, so I don't know how to solve it. Can you help me?
My dataset is here
Answer 1
Score: 0
You get this error (ValueError: invalid literal for int() with base 10) because you are trying to convert a Series that still contains non-numeric values to int (df['second'].astype(int)).
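A minimal sketch of the failure, assuming a small hypothetical Series instead of the real dataset: astype(int) stops at the first string that int() cannot parse (such as ' ' or '256 SS'), while pd.to_numeric(..., errors='coerce') turns unparseable values into NaN, which makes it easy to list the rows that still need cleaning.

```python
import pandas as pd

# Hypothetical values resembling the 'second' column after an incomplete cleanup
s = pd.Series(["1000", "0", " ", "256 SS"])

try:
    s.astype(int)
except ValueError as exc:
    # The message names the first value that int() could not parse
    print(exc)

# Coerce unparseable values to NaN and list the offending entries
bad = s[pd.to_numeric(s, errors="coerce").isna()]
print(bad.tolist())
```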
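In the line df['second'] = df['second'].str.replace(r'D','') your regex is wrong: r'D' only matches the literal capital letter D. To remove all non-numeric characters, match \D (any non-digit) instead, and pass regex=True explicitly, because since pandas 2.0 str.replace treats the pattern as a plain string by default:
df['second'] = df['second'].str.replace(r'\D+', '', regex=True)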
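The difference between the two patterns can be seen on a few hypothetical values shaped like the 'second' column (leading spaces and storage-type letters still present). This is only an illustration, not the asker's data:

```python
import pandas as pd

# Hypothetical 'second' values after the GB/TB replacements and the split on '+'
s = pd.Series(["  1000 HDD", "256 SSD", "0"])

# r'D' matches only the literal capital letter D
literal_d = s.str.replace(r"D", "", regex=True)

# r'\D+' matches every run of non-digit characters (letters, spaces, etc.)
digits_only = s.str.replace(r"\D+", "", regex=True)

print(literal_d.tolist())                # ['  1000 H', '256 SS', '0']
print(digits_only.tolist())              # ['1000', '256', '0']
print(digits_only.astype(int).tolist())  # [1000, 256, 0]
```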
Do the same for the series df['first']:
df['first'] = df['first'].str.replace(r'\D+', '', regex=True)
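Putting the fixes together, here is a sketch of the whole preprocessing step. It assumes the Memory column holds strings like "256GB SSD +  1.0TB HDD"; the sample rows and the KINDS mapping below are hypothetical, since the dataset itself is not reproduced here. Besides the \D+ fix, it also:

- escapes the dot in r'\.0' (an unescaped '.0' matches any character followed by 0, so "500GB" would become "0GB");
- assigns the result of fillna instead of calling it with inplace=True on a single column;
- computes the HDD/SSD/Hybrid/Flash Storage flags from strings that still contain their letters. In the question's code, the second block of Layer2 encodings runs after the letters are stripped, so with the \D+ fix every Layer2 flag would be reset to 0.

```python
import pandas as pd

# Hypothetical sample of the 'Memory' column (the real dataset is not shown here)
df = pd.DataFrame({"Memory": [
    "128GB SSD", "1.0TB HDD", "256GB SSD +  1.0TB HDD",
    "64GB Flash Storage", "1.0TB Hybrid",
]})

# Hypothetical mapping: substring to look for -> output column name
KINDS = {"HDD": "HDD", "SSD": "SSD", "Hybrid": "Hybrid", "Flash Storage": "Flash_Storage"}

mem = df["Memory"].astype(str)
mem = mem.str.replace(r"\.0", "", regex=True)    # "1.0TB" -> "1TB" (dot escaped)
mem = mem.str.replace("GB", "", regex=False)
mem = mem.str.replace("TB", "000", regex=False)  # "1TB" -> "1000"

# Split "A + B" into two layers; reindex guarantees that column 1 exists
parts = mem.str.split("+", n=1, expand=True).reindex(columns=[0, 1])
first = parts[0].str.strip()
second = parts[1].fillna("0").str.strip()        # assign instead of inplace=True

# Numeric sizes: drop every non-digit character, then convert
first_size = first.str.replace(r"\D+", "", regex=True).astype(int)
second_size = second.str.replace(r"\D+", "", regex=True).astype(int)

for kind, col in KINDS.items():
    # Flags are read from the original strings, which still contain the letters
    in_first = first.str.contains(kind, regex=False).astype(int)
    in_second = second.str.contains(kind, regex=False).astype(int)
    df[col] = first_size * in_first + second_size * in_second

print(df[list(KINDS.values())])
```

Because the sizes live in separate variables, the order of the encoding and stripping steps no longer matters, and no temporary Layer columns need to be dropped afterwards.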