ValueError: invalid literal for int() with base 10: ' ' in Python

Question

When I run the code below, I get ValueError: invalid literal for int() with base 10: ' '. I suspect the problem is the type conversion, but I don't know how to fix it. Can you help me, please? This is my code:

#preprocessing
df['Memory'] = df['Memory'].astype(str).replace('.0', '', regex=True)
df["Memory"] = df["Memory"].str.replace('GB', '')
df["Memory"] = df["Memory"].str.replace('TB', '000')
new = df["Memory"].str.split("+", n = 1, expand = True)
df["first"]= new[0]
df["first"]=df["first"].str.strip()
df["second"]= new[1]
df["Layer1HDD"] = df["first"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer1SSD"] = df["first"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer1Hybrid"] = df["first"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer1Flash_Storage"] = df["first"].apply(lambda x: 1 if "Flash Storage" in x else 0)
df['first'] = df['first'].str.replace(r'D', '')
df["second"].fillna("0", inplace = True)
df["Layer2HDD"] = df["second"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer2SSD"] = df["second"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer2Hybrid"] = df["second"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer2Flash_Storage"] = df["second"].apply(lambda x: 1 if "Flash Storage" in x else 0)
df['second'] = df['second'].str.replace(r'D', '')
#binary encoding
df["Layer2HDD"] = df["second"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer2SSD"] = df["second"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer2Hybrid"] = df["second"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer2Flash_Storage"] = df["second"].apply(lambda x: 1 if "Flash Storage" in x else 0)
#only keep integer (digits)
df['second'] = df['second'].str.replace(r'D','')
#convert to numeric
df['second'] = df['second'].astype(int)
df['first'] = df['first'].astype(int)
df['second'] = df['second'].astype(int)



#finalize the columns by keeping value
df["HDD"]=(df["first"]*df["Layer1HDD"]+df["second"]*df["Layer2HDD"])
df["SSD"]=(df["first"]*df["Layer1SSD"]+df["second"]*df["Layer2SSD"])
df["Hybrid"]=(df["first"]*df["Layer1Hybrid"]+df["second"]*df["Layer2Hybrid"])
df["Flash_Storage"]=(df["first"]*df["Layer1Flash_Storage"]+df["second"]*df["Layer2Flash_Storage"])
#Drop the un required columns
df.drop(columns=['first', 'second', 'Layer1HDD', 'Layer1SSD', 'Layer1Hybrid',
       'Layer1Flash_Storage', 'Layer2HDD', 'Layer2SSD', 'Layer2Hybrid',
       'Layer2Flash_Storage'],inplace=True)

This code raises the error in the title. My knowledge of Python is limited and I don't know how to solve it. Can you help me?
My dataset is here

Answer 1

Score: 0

You get ValueError: invalid literal for int() with base 10 because you are converting a Series to int (df['second'].astype(int)) while it still contains non-numeric values.
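To make the cause concrete, here is a minimal sketch in plain Python (no pandas). The sample value and the helper try_int are made up for illustration and are not taken from the dataset. It shows that int() rejects any string that still contains non-digit characters, that the pattern D strips only the literal letter D, and that \D+ strips every run of non-digits.

```python
import re

# Hypothetical sample value, shaped like an entry the 'second' column might hold
value = "  1000 HDD"

# The pattern 'D' matches only the literal letter D
only_d_removed = re.sub(r"D", "", value)

# The pattern '\D+' matches runs of non-digit characters
digits_only = re.sub(r"\D+", "", value)


def try_int(text):
    """Return int(text), or the ValueError message if the conversion fails."""
    try:
        return int(text)
    except ValueError as exc:
        return str(exc)


print(repr(only_d_removed))     # letters other than D and the spaces remain
print(try_int(only_d_removed))  # conversion fails with the error from the title
print(repr(digits_only))        # only the digits remain
print(try_int(digits_only))     # conversion succeeds
print(try_int(" "))             # a whitespace-only string also fails
```

Note that a string left empty after stripping non-digits would still fail in int(), so cells that contain no digits at all need separate handling.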

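In the line df['second'] = df['second'].str.replace(r'D','') your regex is wrong: the pattern D matches only the literal letter D, not non-digit characters. To remove non-numeric characters, use \D+ and pass regex=True (since pandas 2.0, str.replace treats the pattern as a literal string by default):

df['second'] = df['second'].str.replace(r'\D+', '', regex=True)

Do the same for the series df['first']:

df['first'] = df['first'].str.replace(r'\D+', '', regex=True)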
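As a rough end-to-end sketch of this fix, the snippet below applies the replacement to a small made-up DataFrame (the sample values are assumptions, not rows from the actual dataset). It passes regex=True explicitly, because since pandas 2.0 str.replace treats the pattern as a literal string by default. It also swaps astype(int) on the raw strings for pd.to_numeric with errors="coerce", which is an addition to the answer above: a cell that contains no digits becomes an empty string, and that would still break a direct astype(int).

```python
import pandas as pd

# Made-up sample data standing in for the question's 'first' and 'second' columns
df = pd.DataFrame({
    "first": ["128 SSD", "500 HDD", "256 Flash Storage"],
    "second": ["  1000 HDD", "0", " "],
})

for col in ["first", "second"]:
    # Strip every run of non-digit characters; regex=True is required so that
    # the pattern is interpreted as a regular expression
    digits = df[col].str.replace(r"\D+", "", regex=True)
    # Cells without any digits are now "", which to_numeric coerces to NaN.
    # Filling NaN with 0 is an assumption of this sketch; adjust as needed.
    df[col] = pd.to_numeric(digits, errors="coerce").fillna(0).astype(int)

print(df)
```

Treating digit-free cells as 0 matches the question's own fillna("0") for a missing second drive, but whether that is right for other rows depends on the data.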

huangapple
  • Posted on April 13, 2023 at 23:07:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76007033.html