Question

When I run the code below, I get "ValueError: invalid literal for int() with base 10: ' '". I guess the problem is with the type conversion, but I don't know how to fix it here. Can you help me, please? This is my code:

#preprocessing
df['Memory'] = df['Memory'].astype(str).replace('.0', '', regex=True)
df["Memory"] = df["Memory"].str.replace('GB', '')
df["Memory"] = df["Memory"].str.replace('TB', '000')
new = df["Memory"].str.split("+", n = 1, expand = True)
df["first"]= new[0]
df["first"]=df["first"].str.strip()
df["second"]= new[1]
df["Layer1HDD"] = df["first"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer1SSD"] = df["first"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer1Hybrid"] = df["first"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer1Flash_Storage"] = df["first"].apply(lambda x: 1 if "Flash Storage" in x else 0)
df['first'] = df['first'].str.replace(r'D', '')
df["second"].fillna("0", inplace = True)
df["Layer2HDD"] = df["second"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer2SSD"] = df["second"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer2Hybrid"] = df["second"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer2Flash_Storage"] = df["second"].apply(lambda x: 1 if "Flash Storage" in x else 0)
df['second'] = df['second'].str.replace(r'D', '')
#binary encoding
df["Layer2HDD"] = df["second"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer2SSD"] = df["second"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer2Hybrid"] = df["second"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer2Flash_Storage"] = df["second"].apply(lambda x: 1 if "Flash Storage" in x else 0)
#only keep integers (digits)
df['second'] = df['second'].str.replace(r'D','')
#convert to numeric
df['second'] = df['second'].astype(int)
df['first'] = df['first'].astype(int)
df['second'] = df['second'].astype(int)

#finalize the columns by keeping value
df["HDD"]=(df["first"]*df["Layer1HDD"]+df["second"]*df["Layer2HDD"])
df["SSD"]=(df["first"]*df["Layer1SSD"]+df["second"]*df["Layer2SSD"])
df["Hybrid"]=(df["first"]*df["Layer1Hybrid"]+df["second"]*df["Layer2Hybrid"])
df["Flash_Storage"]=(df["first"]*df["Layer1Flash_Storage"]+df["second"]*df["Layer2Flash_Storage"])
#Drop the columns that are no longer required
df.drop(columns=['first', 'second', 'Layer1HDD', 'Layer1SSD', 'Layer1Hybrid',
       'Layer1Flash_Storage', 'Layer2HDD', 'Layer2SSD', 'Layer2Hybrid',
       'Layer2Flash_Storage'],inplace=True)

This code raises the error in the title. My knowledge of Python is limited and I don't know how to solve it. Can you help me?
My dataset is here

Answer 1

Score: 0


You get ValueError: invalid literal for int() with base 10 because df['second'].astype(int) is converting a series that still contains non-numeric values.

In the line df['second'] = df['second'].str.replace(r'D','') the regex is wrong: r'D' matches only the literal letter D. To remove every non-digit character, use \D, and pass regex=True, because since pandas 2.0 str.replace treats the pattern as a literal string by default:

df['second'] = df['second'].str.replace(r'\D+', '', regex=True)

Do the same for the series df['first']:

df['first'] = df['first'].str.replace(r'\D+', '', regex=True)
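A minimal sketch of why the conversion fails, using plain Python rather than pandas. The sample strings are hypothetical stand-ins for values that could end up in the column, not rows from the asker's dataset. int() accepts digit strings, even with surrounding whitespace, but rejects anything else, including a lone space or an empty string.

```python
# Hypothetical sample values, not taken from the asker's dataset.
samples = ["256", " 512 ", " ", "", "1000 HDD"]

def can_convert(text):
    """Return True if int(text) succeeds, False if it raises ValueError."""
    try:
        int(text)
        return True
    except ValueError:
        return False

# Map each sample to whether int() accepts it.
results = {s: can_convert(s) for s in samples}
for value, ok in results.items():
    print(repr(value), "->", "ok" if ok else "ValueError")
```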
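The difference between the two patterns can be checked outside pandas with Python's re module, used here as a rough stand-in for str.replace with regex=True. The sample strings and the digits_to_int helper are assumptions made for illustration. The helper also maps a string with no digits to 0, because an empty string would otherwise raise the same ValueError; whether 0 is the right default depends on the data.

```python
import re

# Hypothetical values shaped like the 'first'/'second' columns.
samples = ["256 SSD", "1000 HDD", "  1000 HDD", "0"]

def digits_to_int(text, default=0):
    """Strip every non-digit character, then convert; use default if nothing is left."""
    digits = re.sub(r"\D+", "", text)
    return int(digits) if digits else default

# r"D" only removes the literal letter D, so other letters and spaces survive.
wrong = [re.sub(r"D", "", s) for s in samples]
# r"\D+" removes every run of non-digit characters before converting.
right = [digits_to_int(s) for s in samples]

print(wrong)
print(right)
```

In pandas itself, pd.to_numeric(series, errors='coerce') is another option when unparseable strings should become NaN instead of raising.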
