2023-04-11 05:28:11

ValueError: could not convert string to float: 'Intel'

Question

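The error says a string could not be converted to a float, which means at least one column that still holds text (here, the value 'Intel') reached the regressor without being encoded. In the ColumnTransformer of step 1, every string column has to be handled by a transformer; any string column missing from the column list is passed through unchanged by remainder='passthrough'. One option is to convert such columns to numbers beforehand, for example with LabelEncoder (note that scikit-learn documents LabelEncoder as intended for target labels; OrdinalEncoder is its counterpart for feature columns). In your code, you could add the following: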

from sklearn.preprocessing import LabelEncoder
# Select the column to convert; column 0 is used here as an example
label_encoder = LabelEncoder()
X_train[:, 0] = label_encoder.fit_transform(X_train[:, 0])
X_test[:, 0] = label_encoder.transform(X_test[:, 0])

This converts the strings to numbers and should resolve the error. Make sure to apply the same treatment to every column that holds strings.
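A minimal sketch of how the leftover string column could be tracked down and encoded, assuming the features are a pandas DataFrame. The toy data, the column names, and the `encoded_positions` list are invented for illustration and only mirror the shape of the question's `[0,1,7,10,11]` list; the real dataset will differ.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the laptop dataset.
X = pd.DataFrame({
    "Company": ["Dell", "HP", "Dell", "Apple", "HP", "Apple"],
    "Ram": [8, 16, 8, 16, 4, 8],
    "CpuBrand": ["Intel", "AMD", "Intel", "Intel", "AMD", "Intel"],
})
y = pd.Series([50000.0, 65000.0, 52000.0, 120000.0, 30000.0, 95000.0])

# Positions currently handed to the encoder (assumed, for illustration).
encoded_positions = [0]

# Non-numeric columns NOT in that list go through remainder='passthrough'
# as raw strings, which is what raises "could not convert string to float".
leftover_strings = [
    col for pos, col in enumerate(X.columns)
    if pos not in encoded_positions and not is_numeric_dtype(X[col])
]

# Add the leftover string columns to the encoder's column list.
all_positions = sorted(
    encoded_positions + [X.columns.get_loc(col) for col in leftover_strings]
)

step1 = ColumnTransformer(
    transformers=[
        ("col_tnf", OneHotEncoder(handle_unknown="ignore"), all_positions),
    ],
    remainder="passthrough",
)
pipe = Pipeline([("step1", step1), ("step2", LinearRegression())])
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Printing `leftover_strings` before fitting shows which columns still need an encoder. One-hot encoding them inside the pipeline keeps the same transformation applied at predict time.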

I am building a laptop price prediction application with ML, and there are many examples on this topic. Although I wrote the code exactly as they do, I get errors like the one below and I don't know how to fix them.
This is my code:

step1 = ColumnTransformer(transformers=[
    ('col_tnf',OneHotEncoder(sparse=False,drop='first'),[0,1,7,10,11])],remainder='passthrough')
step2 = LinearRegression()
pipe = Pipeline([
    ('step1',step1),
    ('step2',step2)
])
pipe.fit(X_train,y_train)
y_pred = pipe.predict(X_test)
print('R2 score',r2_score(y_test,y_pred))
print('MAE',mean_absolute_error(y_test,y_pred))

And the output:

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[94], line 12
      5 step2 = LinearRegression()
      7 pipe = Pipeline([
      8     ('step1',step1),
      9     ('step2',step2)
     10 ])
---> 12 pipe.fit(X_train,y_train)
     14 y_pred = pipe.predict(X_test)
     16 print('R2 score',r2_score(y_test,y_pred))
File c:\Users\...\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\pipeline.py:405, in Pipeline.fit(self, X, y, **fit_params)
    403     if self._final_estimator != "passthrough":
    404         fit_params_last_step = fit_params_steps[self.steps[-1][0]]
--> 405         self._final_estimator.fit(Xt, y, **fit_params_last_step)
    407 return self
File c:\Users\...\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\linear_model\_base.py:648, in LinearRegression.fit(self, X, y, sample_weight)
    644 n_jobs_ = self.n_jobs
    646 accept_sparse = False if self.positive else ["csr", "csc", "coo"]
--> 648 X, y = self._validate_data(
    649     X, y, accept_sparse=accept_sparse, y_numeric=True, multi_output=True
    650 )
    652 sample_weight = _check_sample_weight(
...
--> 185     array = numpy.asarray(array, order=order, dtype=dtype)
    186     return xp.asarray(array, copy=copy)
    187 else:
ValueError: could not convert string to float: 'Intel'

From what I found researching on the internet, I think I need to do a type conversion, but frankly, I don't know where to do it.

Answer 1

Score: 0

It's a bit hard to see what exactly is happening without more code but from what I see in your error messages, you define X, Y but where are X_train, X_test, y_train, and y_test instantiated?

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv('laptop_data.csv')
# define your X and Y; "Price" is only an assumed name for the target column
X = df.drop(columns=["Price"])
Y = df["Price"]
# numeric data: fill missing values with the median, then standardize
numeric_features = ["Screen Size", "RAM"]
numeric_transformer = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler())]
)
# categorical data: one-hot encode, then keep the best-scoring half of the columns
categorical_features = ["CPU"]
categorical_transformer = Pipeline(
    steps=[
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
        # f_regression scores features against a continuous target; chi2
        # (sklearn.feature_selection.chi2, not scipy.stats.chi2) expects class labels
        ("selector", SelectPercentile(f_regression, percentile=50)),
    ]
)
# columns not listed in either transformer are dropped (the default remainder)
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features),
    ]
)
pipe = Pipeline([
    ("preprocessor", preprocessor),
    ("regressor", LinearRegression())
])
# stratify is left out: it needs class labels, and a price is continuous
X_train, X_test, y_train, y_test = \
    train_test_split(X, Y, test_size=0.2, random_state=30)
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
print('R2 score', r2_score(y_test, y_pred))
print('MAE', mean_absolute_error(y_test, y_pred))

Edit: I don't know what your data looks like but you'll probably want to create preprocessing pipelines for both numeric and categorical data. I just used CPU as a categorical assuming that was the column containing 'Intel' and 'AMD'.
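If listing the columns by hand is error-prone, one possible variation is to let scikit-learn pick them by dtype with `make_column_selector`. This is only a sketch under assumptions: the toy DataFrame and its column names (including `Price` as the target) are made up, and the feature-selection step is left out to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; replace with the real laptop DataFrame.
data = pd.DataFrame({
    "CPU": ["Intel", "AMD"] * 10,
    "RAM": [4, 8, 16, 8, 4, 16, 8, 8, 16, 4] * 2,
    "Screen Size": [13.3, 15.6, 14.0, 15.6, 17.3] * 4,
    "Price": np.linspace(30000, 120000, 20),
})
X = data.drop(columns=["Price"])
y = data["Price"]

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Numeric columns are scaled; every non-numeric column is one-hot encoded,
# so no raw string such as 'Intel' can reach the regressor.
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, make_column_selector(dtype_include=np.number)),
    ("cat", OneHotEncoder(handle_unknown="ignore"),
     make_column_selector(dtype_exclude=np.number)),
])

pipe = Pipeline([
    ("preprocessor", preprocessor),
    ("regressor", LinearRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=30
)
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
```

The selectors are evaluated when the pipeline is fitted, so a string column added to the data later is encoded without editing the column lists.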
