ValueError: could not convert string to float: 'Intel'

Question

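The error says a string could not be converted to a float, which suggests that some of the data still needs a type conversion. You can try converting the columns that hold strings into numbers before they reach the model, alongside the one-hot encoding done by the ColumnTransformer in step 1. For example, LabelEncoder can do this. In your code you could add the following (the indexing assumes X_train and X_test are NumPy arrays):

    from sklearn.preprocessing import LabelEncoder

    # Pick the column to convert; column 0 is used here as an example
    label_encoder = LabelEncoder()
    X_train[:, 0] = label_encoder.fit_transform(X_train[:, 0])
    X_test[:, 0] = label_encoder.transform(X_test[:, 0])

This replaces the strings with numeric codes, which should resolve the error, but be sure to do the same for every column that needs converting. Note that LabelEncoder is designed for target labels, and its transform raises an error if X_test contains a value that never appeared in X_train.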
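For feature columns, scikit-learn also provides OrdinalEncoder, which can encode several columns at once and can be told how to treat categories it did not see during fitting. The following is only a minimal sketch on made-up data (the brand names and RAM values are invented for illustration, not taken from the asker's dataset):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy data, invented for illustration: column 0 is a brand, column 1 is RAM in GB.
X_train = np.array([["Intel", 8], ["AMD", 16], ["Intel", 4]], dtype=object)
X_test = np.array([["AMD", 8], ["Samsung", 16]], dtype=object)

# Categories unseen during fit are mapped to -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

X_train_enc = X_train.copy()
X_test_enc = X_test.copy()
X_train_enc[:, [0]] = encoder.fit_transform(X_train[:, [0]])
X_test_enc[:, [0]] = encoder.transform(X_test[:, [0]])

# With no strings left, the float conversion that previously failed now succeeds.
X_train_num = X_train_enc.astype(float)
X_test_num = X_test_enc.astype(float)
print(X_test_num)
```

Keep in mind that integer codes imply an order between categories that a linear model will take literally, so for unordered categories such as brand names one-hot encoding is usually the safer choice.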

I am building a laptop price prediction application with ML, and there are many examples of this topic. Although I wrote the code exactly as they do, I get errors like the one below and I don't know how to fix them.
This is my code:

    step1 = ColumnTransformer(transformers=[
        ('col_tnf',OneHotEncoder(sparse=False,drop='first'),[0,1,7,10,11])],remainder='passthrough')
    step2 = LinearRegression()
    pipe = Pipeline([
        ('step1',step1),
        ('step2',step2)
    ])
    pipe.fit(X_train,y_train)
    y_pred = pipe.predict(X_test)
    print('R2 score',r2_score(y_test,y_pred))
    print('MAE',mean_absolute_error(y_test,y_pred))

And the output:

    Output exceeds the size limit. Open the full output data in a text editor
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Cell In[94], line 12
          5 step2 = LinearRegression()
          7 pipe = Pipeline([
          8     ('step1',step1),
          9     ('step2',step2)
         10 ])
    ---> 12 pipe.fit(X_train,y_train)
         14 y_pred = pipe.predict(X_test)
         16 print('R2 score',r2_score(y_test,y_pred))

    File c:\Users\...\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\pipeline.py:405, in Pipeline.fit(self, X, y, **fit_params)
        403 if self._final_estimator != "passthrough":
        404     fit_params_last_step = fit_params_steps[self.steps[-1][0]]
    --> 405     self._final_estimator.fit(Xt, y, **fit_params_last_step)
        407 return self

    File c:\Users\...\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\linear_model\_base.py:648, in LinearRegression.fit(self, X, y, sample_weight)
        644 n_jobs_ = self.n_jobs
        646 accept_sparse = False if self.positive else ["csr", "csc", "coo"]
    --> 648 X, y = self._validate_data(
        649     X, y, accept_sparse=accept_sparse, y_numeric=True, multi_output=True
        650 )
        652 sample_weight = _check_sample_weight(
    ...
    --> 185 array = numpy.asarray(array, order=order, dtype=dtype)
        186 return xp.asarray(array, copy=copy)
        187 else:
    ValueError: could not convert string to float: 'Intel'

From what I found while researching online, I think I need to do a type conversion, but frankly, I don't know where to do it.

Answer 1

Score: 0

It's a bit hard to see exactly what is happening without more code, but from what I see in your error message, you define X and Y, but where are X_train, X_test, y_train, and y_test instantiated? A full setup might look like this:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score, mean_absolute_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    df = pd.read_csv('laptop_data.csv')

    # Column names are guesses; replace them with the ones in your data
    numeric_features = ["Screen Size", "RAM"]
    categorical_features = ["CPU"]

    # Define your X and Y ("Price" is an assumed name for the target column)
    X = df[numeric_features + categorical_features]
    Y = df["Price"]

    # Numeric data: fill missing values with the median, then standardize
    numeric_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
        ]
    )

    # Categorical data: one-hot encode; categories unseen during fit become all zeros
    categorical_transformer = Pipeline(
        steps=[
            ("encoder", OneHotEncoder(handle_unknown="ignore")),
        ]
    )

    # Route each group of columns to its own transformer
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, numeric_features),
            ("cat", categorical_transformer, categorical_features),
        ]
    )

    pipe = Pipeline([
        ("preprocessor", preprocessor),
        ("regressor", LinearRegression()),
    ])

    # Plain random split; stratify applies to class labels, not to a continuous price
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.2, random_state=30
    )

    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    print('R2 score', r2_score(y_test, y_pred))
    print('MAE', mean_absolute_error(y_test, y_pred))

Edit: I don't know what your data looks like, but you'll probably want to create preprocessing pipelines for both the numeric and the categorical data. I just used CPU as a categorical feature, assuming that was the column containing 'Intel' and 'AMD'.
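Since the traceback shows 'Intel' arriving at LinearRegression, at least one string column was probably left out of the encoder's column list and passed through untouched. One way to guard against that, sketched here on invented data (the DataFrame below and its column names are assumptions, since the real dataset is not shown), is to list the non-numeric columns and let make_column_selector pick them by dtype instead of by hard-coded index:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data invented for illustration; the real laptop dataset is not shown.
df = pd.DataFrame({
    "Company": ["Dell", "HP", "Dell", "Apple", "HP", "Apple"],
    "CPU": ["Intel", "AMD", "Intel", "Intel", "AMD", "Intel"],
    "RAM": [8, 16, 4, 16, 8, 8],
    "Price": [700.0, 900.0, 500.0, 1500.0, 650.0, 1200.0],
})
X = df.drop(columns="Price")
y = df["Price"]

# List every column that is not numeric; each of these must be encoded.
string_columns = X.select_dtypes(exclude=np.number).columns.tolist()
print(string_columns)

# One-hot encode all non-numeric columns, selected by dtype rather than by index,
# and pass the numeric columns through unchanged.
preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"),
         make_column_selector(dtype_exclude=np.number)),
    ],
    remainder="passthrough",
)
pipe = Pipeline([
    ("preprocessor", preprocessor),
    ("regressor", LinearRegression()),
])
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Printing the non-numeric columns first is a quick check: any name in that list that is missing from your encoder's column list is a likely source of the error.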

Posted by huangapple on April 11, 2023, 05:28:11
Link: https://go.coder-hub.com/75980884.html