How to view specific rows that my logistic regression has classified

Question

I am currently working on a logistic regression model to predict the outcome of certain trades. The model classifies trades in the test set as good/bad (1/0). I want to see which trades are classified into each group and multiply the trades classified as "good" by their profit/loss, to find out whether the logistic regression model is actually profitable. Is there a way to view the row-specific information of the entries that the model classifies as True/False?

This is my code for scaling the data and splitting it into train/test sets:

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler

    x = df[x_train_features]
    y = df["y"]
    y = y.astype("int")
    # scale data
    scaler = MinMaxScaler()
    scaledx = scaler.fit_transform(x)
    # split the data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(scaledx, y, test_size=0.25)
    # instantiate the model (using the default parameters)
    logreg = LogisticRegression()
    # fit the model with data
    logreg.fit(X_train, y_train)
    # True where the predicted probability of class 1 is at least 0.5
    y_pred_test = (logreg.predict_proba(X_test)[:, 1] >= 0.5).astype(bool)

I tried to use df.loc[y_pred_test == True], but I get the error:

    Boolean index has wrong length: 720 instead of 2880

This is most likely because the test set is smaller than the whole sample set.
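
The length mismatch can be reproduced on a toy example. This is only a sketch with made-up data: the 8-row frame stands in for the 2880-row df, and the 2-element array stands in for the 720 test-set predictions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df: 8 rows (the real frame has 2880)
df = pd.DataFrame({"feature": np.arange(8.0), "y": [0, 1, 0, 1, 1, 0, 1, 0]})

# Toy stand-in for y_pred_test: one flag per test row (the real array has 720)
y_pred_test = np.array([True, False])

# A boolean mask must have exactly one entry per row of the frame it indexes
try:
    df.loc[y_pred_test]
    mask_error = None
except (IndexError, ValueError) as exc:
    mask_error = exc

print(type(mask_error).__name__, mask_error)
```

A boolean mask built from the test set can only index something that has the same rows as the test set, not the full frame.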

Answer 1

Score: 0

The error occurs because y_pred_test holds predictions for the test set only (720 rows), so it cannot be used as a boolean mask on the whole df (2880 rows).
Instead, combine the predictions with the test-set labels, giving the predictions the same index as y_test so that the rows line up:

    import pandas as pd

    y_pred_test = pd.Series(y_pred_test, index=y_test.index, name="y_pred")
    results = pd.concat([y_test, y_pred_test], axis=1)

This places your predictions next to the ground truth.
You can then select the rows that were predicted as True:

    results.loc[results["y_pred"]]
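
As a fuller illustration, here is a minimal sketch of recovering the original df rows for each predicted class and summing the profit/loss of the trades predicted as good. It relies on the fact that train_test_split keeps the original index on y_test when y is a pandas Series. The data is synthetic, and the column names f1, f2 and pnl, as well as random_state=0, are assumptions made for the example, not part of the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for df; "f1", "f2" and "pnl" are hypothetical column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"f1": rng.normal(size=n), "f2": rng.normal(size=n)})
df["y"] = (df["f1"] + 0.5 * rng.normal(size=n) > 0).astype(int)
df["pnl"] = rng.normal(loc=0.0, scale=10.0, size=n)

x_train_features = ["f1", "f2"]
scaledx = MinMaxScaler().fit_transform(df[x_train_features])
y = df["y"]

X_train, X_test, y_train, y_test = train_test_split(
    scaledx, y, test_size=0.25, random_state=0
)

logreg = LogisticRegression().fit(X_train, y_train)
y_pred_test = logreg.predict_proba(X_test)[:, 1] >= 0.5

# y_test kept the original row labels of df, so they identify the test rows
test_rows = df.loc[y_test.index].copy()
# Positional assignment: y_pred_test is in the same row order as X_test / y_test
test_rows["y_pred"] = y_pred_test

predicted_good = test_rows[test_rows["y_pred"]]
predicted_bad = test_rows[~test_rows["y_pred"]]

# Total profit/loss of the trades the model classified as good
total_pnl_of_good = predicted_good["pnl"].sum()
print(len(predicted_good), len(predicted_bad), total_pnl_of_good)
```

Selecting with df.loc[y_test.index] keeps every original column (including ones that were not used as features), which is what makes the profit/loss calculation possible.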

huangapple
  • Published on June 16, 2023, 14:34:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76487495.html