How to view specific rows that my logistic regression has classified
Question
I am currently working on a logistic regression model to predict the outcome of certain trades. The model classifies trades in the test set as good/bad (1/0). I want to see which trades are classified into each group and multiply the trades classified as "good" by their profit/loss to find out whether the logistic regression model is actually profitable. Is there a way to view row-specific info for the entries that the model classifies as True/False?
This is what my code looks like for my data scaling and splitting into train/test set:
x = df[x_train_features]
y = df["y"]
y = y.astype("int")
# scale data
scaler = MinMaxScaler()
scaledx = scaler.fit_transform(x)
# split training data into test and training sets
X_train, X_test, y_train, y_test = train_test_split(scaledx, y, test_size=0.25)
# instantiate the model (using the default parameters)
logreg = LogisticRegression()
# fit the model with data
logreg.fit(X_train, y_train)
y_pred_test = (logreg.predict_proba(X_test)[:, 1] >= 0.5).astype(bool)
I tried to use df.loc[y_pred_test == True], but I get the error:
Boolean index has wrong length: 720 instead of 2880
This is most likely because the test set is smaller than the whole sample set.
Answer 1
Score: 0
The error occurs because y_pred_test contains one prediction per test row (720), while df has 2880 rows, so the boolean mask cannot be applied to df directly. You first need to align the predictions with the test rows.
You might try this:
# X_test is a NumPy array without row labels, so reuse the index of y_test
y_pred_test = pd.Series(y_pred_test, index=y_test.index, name="y_pred")
results = pd.concat([y_test, y_pred_test], axis=1)
This will combine your prediction values with the ground truth, aligned on the original df index.
Then you can select the rows predicted as True and look them up in df:
df.loc[results[results["y_pred"]].index]
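To make the row lookup and the profit check concrete, here is a minimal, self-contained sketch. It assumes the profit/loss lives in a column of df, here given the hypothetical name "pnl"; the synthetic data, the feature names "f1" and "f2", and the fixed random_state are likewise illustrative assumptions and not part of the original code. The key point is that y_test is a pandas Series and therefore still carries the original row labels of df, even though X_test is a plain NumPy array.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the trades DataFrame (all column names are hypothetical).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "f1": rng.normal(size=n),
    "f2": rng.normal(size=n),
    "pnl": rng.normal(scale=100.0, size=n),  # assumed profit/loss column
})
noise = rng.normal(scale=0.5, size=n)
df["y"] = (df["f1"] + 0.5 * df["f2"] + noise > 0).astype(int)

x_train_features = ["f1", "f2"]
scaledx = MinMaxScaler().fit_transform(df[x_train_features])
y = df["y"]

# random_state is fixed only so the sketch is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    scaledx, y, test_size=0.25, random_state=0
)

logreg = LogisticRegression().fit(X_train, y_train)
y_pred_test = logreg.predict_proba(X_test)[:, 1] >= 0.5

# y_test keeps the original df row labels, in the same order as X_test,
# so it can be used to pull the full test rows back out of df.
test_rows = df.loc[y_test.index].copy()
test_rows["y_pred"] = y_pred_test  # assigned by position, matching X_test order

predicted_good = test_rows[test_rows["y_pred"]]
predicted_bad = test_rows[~test_rows["y_pred"]]

# Total profit/loss over the trades the model labelled as "good".
total_pnl = predicted_good["pnl"].sum()
print(len(predicted_good), len(predicted_bad), round(total_pnl, 2))
```

An alternative with the same effect is to pass df.index as an extra array to train_test_split and keep the returned test indices, which also works when y is not a pandas object.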