June 16, 2023 14:34:20


How to view specific rows that my logistic regression has classified

Question


I am currently working on a logistic regression model to predict the outcome of certain trades. The model classifies trades in the test set as good/bad (1/0). I want to see which trades end up in each group, and multiply the trades classified as "good" by their profit/loss to find out whether the model is actually profitable. Is there a way to view the row-specific information for the entries that the model classifies as True/False?
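The profitability check described above amounts to multiplying each trade's profit/loss by its 0/1 prediction and summing, which is the same as summing the P&L of only the rows classified as good. A minimal pandas sketch with made-up numbers; the `trades` frame and the `pnl` and `y_pred` column names are hypothetical:

```python
import pandas as pd

# Hypothetical example: four test-set trades with their profit/loss ("pnl")
# and the model's good/bad prediction (True = classified as good).
trades = pd.DataFrame(
    {"pnl": [10.0, -5.0, 3.0, -8.0], "y_pred": [True, False, True, True]}
)

# Multiply each trade's P&L by its 0/1 prediction, so trades classified
# as bad contribute nothing, then add everything up.
strategy_pnl = (trades["y_pred"].astype(int) * trades["pnl"]).sum()

# Equivalent: keep only the rows classified as good and sum their P&L.
same_pnl = trades.loc[trades["y_pred"], "pnl"].sum()

print(strategy_pnl, same_pnl)  # 5.0 5.0
```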

This is my code for scaling the data and splitting it into training and test sets:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

x = df[x_train_features]
y = df["y"]
y = y.astype("int")
# scale data
scaler = MinMaxScaler()
scaledx = scaler.fit_transform(x)
# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(scaledx, y, test_size=0.25)
# instantiate the model (using the default parameters)
logreg = LogisticRegression()
# fit the model with data
logreg.fit(X_train, y_train)
y_pred_test = (logreg.predict_proba(X_test)[:, 1] >= 0.5).astype(bool)
```

I tried to use df.loc[y_pred_test == True], but I get the error:

Boolean index has wrong length: 720 instead of 2880

This is most likely because the test set is smaller than the whole dataset.
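A minimal reproduction of this error on synthetic data of the same size as in the question (2880 rows, so a 25% test split gives 720 rows). The frame and the feature names `f1` and `f2` are hypothetical. It also shows that, because `y` is passed to `train_test_split` as a pandas Series, `y_test` still carries the original row labels of `df`, which is what makes it possible to map predictions back to rows.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical synthetic data with the same row count as in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({"f1": rng.normal(size=2880), "f2": rng.normal(size=2880)})
df["y"] = (df["f1"] + rng.normal(scale=0.5, size=2880) > 0).astype(int)

scaledx = MinMaxScaler().fit_transform(df[["f1", "f2"]])  # NumPy array, no index
X_train, X_test, y_train, y_test = train_test_split(
    scaledx, df["y"], test_size=0.25, random_state=0
)
logreg = LogisticRegression().fit(X_train, y_train)
y_pred_test = logreg.predict_proba(X_test)[:, 1] >= 0.5  # one boolean per test row

# The mask has 720 entries (test rows only), but df has 2880 rows.
try:
    df.loc[y_pred_test]
    error_message = None
except IndexError as err:
    error_message = str(err)
print(error_message)  # Boolean index has wrong length: 720 instead of 2880

# y_test is still a pandas Series labelled with the original df row labels.
print(len(y_test.index), y_test.index.isin(df.index).all())
```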

Answer 1

Score: 0


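The error is because you haven't joined your prediction values back to the rows of df.
Because y was passed to train_test_split as a pandas Series, y_test keeps the original row labels of df, so you can use its index to line the predictions up with the right rows. You might try this:

```python
import pandas as pd

# Label each prediction with the df row it belongs to
y_pred_test = pd.Series(y_pred_test, index=y_test.index, name="y_pred")
results = pd.concat([y_test, y_pred_test], axis=1)
```

This combines your prediction values with the ground truth, row by row.
Then you can select the rows of df that the model classified as good (True):

```python
df.loc[results[results["y_pred"]].index]
```

You only predicted on the test set, not on the whole dataset (df), which is why y_pred_test has 720 rows instead of 2880 and the boolean mask cannot be applied to df directly.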
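An end-to-end sketch of the same idea on synthetic data, ending with the profit/loss check from the question. Compared with the code in the question, it makes two deliberate changes: it passes the DataFrame itself to `train_test_split`, so `X_test` keeps the original row labels, and it fits `MinMaxScaler` on the training rows only, so no test-set information leaks into the scaling. The data and the column names `f1`, `f2` and `pnl` are hypothetical assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical trade data: two features, a profit/loss column "pnl",
# and a good/bad label "y" derived from it (all names are assumptions).
rng = np.random.default_rng(42)
n = 2880
df = pd.DataFrame({"f1": rng.normal(size=n), "f2": rng.normal(size=n)})
df["pnl"] = df["f1"] * 10 + rng.normal(scale=5, size=n)
df["y"] = (df["pnl"] > 0).astype(int)
features = ["f1", "f2"]

# Split the DataFrame itself (not a NumPy array) so X_test keeps df's row labels.
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["y"], test_size=0.25, random_state=0
)

# Fit the scaler on the training rows only, then apply it to both sets.
scaler = MinMaxScaler().fit(X_train)
logreg = LogisticRegression().fit(scaler.transform(X_train), y_train)
proba = logreg.predict_proba(scaler.transform(X_test))[:, 1]

# Attach the predictions to the original rows using the test-set index.
results = df.loc[X_test.index].assign(y_pred=proba >= 0.5)

good_trades = results[results["y_pred"]]   # rows classified as good (True)
bad_trades = results[~results["y_pred"]]   # rows classified as bad (False)
strategy_pnl = good_trades["pnl"].sum()    # P&L if only "good" trades are taken

print(len(good_trades), len(bad_trades), round(strategy_pnl, 2))
```

Keeping pandas objects through the split avoids any manual index bookkeeping: every row of `results` is a full row of `df`, so all trade details (including `pnl`) are available next to the prediction.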
