How to view specific rows that my logistic regression has classified

Question

I am currently working on a logistic regression model to predict the outcome of certain trades. The model classifies trades in the test set as good/bad (1/0). I want to see which trades are classified into each group and multiply the trades classified as "good" by their profit/loss to find out whether the model is actually profitable. Is there a way to view row-specific information for the entries that the model classifies as True/False?

This is my code for scaling the data and splitting it into train/test sets:

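from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

x = df[x_train_features]
y = df["y"]
y = y.astype("int")

# scale data
scaler = MinMaxScaler()
scaledx = scaler.fit_transform(x)

# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(scaledx, y, test_size=0.25)

# instantiate the model (using the default parameters)
logreg = LogisticRegression()

# fit the model with data
logreg.fit(X_train, y_train)

# classify a test row as True when its predicted probability of class 1 is at least 0.5
y_pred_test = (logreg.predict_proba(X_test)[:, 1] >= 0.5).astype(bool)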

I tried to use df.loc[y_pred_test == True], but I get the error:

Boolean index has wrong length: 720 instead of 2880

This is most likely because the test set is smaller than the whole sample set.

Answer 1

Score: 0

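The error occurs because y_pred_test only covers the test set, so its rows have not been matched to the rows of df.
You might try this, reusing the index that train_test_split preserves on y_test:

y_pred_test = pd.Series(y_pred_test, index=y_test.index, name="y_pred")
results = pd.concat([y_test, y_pred_test], axis=1)

This combines your predicted values with the ground truth, row by row.
Then you can select the test rows that were classified as True, either from the combined frame or from the original df:

results[results["y_pred"]]
df.loc[y_pred_test[y_pred_test].index]

Because you have not predicted on the whole dataset (df), y_pred_test has 720 rows rather than 2880, which is why the boolean mask cannot be applied to df directly.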
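
To get from predictions back to row-level information and a profit figure, one option is to split the DataFrame itself so the test rows keep their original index. The following is only a minimal sketch under assumptions: the data is synthetic, the feature names `f1` and `f2` are made up, and `pnl` is a hypothetical column standing in for each trade's profit/loss. It also fits the scaler on the training rows only, which differs from the code in the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the asker's df; "pnl" is a hypothetical profit/loss column.
rng = np.random.default_rng(0)
n = 2880
df = pd.DataFrame({
    "f1": rng.normal(size=n),
    "f2": rng.normal(size=n),
    "pnl": rng.normal(size=n),
})
df["y"] = (df["f1"] + 0.5 * df["f2"] + rng.normal(scale=0.5, size=n) > 0).astype(int)
x_train_features = ["f1", "f2"]

# Split the DataFrame itself so X_test keeps df's row labels.
X_train, X_test, y_train, y_test = train_test_split(
    df[x_train_features], df["y"], test_size=0.25, random_state=0
)

# Fit the scaler on the training rows only, then apply it to the test rows.
scaler = MinMaxScaler().fit(X_train)
logreg = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Predictions as a boolean Series that shares the test rows' index.
proba = logreg.predict_proba(scaler.transform(X_test))[:, 1]
y_pred_test = pd.Series(proba >= 0.5, index=X_test.index, name="y_pred")

# Look up the original rows for each predicted class.
good_rows = df.loc[y_pred_test[y_pred_test].index]
bad_rows = df.loc[y_pred_test[~y_pred_test].index]

# Total profit/loss of the trades the model labelled "good".
total_pnl_good = good_rows["pnl"].sum()
print(len(good_rows), len(bad_rows), total_pnl_good)
```

Because `good_rows` and `bad_rows` are slices of the original df, every column is available for inspection, not just the features used for training. This relies on df having a unique index.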

huangapple
  • Posted on June 16, 2023 at 14:34:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76487495.html