What's the target in my data for training a stock price predictor?

Question

I want to build a stock price predictor for an economic indicator in Venezuela. I have cleaned and structured the historical data I want to use (from the last 10 years), but since this is my first machine learning project I have some doubts. My CSV data, with 3000+ entries, looks like this:

    11-28-2017;0.8823561
    11-29-2017;0.9679446
    11-30-2017;0.9719271
    12-1-2017;1.0302427

As you can see, column 0 holds the date and column 1 holds the price for that date. In this case the training data (X) should be the price. However, the methods I want to use expect both X and Y (supervised learning), and since this is my first time working with my own data, I feel a bit lost. Here is my code so far: https://github.com/marcelodiaz558/Venezuela-dollar-price-predictor/blob/development/model.ipynb Once I have resolved my doubts about the data, I would like to train the model with an LSTM, or perhaps start with a simple artificial neural network for testing. I don't know what Y should be.

Answer 1

Score: 1

Y, your target, is what you want to predict. X, your training data, is some vector representation of your prior knowledge that can be used to improve your predictions of the unknown quantity. In simple time-series prediction with a simple regressor, your training data could be the prices from the past N days.

So, using your example data, if you want to predict the price one day ahead from the prices of the previous two days (N=2), your X and Y would be:

    X = [[0.8823561, 0.9679446], [0.9679446, 0.9719271]]
    Y = [0.9719271, 1.0302427]

So, to do machine learning on your data, you need to pre-process it according to exactly what you want to predict. Some algorithms are specifically designed for this task; they either need no pre-processing or do it automatically in their implementation.
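
As a rough illustration of the windowing described above, here is a minimal Python sketch that builds X and Y from a list of prices. The function name make_windows and its n_lags parameter are made up for this example, and NumPy is assumed to be available; it is one way to do the pre-processing, not the only one.

```python
import numpy as np


def make_windows(prices, n_lags):
    """Build (X, Y) for one-step-ahead prediction from a 1-D price series.

    Each row of X holds n_lags consecutive prices; the matching entry of Y
    is the price that immediately follows that window.
    (Hypothetical helper, named here only for illustration.)
    """
    prices = np.asarray(prices, dtype=float)
    if len(prices) <= n_lags:
        raise ValueError("need more prices than n_lags")
    X = np.array([prices[i:i + n_lags] for i in range(len(prices) - n_lags)])
    Y = prices[n_lags:]
    return X, Y


# The four sample values from the question, in chronological order.
prices = [0.8823561, 0.9679446, 0.9719271, 1.0302427]
X, Y = make_windows(prices, n_lags=2)
print(X)
print(Y)
```

With the four sample prices and n_lags=2 this reproduces the X and Y shown above. A larger n_lags gives each training row more history, at the cost of having fewer rows.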

Answer 2

Score: 0

Unfortunately, you picked a difficult problem to start ML with: time series.

The issue here is that you have to respect the order of your values.

So a time series looks like this:

Timestamp t-3: 0.8823561
Timestamp t-2: 0.9679446
Timestamp t-1: 0.9719271
Timestamp t (prediction / y): 1.0302427
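
One practical consequence of respecting the order is that the train/test split should be chronological rather than shuffled. A minimal sketch, assuming the values are already sorted by date (chronological_split and test_fraction are hypothetical names used only for illustration):

```python
def chronological_split(values, test_fraction=0.2):
    """Split an ordered sequence so that the test set is the most recent part."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, round(len(values) * test_fraction))
    split = len(values) - n_test
    # No shuffling: everything before `split` is older than everything after it.
    return values[:split], values[split:]


# The four sample values from the question, oldest first.
prices = [0.8823561, 0.9679446, 0.9719271, 1.0302427]
train, test = chronological_split(prices, test_fraction=0.25)
print(train)
print(test)
```

Here the model would be fitted on the three oldest values and evaluated on the newest one, which mirrors how it would be used to predict a future price.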

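R ships with a really nice implementation of the ARIMA model (the arima() function in its built-in stats package).
One difficulty is that you have to check whether your data is seasonal (patterns that recur regularly in your data) and whether it is stationary (its statistical properties, such as mean and variance, do not change over time).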
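
If the series turns out to have a trend, a common first step is to difference it (this is the "I", for integrated, in ARIMA). A small sketch with NumPy on the four sample values from the question; whether differencing is actually needed for this data is an assumption that would have to be checked:

```python
import numpy as np

# The four sample values from the question, oldest first.
prices = np.array([0.8823561, 0.9679446, 0.9719271, 1.0302427])

# First difference: the change from one day to the next.
diffs = np.diff(prices)

# Log returns are a common alternative when prices grow multiplicatively.
log_returns = np.diff(np.log(prices))

print(diffs)
print(log_returns)
```

Both results are one element shorter than the input, because the first value has no predecessor to be compared with.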

My recommendation: if you just want to get started with machine learning, go for a classification problem with the iris dataset.

Posted by huangapple on January 6, 2020 at 22:37:49
Source: https://go.coder-hub.com/59613964.html