Need help understanding the data structure for AutoML

Question

I have an IoT device that updates an Azure Storage Table any time one of its values changes. For example, if the fish tank temperature changes from 68 to 69, that gets logged. If the filter pump runs, that gets logged. When the little treasure chest opens and bubbles come out, that gets logged. This makes my tabular data look like this:

TimeStamp  Name                 Value
(time)     TreaureChestBubbles  2.8
(time)     TreaureChestBubbles  5
(time)     FilterPumpRunning    1
(time)     TreaureChestBubbles  3.5
(time)     FilterPumpRunning    0
(time)     WaterTemp            66
(time)     TreaureChestBubbles  -1 (indicating an error)

I want to create a model that predicts when my little treasure chest is going to fail.

I dumped all this data into an AutoML job and clicked Go... and it failed miserably. Then I started reading the documentation. I found lots of documentation about setting up experiments, but very little about the exact structure of the data. It looks like my tabular data needs to have EVERY parameter in each row? So instead of a Name column, I'd need a TreaureChestBubblesValue column, a WaterTempValues column, a FilterPumpRunningValues column, etc.

TimeStamp TreaureChestBubblesValue WaterTempValues ... FilterPumpRunningValues
(time)    2.8                      67                  0
(time)    5                        67                  0
(time)    5                        66                  0
(time)    8.4                      66                  1
(time)    2.8                      67                  0

Does that sound correct? Or does the structure of the data not matter for AutoML as long as it's tabular?
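
A minimal sketch of the long-to-wide reshaping described above, in plain Python so it runs without extra packages. It assumes, based on the repeated values in the wide table, that a sensor which did not change keeps its last logged value (forward fill). The timestamps and the `long_to_wide` helper are hypothetical, since the real times are elided as "(time)". With pandas, `DataFrame.pivot_table` followed by `DataFrame.ffill` would be the more usual route.

```python
from datetime import datetime

# Hypothetical long-format log: (timestamp, name, value) tuples.
# The real timestamps are elided as "(time)" in the question; these are made up.
LONG_ROWS = [
    ("2023-02-20T10:00:00", "WaterTemp", 67.0),
    ("2023-02-20T10:00:00", "FilterPumpRunning", 0.0),
    ("2023-02-20T10:01:00", "TreaureChestBubbles", 2.8),
    ("2023-02-20T10:02:00", "TreaureChestBubbles", 5.0),
    ("2023-02-20T10:03:00", "WaterTemp", 66.0),
    ("2023-02-20T10:04:00", "FilterPumpRunning", 1.0),
    ("2023-02-20T10:05:00", "TreaureChestBubbles", -1.0),  # -1 indicates an error
]


def long_to_wide(rows):
    """Pivot (timestamp, name, value) events into one row per timestamp.

    Each sensor name becomes its own column. A sensor that did not change at a
    timestamp keeps its last logged value (forward fill); before its first
    reading the value is None.
    """
    names = sorted({name for _, name, _ in rows})

    # Collect every change that happened at the same timestamp.
    changes_at = {}
    for ts, name, value in rows:
        changes_at.setdefault(ts, {})[name] = value

    last_known = {name: None for name in names}
    wide = []
    # Walk timestamps in chronological order, applying changes as they occur.
    for ts in sorted(changes_at, key=datetime.fromisoformat):
        last_known.update(changes_at[ts])
        wide.append({"TimeStamp": ts, **last_known})  # snapshot, not a reference
    return names, wide


if __name__ == "__main__":
    names, wide = long_to_wide(LONG_ROWS)
    print("TimeStamp", *names)
    for row in wide:
        print(row["TimeStamp"], *(row[n] for n in names))
```

With this sample, the result has one row per timestamp and the columns `FilterPumpRunning`, `TreaureChestBubbles` and `WaterTemp`; the last row pairs the `-1` error reading with the last known temperature and pump state. The first row has `None` for the chest because no reading exists yet, so leading gaps like that may need to be dropped or filled before training.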

Answer 1

Score: 0

Per this link: https://learn.microsoft.com/en-us/azure/machine-learning/concept-automl-forecasting-methods#how-automl-uses-your-data

> AutoML accepts time series data in tabular, "wide" format; that is, each variable must have its own corresponding column. AutoML requires one of the columns to be the time axis for the forecasting problem. This column must be parsable into a datetime type. The simplest time series data set consists of a time column and a numeric target column. The target is the variable one intends to predict into the future. The following is an example of the format in this simple case:

timestamp   quantity
2012-01-01  100
2012-01-02  97
2012-01-03  106
...         ...
2013-12-31  347
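
The quoted requirement that the time column be parsable into a datetime type can be checked locally before submitting a job. Below is a small sketch in plain Python using the first rows of the simple example. `check_time_column` is a hypothetical helper, and its rejection of repeated or out-of-order timestamps within a single series is an extra sanity check added here, not a rule stated in the quoted documentation.

```python
from datetime import date

# First rows of the simple example (the rest are elided with "..." there).
SIMPLE_ROWS = [
    ("2012-01-01", 100),
    ("2012-01-02", 97),
    ("2012-01-03", 106),
]


def check_time_column(rows):
    """Parse the timestamp of each (timestamp, target) row.

    Raises ValueError if a timestamp cannot be parsed as an ISO date, or if the
    parsed dates are not strictly increasing (an assumption for one series).
    """
    parsed = [date.fromisoformat(ts) for ts, _ in rows]  # ValueError on bad input
    for earlier, later in zip(parsed, parsed[1:]):
        if later <= earlier:
            raise ValueError(f"timestamp {later} does not follow {earlier}")
    return parsed


if __name__ == "__main__":
    print(check_time_column(SIMPLE_ROWS))
    try:
        check_time_column([("2012-01-01", 100), ("Jan 2nd", 97)])
    except ValueError as err:
        print("rejected:", err)
```

Because the multi-SKU example repeats each date once per SKU, a strict check like this only makes sense per series, which is where a time series ID column comes in.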

>In more complex cases, the data may contain other columns aligned with the time index.

timestamp   SKU     price  advertised  quantity
2012-01-01  JUICE1  3.5    0           100
2012-01-01  BREAD3  5.7    60          47
2012-01-02  JUICE1  3.5    0           97
2012-01-02  BREAD3  5.5    1           68
...         ...     ...    ...         ...
2013-12-31  JUICE1  3.7    50          347
2013-12-31  BREAD3  5.7    0           94

>In this example, there's a SKU, a retail price, and a flag indicating whether an item was advertised in addition to the timestamp and target quantity. There are evidently two series in this dataset - one for the JUICE1 SKU and one for the BREAD3 SKU; the SKU column is a time series ID column since grouping by it gives two groups containing a single series each. Before sweeping over models, AutoML does basic validation of the input configuration and data and adds engineered features.
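
The quoted point that grouping by SKU yields one series per group can be sketched in plain Python with the first four rows of the example table. `split_series` and `has_unique_times` are hypothetical helpers. The check confirms that each (ID, timestamp) pair appears once, and that without the SKU column the timestamps repeat, which is why SKU has to be treated as a time series ID column.

```python
from collections import defaultdict

# First four rows of the quoted example: (timestamp, SKU, price, advertised, quantity).
ROWS = [
    ("2012-01-01", "JUICE1", 3.5, 0, 100),
    ("2012-01-01", "BREAD3", 5.7, 60, 47),
    ("2012-01-02", "JUICE1", 3.5, 0, 97),
    ("2012-01-02", "BREAD3", 5.5, 1, 68),
]
TIME, SKU, QUANTITY = 0, 1, 4  # column positions in each row


def split_series(rows, id_col=SKU):
    """Group rows by the time series ID column, one list of rows per series."""
    series = defaultdict(list)
    for row in rows:
        series[row[id_col]].append(row)
    return dict(series)


def has_unique_times(rows, id_col=SKU):
    """Return True if every (ID, timestamp) pair occurs once.

    With id_col=None the rows are treated as a single series, so only the
    timestamp itself must be unique.
    """
    seen = set()
    for row in rows:
        key = (None if id_col is None else row[id_col], row[TIME])
        if key in seen:
            return False
        seen.add(key)
    return True


if __name__ == "__main__":
    for sku, series_rows in split_series(ROWS).items():
        print(sku, [row[QUANTITY] for row in series_rows])
    print("unique with SKU as ID:", has_unique_times(ROWS))
    print("unique without an ID:", has_unique_times(ROWS, id_col=None))
```

For the single fish tank in the question there would be only one series, so no ID column should be needed unless several devices log to the same table.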

  • Posted on February 24, 2023 05:51:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75550689.html
  • azure-auto-ml