Using a date variable in an AWS Step Functions state machine

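Question

I created a Step Functions state machine that creates an EMR cluster. I want the date in the steps to change according to the date on which I execute the state machine: if I run it today (13 June 2023), I want the steps to use the previous day (12 June 2023). How can I do this?

This is my code: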

{
  "Comment": "A description of my state machine",
  "StartAt": "EMR CreateCluster",
  "States": {
    "EMR CreateCluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {
        "Name": "IOretrieve",
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ReleaseLabel": "emr-6.8.0",
        "Applications": [
          {
            "Name": "Spark"
          }
        ],
        "LogUri": "s3://",
        "VisibleToAllUsers": true,
        "Instances": {
          "Ec2SubnetId": "subnet",
          "Ec2KeyName": "",
          "EmrManagedMasterSecurityGroup": "",
          "EmrManagedSlaveSecurityGroup": "",
          "KeepJobFlowAliveWhenNoSteps": true,
          "InstanceFleets": [
            {
              "InstanceFleetType": "MASTER",
              "Name": "Master",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m5.xlarge"
                }
              ]
            },
            {
              "InstanceFleetType": "CORE",
              "Name": "CORE",
              "TargetOnDemandCapacity": 5,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "r5.2xlarge"
                }
              ]
            }
          ]
        },
        "BootstrapActions": [
          {
            "Name": "Custom action",
            "ScriptBootstrapAction": {
              "Path": "s3://",
              "Args": []
            }
          }
        ],
        "Configurations": [
          {
            "Classification": "core-site",
            "Properties": {
              "fs.s3a.connection.maximum": "1000"
            }
          },
          {
            "Classification": "spark",
            "Properties": {
              "maximizeResourceAllocation": "true"
            }
          }
        ]
      },
      "ResultPath": "$.cluster",
      "Next": "Run first step"
    },
    "Run first step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My first EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "spark-submit",
              "--deploy-mode",
              "client",
              "s3://",
              "--local_run",
              "False",
              "--date_path",
              "year=2023/month=06/day=12/"
            ]
          }
        }
      },
      "ResultPath": "$.firstStep",
      "Next": "Run second step"
    },
    "Run second step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My second EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "spark-submit",
              "--deploy-mode",
              "client",
              "s3://",
              "--local_run",
              "False",
              "--date_path",
              "year=2023/month=06/day=12/"
            ]
          }
        }
      },
      "ResultPath": "$.secondStep",
      "Next": "EMR TerminateCluster"
    },
    "EMR TerminateCluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId"
      },
      "End": true
    }
  }
}

The date path is what I want to change:

"--date_path",
"year=2023/month=06/day=12/"

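Answer 1

Score: 2

AWS Step Functions provides some simple intrinsic functions for math operations, such as States.MathRandom and States.MathAdd.

However, at the time of writing (June 2023), more complex calculations, such as getting the previous day's date, can't be done out of the box and require invoking an external process, e.g. a Lambda function.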
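As a rough illustration of the Lambda route, here is a minimal Python sketch of a handler that returns the previous day's path in the question's `year=.../month=.../day=.../` layout. The input key `enteredTime` and the output key `datePath` are assumptions made for this example, not part of any AWS API; the state machine would have to pass a timestamp such as $$.State.EnteredTime under that key.

```python
from datetime import datetime, timedelta, timezone


def previous_day_path(now: datetime) -> str:
    """Return the partition path for the day before `now`, evaluated in UTC."""
    previous = now.astimezone(timezone.utc) - timedelta(days=1)
    return previous.strftime("year=%Y/month=%m/day=%d/")


def lambda_handler(event, context):
    # "enteredTime" is a hypothetical input key chosen for this sketch.
    entered = event.get("enteredTime")
    if entered:
        # Older Python versions reject a trailing "Z" in fromisoformat(),
        # so it is rewritten as an explicit UTC offset first.
        now = datetime.fromisoformat(entered.replace("Z", "+00:00"))
    else:
        # Fall back to the Lambda's own clock when no timestamp is passed.
        now = datetime.now(timezone.utc)
    return {"datePath": previous_day_path(now)}
```

A Task state that invokes such a function could store the result in the state data, and the addStep states could then read the path from there instead of hard-coding it. Subtracting a `timedelta` handles month and year boundaries, which is the part the intrinsic functions cannot do.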


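That said, you can retrieve and format the current date and time by following the steps below.

Step 1:

Retrieve the time at which the current state was entered from the Context Object using

$$.State.EnteredTime

This returns the date and time in the following format:

2019-03-26T20:14:13.192Z

Step 2:

Split the timestamp into an array using States.StringSplit:

States.StringSplit($$.State.EnteredTime, '-,T')

This returns the following array:

[
"2019",
"03",
"26",
"20:14:13.192Z"
]

Step 3:

Format the date path string using States.Format with the first three elements of the array (this expression assumes the array from Step 2 is available at $.date.splitDate):

States.Format('year={}/month={}/day={}', States.ArrayGetItem($.date.splitDate, 0), States.ArrayGetItem($.date.splitDate, 1), States.ArrayGetItem($.date.splitDate, 2))

Step 4:

Create the Args array using States.Array:

States.Array('spark-submit', '--deploy-mode', 'client', 's3://', '--local_run', 'False', '--date_path',$.datePath)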
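To check locally what the four steps should produce, the chain can be mimicked in plain Python. This is only a rough stand-in for the intrinsic functions, not how Step Functions implements them; `string_split` and `build_args` are names made up for this sketch.

```python
import re


def string_split(value: str, delimiters: str) -> list:
    """Rough stand-in for States.StringSplit: every character is a delimiter."""
    return re.split("[" + re.escape(delimiters) + "]", value)


def build_args(entered_time: str) -> list:
    # Step 2: split the timestamp on '-', ',' and 'T'.
    parts = string_split(entered_time, "-,T")
    # Step 3: build the path from the year, month and day elements.
    date_path = "year={}/month={}/day={}".format(parts[0], parts[1], parts[2])
    # Step 4: assemble the argument list passed to command-runner.jar.
    return ["spark-submit", "--deploy-mode", "client", "s3://",
            "--local_run", "False", "--date_path", date_path]


print(build_args("2019-03-26T20:14:13.192Z"))
```

Note that the format string used in the steps has no trailing slash, unlike the hard-coded path in the question.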

To show how this works in the context of your state machine, I've (1) added a Pass state called "Format date path" and (2) replaced the HadoopJarStep.Args attribute in your Task state ("Run second step"). Because a Pass state's Parameters replace the whole state input when no ResultPath is set, the Pass state also forwards $.cluster so that the later states can still read $.cluster.ClusterId:

{
  "Comment": "A description of my state machine",
  "StartAt": "EMR CreateCluster",
  "States": {
    "EMR CreateCluster": {
      ...,
      "Next": "Format date path"
    },
    "Format date path": {
      "Type": "Pass",
      "Parameters": {
        "cluster.$": "$.cluster",
        "datePath.$": "States.Format('year={}/month={}/day={}', States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 0), States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 1), States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 2))"
      },
      "Next": "Run second step"
    },
    "Run second step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My second EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args.$": "States.Array('spark-submit', '--deploy-mode', 'client', 's3://', '--local_run', 'False', '--date_path', $.datePath)"
          }
        }
      },
      "ResultPath": "$.secondStep",
      "Next": "EMR TerminateCluster"
    },
    "EMR TerminateCluster": {
      ...
    }
  }
}

huangapple
  • Posted on 13 June 2023, 18:44:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76464070.html