Using a date variable in an AWS Step Functions state machine
Question
I created a step function that creates an EMR cluster.
I want the date in the steps to change according to the date on which I execute the step function: if I run it today (13.6.2023), it should use the previous day (12.6.2023). How can I do this?
This is my code:
{
  "Comment": "A description of my state machine",
  "StartAt": "EMR CreateCluster",
  "States": {
    "EMR CreateCluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {
        "Name": "IOretrieve",
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ReleaseLabel": "emr-6.8.0",
        "Applications": [
          {
            "Name": "Spark"
          }
        ],
        "LogUri": "s3://",
        "VisibleToAllUsers": true,
        "Instances": {
          "Ec2SubnetId": "subnet",
          "Ec2KeyName": "",
          "EmrManagedMasterSecurityGroup": "",
          "EmrManagedSlaveSecurityGroup": "",
          "KeepJobFlowAliveWhenNoSteps": true,
          "InstanceFleets": [
            {
              "InstanceFleetType": "MASTER",
              "Name": "Master",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m5.xlarge"
                }
              ]
            },
            {
              "InstanceFleetType": "CORE",
              "Name": "CORE",
              "TargetOnDemandCapacity": 5,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "r5.2xlarge"
                }
              ]
            }
          ]
        },
        "BootstrapActions": [
          {
            "Name": "Custom action",
            "ScriptBootstrapAction": {
              "Path": "s3://",
              "Args": []
            }
          }
        ],
        "Configurations": [
          {
            "Classification": "core-site",
            "Properties": {
              "fs.s3a.connection.maximum": "1000"
            }
          },
          {
            "Classification": "spark",
            "Properties": {
              "maximizeResourceAllocation": "true"
            }
          }
        ]
      },
      "ResultPath": "$.cluster",
      "Next": "Run first step"
    },
    "Run first step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My first EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "spark-submit",
              "--deploy-mode",
              "client",
              "s3://",
              "--local_run",
              "False",
              "--date_path",
              "year=2023/month=06/day=12/"
            ]
          }
        }
      },
      "ResultPath": "$.firstStep",
      "Next": "Run second step"
    },
    "Run second step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My second EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "spark-submit",
              "--deploy-mode",
              "client",
              "s3://",
              "--local_run",
              "False",
              "--date_path",
              "year=2023/month=06/day=12/"
            ]
          }
        }
      },
      "ResultPath": "$.secondStep",
      "Next": "EMR TerminateCluster"
    },
    "EMR TerminateCluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId"
      },
      "End": true
    }
  }
}
The date path is what I want to change:
"--date_path",
"year=2023/month=06/day=12/"
Answer 1
Score: 2
AWS Step Functions provides some simple intrinsic functions for math operations, such as States.MathRandom and States.MathAdd.
However, at the time of writing (June 2023), more complex calculations, such as getting the previous day's date, can't be done out of the box and require invoking an external process, e.g. a Lambda function.
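As a rough illustration of the Lambda route, the handler below computes the previous day's date path from a timestamp passed in by the state machine. It is a minimal sketch, not a definitive implementation: the input key "enteredTime", the output key "datePath" and the helper name previous_day_path are all hypothetical, and "previous day" is taken in UTC because Step Functions timestamps are in UTC.

```python
from datetime import datetime, timedelta


def previous_day_path(timestamp: str) -> str:
    """Return the date path for the day before an ISO 8601 UTC timestamp."""
    # Only the leading YYYY-MM-DD part of e.g. 2023-06-13T08:15:00.000Z is needed.
    day = datetime.strptime(timestamp[:10], "%Y-%m-%d") - timedelta(days=1)
    # Trailing slash kept to match the path format used in the question.
    return day.strftime("year=%Y/month=%m/day=%d/")


def lambda_handler(event, context):
    # "enteredTime" is a hypothetical input key; the state machine would have to
    # map a value such as $$.State.EnteredTime onto it when invoking the function.
    return {"datePath": previous_day_path(event["enteredTime"])}
```

Called with {"enteredTime": "2023-06-13T08:15:00.000Z"}, the handler returns {"datePath": "year=2023/month=06/day=12/"}. Using timedelta rather than subtracting 1 from the day number keeps month and year boundaries correct.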
That said, you can retrieve and format the current date and time by following the steps below.
Step 1:
Retrieve the time at which the current state was entered from the Context Object using
$$.State.EnteredTime
This returns the date and time as an ISO 8601 timestamp in UTC, for example:
2019-03-26T20:14:13.192Z
If every state should see the same timestamp, $$.Execution.StartTime has the same format.
Step 2:
Split the timestamp into an array using States.StringSplit. Each character of the second argument ('-', ',' and 'T') acts as a delimiter:
States.StringSplit($$.State.EnteredTime, '-,T')
This returns the following array:
[
  "2019",
  "03",
  "26",
  "20:14:13.192Z"
]
Step 3:
Format the date path string using States.Format with the first three elements of the array (this snippet assumes the array from step 2 was stored at $.date.splitDate):
States.Format('year={}/month={}/day={}', States.ArrayGetItem($.date.splitDate, 0), States.ArrayGetItem($.date.splitDate, 1), States.ArrayGetItem($.date.splitDate, 2))
Step 4:
Create the Args array using States.Array (this snippet assumes the formatted string from step 3 was stored at $.datePath):
States.Array('spark-submit', '--deploy-mode', 'client', 's3://', '--local_run', 'False', '--date_path', $.datePath)
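To sanity-check the result outside Step Functions, steps 2 to 4 can be roughly emulated in Python. This is only a sketch under stated assumptions: the helper names string_split and build_args are made up for illustration, and the splitting logic approximates States.StringSplit for this timestamp format rather than reproducing the service's implementation.

```python
import re


def string_split(value: str, delimiters: str) -> list[str]:
    """Approximate States.StringSplit: every delimiter character separates parts."""
    return re.split("[" + re.escape(delimiters) + "]", value)


def build_args(entered_time: str) -> list[str]:
    """Mirror steps 2 to 4 for a timestamp such as $$.State.EnteredTime."""
    # Step 2: split on '-', ',' and 'T'; the first three parts are year, month, day.
    year, month, day = string_split(entered_time, "-,T")[:3]
    # Step 3: same format string as the States.Format call.
    date_path = "year={}/month={}/day={}".format(year, month, day)
    # Step 4: same argument list as the States.Array call.
    return ["spark-submit", "--deploy-mode", "client", "s3://",
            "--local_run", "False", "--date_path", date_path]


if __name__ == "__main__":
    # Prints: year=2019/month=03/day=26
    print(build_args("2019-03-26T20:14:13.192Z")[-1])
```

The last element is the date path for the day the state was entered, not the previous day, which matches the limitation described earlier.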
To show how this works in the context of your state machine, I've (1) added a Pass state called "Format date path" and (2) replaced the HadoopJarStep.Args attribute in the "Run second step" task state with an Args.$ expression. The Pass state also copies $.cluster into its output, because Parameters would otherwise replace the whole state input and "Run second step" could no longer resolve $.cluster.ClusterId. "Run first step" is left out for brevity and needs the same Args.$ change. The format string omits the trailing slash of your original path, so add it if your job expects one:
{
  "Comment": "A description of my state machine",
  "StartAt": "EMR CreateCluster",
  "States": {
    "EMR CreateCluster": {
      ...,
      "Next": "Format date path"
    },
    "Format date path": {
      "Type": "Pass",
      "Parameters": {
        "cluster.$": "$.cluster",
        "datePath.$": "States.Format('year={}/month={}/day={}', States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 0), States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 1), States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 2))"
      },
      "Next": "Run second step"
    },
    "Run second step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My second EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args.$": "States.Array('spark-submit', '--deploy-mode', 'client', 's3://', '--local_run', 'False', '--date_path', $.datePath)"
          }
        }
      },
      "ResultPath": "$.secondStep",
      "Next": "EMR TerminateCluster"
    },
    "EMR TerminateCluster": {
      ...
    }
  }
}