June 13, 2023 18:44:21

Using a date variable in an AWS Step Function

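Question

I created a step function that creates an EMR cluster. I want the date used in the steps to change according to the date on which I execute the step function: if I run it today (13.6.2023), I want it to run for the previous day (12.6.2023). How can I do this?

This is my code: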

{
  "Comment": "A description of my state machine",
  "StartAt": "EMR CreateCluster",
  "States": {
    "EMR CreateCluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {
        "Name": "IOretrieve",
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ReleaseLabel": "emr-6.8.0",
        "Applications": [
          {
            "Name": "Spark"
          }
        ],
        "LogUri": "s3://",
        "VisibleToAllUsers": true,
        "Instances": {
          "Ec2SubnetId": "subnet",
          "Ec2KeyName": "",
          "EmrManagedMasterSecurityGroup": "",
          "EmrManagedSlaveSecurityGroup": "",
          "KeepJobFlowAliveWhenNoSteps": true,
          "InstanceFleets": [
            {
              "InstanceFleetType": "MASTER",
              "Name": "Master",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m5.xlarge"
                }
              ]
            },
            {
              "InstanceFleetType": "CORE",
              "Name": "CORE",
              "TargetOnDemandCapacity": 5,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "r5.2xlarge"
                }
              ]
            }
          ]
        },
        "BootstrapActions": [
          {
            "Name": "Custom action",
            "ScriptBootstrapAction": {
              "Path": "s3://",
              "Args": []
            }
          }
        ],
        "Configurations": [
          {
            "Classification": "core-site",
            "Properties": {
              "fs.s3a.connection.maximum": "1000"
            }
          },
          {
            "Classification": "spark",
            "Properties": {
              "maximizeResourceAllocation": "true"
            }
          }
        ]
      },
      "ResultPath": "$.cluster",
      "Next": "Run first step"
    },
    "Run first step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My first EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "spark-submit",
              "--deploy-mode",
              "client",
              "s3://",
              "--local_run",
              "False",
              "--date_path",
              "year=2023/month=06/day=12/"
            ]
          }
        }
      },
      "ResultPath": "$.firstStep",
      "Next": "Run second step"
    },
    "Run second step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My second EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "spark-submit",
              "--deploy-mode",
              "client",
              "s3://",
              "--local_run",
              "False",
              "--date_path",
              "year=2023/month=06/day=12/"
            ]
          }
        }
      },
      "ResultPath": "$.secondStep",
      "Next": "EMR TerminateCluster"
    },
    "EMR TerminateCluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId"
      },
      "End": true
    }
  }
}

The date path is what I want to change:

"--date_path",
"year=2023/month=06/day=12/"

Answer 1

Score: 2

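AWS Step Functions provides some simple intrinsic functions for math operations, such as States.MathRandom and States.MathAdd.

However, at the time of writing (June 2023), more complex calculations, such as getting the previous day's date, can't be done out of the box and require invoking an external process, such as a Lambda function.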
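The Lambda route could look like the minimal sketch below. It is hypothetical: it assumes the Lambda task passes the Context Object timestamp in its payload under a key named `enteredTime` (for example `"enteredTime.$": "$$.State.EnteredTime"`); the key and handler names are illustrative, not an AWS convention. Because the timestamp ends in `Z` (UTC), "previous day" here means the previous UTC day.

```python
from datetime import datetime, timedelta


def lambda_handler(event, context):
    """Hypothetical handler: return the previous day's date path.

    Assumes event["enteredTime"] looks like "2023-06-13T18:44:21.000Z".
    """
    # %z accepts the trailing "Z" (UTC) on Python 3.7+
    entered = datetime.strptime(event["enteredTime"], "%Y-%m-%dT%H:%M:%S.%f%z")
    # timedelta handles month and year boundaries (e.g. 1 March -> 28 February)
    previous_day = entered - timedelta(days=1)
    # Zero-padded month/day to match "year=2023/month=06/day=12/"
    return {"datePath": previous_day.strftime("year=%Y/month=%m/day=%d/")}


if __name__ == "__main__":
    print(lambda_handler({"enteredTime": "2023-06-13T18:44:21.000Z"}, None))
```

Using `timedelta` avoids hand-written month-rollover logic, which is exactly what makes this awkward to do with the math intrinsics alone.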

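That said, you can retrieve and format the current date by following the steps below.

Step 1:

Retrieve the time at which the current state was entered from the Context Object using: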

$$.State.EnteredTime

This returns the date and time in the following format:

2019-03-26T20:14:13.192Z

Step 2:

Split the timestamp into an array using States.StringSplit:

States.StringSplit($$.State.EnteredTime, '-,T')

This returns the following array:

[
"2019",
"03",
"26",
"20:14:13.192Z"
]
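To check the split locally, the sketch below approximates States.StringSplit in Python. It assumes, as the output above suggests, that every character in the second argument acts as its own delimiter; it is an emulation for illustration, not the Step Functions implementation.

```python
import re
from typing import List


def string_split(value: str, delimiters: str) -> List[str]:
    """Local approximation of States.StringSplit: each character in
    `delimiters` is treated as a separate delimiter."""
    # Build a character class such as [\-,T]; re.escape keeps "-" literal
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, value)


if __name__ == "__main__":
    print(string_split("2019-03-26T20:14:13.192Z", "-,T"))
```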

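Step 3:

Format the date path string with States.Format, using the first three elements of the array (this assumes the array from Step 2 has been stored at $.date.splitDate). The template ends in / so the result matches the question's --date_path value:

States.Format('year={}/month={}/day={}/', States.ArrayGetItem($.date.splitDate, 0), States.ArrayGetItem($.date.splitDate, 1), States.ArrayGetItem($.date.splitDate, 2))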
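A rough Python stand-in for States.Format, assuming positional `{}` placeholders filled left to right. It uses a template ending in `/` to match the question's `--date_path` value; drop the slash if your Spark job doesn't expect it.

```python
def states_format(template: str, *args: str) -> str:
    """Local approximation of States.Format: fill each "{}" in order."""
    parts = template.split("{}")
    if len(parts) - 1 != len(args):
        raise ValueError("number of placeholders and arguments differ")
    out = parts[0]
    # Interleave each argument with the literal text that follows it
    for arg, tail in zip(args, parts[1:]):
        out += str(arg) + tail
    return out


if __name__ == "__main__":
    split_date = ["2019", "03", "26", "20:14:13.192Z"]
    print(states_format("year={}/month={}/day={}/", split_date[0], split_date[1], split_date[2]))
```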

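Step 4:

Create the Args array with States.Array, passing the formatted date path (assumed here to be stored at $.date.datePath) as the last element:

States.Array('spark-submit', '--deploy-mode', 'client', 's3://', '--local_run', 'False', '--date_path', $.date.datePath)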
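Putting Steps 2 to 4 together, this sketch (a local approximation, not Step Functions itself) builds the Args list for a given EnteredTime. A timestamp from 12 June 2023 reproduces the question's hard-coded Args, which also shows that this approach yields the execution day, not the day before.

```python
import re
from typing import List


def build_args(entered_time: str) -> List[str]:
    # Step 2: split on "-", "," and "T", keep year, month, day
    year, month, day = re.split(r"[-,T]", entered_time)[:3]
    # Step 3: format the date path (trailing slash as in the question)
    date_path = f"year={year}/month={month}/day={day}/"
    # Step 4: mirror the States.Array(...) call
    return ["spark-submit", "--deploy-mode", "client", "s3://",
            "--local_run", "False", "--date_path", date_path]


if __name__ == "__main__":
    print(build_args("2023-06-12T09:30:00.000Z"))
```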

To show how this works in the context of your state machine, I've added a Pass state called "Format date path" and replaced the HadoopJarStep.Args attribute in the "Run second step" task state. The Pass state writes its result to $.date (via ResultPath) so that $.cluster.ClusterId is still available to the next state:

{
  "Comment": "A description of my state machine",
  "StartAt": "EMR CreateCluster",
  "States": {
    "EMR CreateCluster": {
      ...,
      "Next": "Format date path"
    },
    "Format date path": {
      "Type": "Pass",
      "Parameters": {
        "datePath.$": "States.Format('year={}/month={}/day={}/', States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 0), States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 1), States.ArrayGetItem(States.StringSplit($$.State.EnteredTime, '-,T'), 2))"
      },
      "ResultPath": "$.date",
      "Next": "Run second step"
    },
    "Run second step": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My second EMR step",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args.$": "States.Array('spark-submit', '--deploy-mode', 'client', 's3://', '--local_run', 'False', '--date_path', $.date.datePath)"
          }
        }
      },
      "ResultPath": "$.secondStep",
      "Next": "EMR TerminateCluster"
    },
    "EMR TerminateCluster": {
      ...
    }
  }
}
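Why ResultPath matters for the Pass state: its default ResultPath is `$`, so its result replaces the whole state input and `$.cluster.ClusterId` would be lost. The sketch below approximates a top-level ResultPath such as `$.date` in Python; the cluster ID is a made-up placeholder.

```python
import copy
from typing import Any, Dict


def apply_result_path(state_input: Dict[str, Any], result: Any, key: str) -> Dict[str, Any]:
    """Approximate a top-level ResultPath like "$.<key>": keep the input
    and store the result under `key`."""
    output = copy.deepcopy(state_input)
    output[key] = result
    return output


state_input = {"cluster": {"ClusterId": "j-EXAMPLE"}}  # hypothetical cluster ID
pass_result = {"datePath": "year=2023/month=06/day=13/"}

# Default ResultPath "$": the output is the result alone, so "cluster" is gone
default_output = pass_result
# ResultPath "$.date": the input is kept and the result is nested under "date"
merged_output = apply_result_path(state_input, pass_result, "date")

if __name__ == "__main__":
    print("cluster" in default_output)
    print(merged_output["cluster"]["ClusterId"], merged_output["date"]["datePath"])
```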
