Terraform: Create a Kinesis Firehose delivery stream using the for_each option


Question

My variable file

```hcl
variable "my_bucket_map" {
  type = map(object({
    name    = string
    suffix  = string
  }))
  default = {}
}

```

My S3 bucket

```hcl

resource "aws_s3_bucket" "my_bucket" {
  for_each = var.my_bucket_map
  bucket   = "${lower(each.value.name)}.${lower(each.value.suffix)}"
}

```

My Glue table, dependent on the S3 bucket above

```hcl

resource "aws_glue_catalog_table" "my_glue_table" {
  for_each      = aws_s3_bucket.my_bucket  # nested for_each used here
  name          = "my_bucket"
  database_name = aws_glue_catalog_database.my_glue_db.name
  table_type    = "EXTERNAL_TABLE"

  partition_keys {
    name = "date"
    type = "date"
  }

  storage_descriptor {
    columns {
      name = "file_name"
      type = "string"
    }
    # ... more columns elided ...

    compressed    = false
    location      = "s3://${each.value.id}//"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      name                  = "ParquetHiveSerDe"
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }
  }
}

```

My KFH delivery stream, dependent on both the Glue table and S3 bucket created above

```hcl

resource "aws_kinesis_firehose_delivery_stream" "my_kinesis_fh" {
  for_each    = var.my_bucket_map
  name        = "${lower(each.value.name)}.${lower(each.value.suffix)}"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn        = aws_iam_role.my_firehose_role.arn
    bucket_arn      = xxxxxxxxxxxxxxxx # ==> *how to populate this*??????
    buffer_size     = 128
    buffer_interval = 60

    data_format_conversion_configuration {
      input_format_configuration {
        deserializer {
          open_x_json_ser_de {
          }
        }
      }

      output_format_configuration {
        serializer {
          parquet_ser_de {
          }
        }
      }

      schema_configuration {
        database_name = aws_glue_catalog_database.my_glue_db.name
        role_arn      = aws_iam_role.my_firehose_role.arn
        table_name    = xxxxxxxxxxxxxxxx # ==> how to populate this??????
      }
    }
  }

}
```

I have a map, my_bucket_map, which I use to create multiple AWS resources. Using the same map I create the S3 buckets, and the Glue tables are then created with a nested for_each over those buckets. When I try to create a Kinesis delivery stream that depends on both of them to reference a few fields, I cannot derive bucket_arn and table_name, because they are not directly accessible.

How can I get these values? I am also open to suggestions for creating the S3 buckets and Glue tables in a different manner that solves this scenario.
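
For reference, the map might be populated like this (a hypothetical `terraform.tfvars`; the keys and values are made up purely to illustrate what `each.key` and `each.value` resolve to in the loops above):

```hcl
# terraform.tfvars -- illustrative values only
my_bucket_map = {
  logs = {
    name   = "AppLogs"
    suffix = "Example"
  }
  events = {
    name   = "AppEvents"
    suffix = "Example"
  }
}
```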


Answer 1

Score: 2

I would probably use the same variable across all resources, as that gives you better control over chaining the resources. The Glue catalog table would then use the `my_bucket_map` variable with `for_each`, and you would just change `location` to reference the bucket ID:

```hcl
resource "aws_glue_catalog_table" "my_glue_table" {
  for_each      = var.my_bucket_map
  name          = "my_bucket"
  database_name = aws_glue_catalog_database.my_glue_db.name
  table_type    = "EXTERNAL_TABLE"

  partition_keys {
    name = "date"
    type = "date"
  }

  storage_descriptor {
    columns {
      name = "file_name"
      type = "string"
    }
    # ... more columns elided ...

    compressed    = false
    location      = "s3://${aws_s3_bucket.my_bucket[each.key].id}//"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      name                  = "ParquetHiveSerDe"
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }
  }
}
```

Because every resource now iterates over the same map, the instance keys line up across `aws_s3_bucket.my_bucket`, `aws_glue_catalog_table.my_glue_table`, and the delivery stream, so `each.key` can index directly into the corresponding instances. Then, for the Kinesis Firehose, you would just use the following:

```hcl

resource "aws_kinesis_firehose_delivery_stream" "my_kinesis_fh" {
  for_each    = var.my_bucket_map
  name        = "${lower(each.value.name)}.${lower(each.value.suffix)}"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn        = aws_iam_role.my_firehose_role.arn
    bucket_arn      = aws_s3_bucket.my_bucket[each.key].arn
    buffer_size     = 128
    buffer_interval = 60

    data_format_conversion_configuration {
      input_format_configuration {
        deserializer {
          open_x_json_ser_de {
          }
        }
      }

      output_format_configuration {
        serializer {
          parquet_ser_de {
          }
        }
      }

      schema_configuration {
        database_name = aws_glue_catalog_database.my_glue_db.name
        role_arn      = aws_iam_role.my_firehose_role.arn
        table_name    = aws_glue_catalog_table.my_glue_table[each.key].name
      }
    }
  }

}
```
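
To verify that the per-key chaining works end to end, you could add an output that maps each key of `my_bucket_map` to the resources created for it (a minimal sketch; the output name is made up):

```hcl
# Sketch: collect the ARNs/names produced for each map key.
# The output name "per_key_resources" is hypothetical.
output "per_key_resources" {
  value = {
    for k in keys(var.my_bucket_map) : k => {
      bucket_arn = aws_s3_bucket.my_bucket[k].arn
      table_name = aws_glue_catalog_table.my_glue_table[k].name
      stream_arn = aws_kinesis_firehose_delivery_stream.my_kinesis_fh[k].arn
    }
  }
}
```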

