Terraform: Create a kinesis firehose delivery stream using for each option
Question
My variable file:

```hcl
variable "my_bucket_map" {
  type = map(object({
    name   = string
    suffix = string
  }))
  default = {}
}
```
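For illustration, a hypothetical set of values for this map (these names are made up and not part of the original question) could look like this in a `terraform.tfvars` file:

```hcl
# terraform.tfvars -- hypothetical example values
my_bucket_map = {
  logs = {
    name   = "AppLogs"
    suffix = "Prod"
  }
  events = {
    name   = "AppEvents"
    suffix = "Prod"
  }
}
```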
My S3 bucket:

```hcl
resource "aws_s3_bucket" "my_bucket" {
  for_each = var.my_bucket_map
  bucket   = "${lower(each.value.name)}.${lower(each.value.suffix)}"
}
```
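With the hypothetical values above, this creates one bucket per map key, named applogs.prod and appevents.prod respectively.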
My Glue table, which depends on the S3 bucket above:

```hcl
resource "aws_glue_catalog_table" "my_glue_table" {
  for_each      = aws_s3_bucket.my_bucket # nested for_each used here
  name          = "my_bucket"
  database_name = aws_glue_catalog_database.my_glue_db.name
  table_type    = "EXTERNAL_TABLE"

  partition_keys {
    name = "date"
    type = "date"
  }

  storage_descriptor {
    columns {
      name = "file_name"
      type = "string"
    }
    # ... (remaining columns omitted)
    compressed    = false
    location      = "s3://${each.value.id}//"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      name                  = "ParquetHiveSerDe"
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }
  }
}
```
My Kinesis Firehose delivery stream, which depends on both the Glue table and the S3 bucket created above:

```hcl
resource "aws_kinesis_firehose_delivery_stream" "my_kinesis_fh" {
  for_each    = var.my_bucket_map
  name        = "${lower(each.value.name)}.${lower(each.value.suffix)}"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn        = aws_iam_role.my_firehose_role.arn
    bucket_arn      = xxxxxxxxxxxxxxxx # ==> *how to populate this*??????
    buffer_size     = 128
    buffer_interval = 60

    data_format_conversion_configuration {
      input_format_configuration {
        deserializer {
          open_x_json_ser_de {}
        }
      }

      output_format_configuration {
        serializer {
          parquet_ser_de {}
        }
      }

      schema_configuration {
        database_name = aws_glue_catalog_database.my_glue_db.name
        role_arn      = aws_iam_role.my_firehose_role.arn
        table_name    = xxxxxxxxxxxxxxxx # ==> how to populate this??????
      }
    }
  }
}
```
I have a map, `my_bucket_map`, which I use to create multiple AWS resources.

Using the same map I create the S3 buckets, and the Glue tables are then created with a nested `for_each` over the bucket resources.

When I try to create a Kinesis Firehose delivery stream, which depends on both of these resources for a few fields, I'm not able to derive `bucket_arn` and `table_name` because they are not directly accessible.

How can I get these values? I'm also open to suggestions for creating the S3 buckets and Glue tables in a different way that solves this scenario.
Answer 1

Score: 2
I would probably use the same variable across all resources, as that gives you better control over chaining the resources together. The Glue catalog table would then use the `my_bucket_map` variable with `for_each`, and you would just change `location` to reference the bucket IDs:
```hcl
resource "aws_glue_catalog_table" "my_glue_table" {
  for_each      = var.my_bucket_map
  name          = "my_bucket" # note: with for_each you will likely want a unique table name per instance, e.g. based on each.key
  database_name = aws_glue_catalog_database.my_glue_db.name
  table_type    = "EXTERNAL_TABLE"

  partition_keys {
    name = "date"
    type = "date"
  }

  storage_descriptor {
    columns {
      name = "file_name"
      type = "string"
    }
    # ... (remaining columns omitted)
    compressed    = false
    location      = "s3://${aws_s3_bucket.my_bucket[each.key].id}//"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      name                  = "ParquetHiveSerDe"
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }
  }
}
```
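Because every resource now iterates over the same `var.my_bucket_map`, all three resources share the same set of instance keys, so `aws_s3_bucket.my_bucket[each.key]` (and, below, `aws_glue_catalog_table.my_glue_table[each.key]`) always resolves to the instance created from the same map entry.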
Then, for the Kinesis Firehose, you would just use the following:
resource "aws_kinesis_firehose_delivery_stream" "my_kinesis_fh" {
for_each = var.my_bucket_map
name = "${lower(each.value.name)}.${lower(each.value.suffix)}"
destination = "extended_s3"
extended_s3_configuration {
role_arn = aws_iam_role.my_firehose_role.arn
bucket_arn = aws_s3_bucket.my_bucket[each.key].arn
buffer_size = 128
buffer_interval = 60
data_format_conversion_configuration {
input_format_configuration {
deserializer {
open_x_json_ser_de {
}
}
}
output_format_configuration {
serializer {
parquet_ser_de {
}
}
}
schema_configuration {
database_name = aws_glue_catalog_database.my_glue_db.name
role_arn = aws_iam_role.my_firehose_role.arn
table_name = aws_glue_catalog_table.my_glue[each.key].name
}
}
}
}
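If you want to double-check the per-key wiring, a minimal sketch of some outputs (the output names here are hypothetical and not part of the original answer) could expose the values Firehose ends up referencing, keyed by the same map keys:

```hcl
# Hypothetical outputs to inspect the per-key bucket ARNs and Glue table names
output "firehose_bucket_arns" {
  value = { for k, b in aws_s3_bucket.my_bucket : k => b.arn }
}

output "firehose_glue_table_names" {
  value = { for k, t in aws_glue_catalog_table.my_glue_table : k => t.name }
}
```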