Terraform: Create a Kinesis Firehose delivery stream using the for_each option


Question

My variable file

```hcl
variable "my_bucket_map" {
  type = map(object({
    name    = string
    suffix  = string
  }))
  default = {}
}

```

My S3 bucket

```hcl

resource "aws_s3_bucket" "my_bucket" {
  for_each = var.my_bucket_map
  bucket   = "${lower(each.value.name)}.${lower(each.value.suffix)}"
}

```

My Glue table, dependent on the S3 bucket above

```hcl

resource "aws_glue_catalog_table" "my_glue_table" {
  for_each      = aws_s3_bucket.my_bucket  # nested for_each used here
  name          = "my_bucket"
  database_name = aws_glue_catalog_database.my_glue_db.name
  table_type    = "EXTERNAL_TABLE"

  partition_keys {
    name = "date"
    type = "date"
  }

  storage_descriptor {
    columns {
      name = "file_name"
      type = "string"
    }
    # ... more columns elided ...

    compressed    = false
    location      = "s3://${each.value.id}//"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      name                  = "ParquetHiveSerDe"
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }
  }
}

```

My KFH delivery stream, dependent on both the Glue table and S3 bucket created above

```hcl

resource "aws_kinesis_firehose_delivery_stream" "my_kinesis_fh" {
  for_each    = var.my_bucket_map
  name        = "${lower(each.value.name)}.${lower(each.value.suffix)}"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn        = aws_iam_role.my_firehose_role.arn
    bucket_arn      = xxxxxxxxxxxxxxxx # ==> *how to populate this*??????
    buffer_size     = 128
    buffer_interval = 60

    data_format_conversion_configuration {
      input_format_configuration {
        deserializer {
          open_x_json_ser_de {
          }
        }
      }

      output_format_configuration {
        serializer {
          parquet_ser_de {
          }
        }
      }

      schema_configuration {
        database_name = aws_glue_catalog_database.my_glue_db.name
        role_arn      = aws_iam_role.my_firehose_role.arn
        table_name    = xxxxxxxxxxxxxxxx # ==> how to populate this??????
      }
    }
  }

}
```

I have a map, my_bucket_map, which I use to create multiple AWS resources. Using the same map I create the S3 buckets, and the Glue tables are then created with a nested for_each over those buckets. When I try to create a Kinesis delivery stream that depends on both of them to reference a few fields, I cannot derive bucket_arn and table_name, because they are not directly accessible.

How can I get these values? I am also open to suggestions for creating the S3 buckets and Glue tables in a different manner that solves this scenario.
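
For reference, the map might be populated like this (a hypothetical `terraform.tfvars`; the keys and values are made up purely to illustrate what `each.key` and `each.value` resolve to in the loops above):

```hcl
# terraform.tfvars -- illustrative values only
my_bucket_map = {
  logs = {
    name   = "AppLogs"
    suffix = "Example"
  }
  events = {
    name   = "AppEvents"
    suffix = "Example"
  }
}
```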


Answer 1

Score: 2

I would probably use the same variable across all resources, as that gives you better control over chaining the resources. The Glue catalog table would then use the `my_bucket_map` variable with `for_each`, and you would just change `location` to reference the bucket ID:

```hcl
resource "aws_glue_catalog_table" "my_glue_table" {
  for_each      = var.my_bucket_map
  name          = "my_bucket"
  database_name = aws_glue_catalog_database.my_glue_db.name
  table_type    = "EXTERNAL_TABLE"

  partition_keys {
    name = "date"
    type = "date"
  }

  storage_descriptor {
    columns {
      name = "file_name"
      type = "string"
    }
    # ... more columns elided ...

    compressed    = false
    location      = "s3://${aws_s3_bucket.my_bucket[each.key].id}//"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      name                  = "ParquetHiveSerDe"
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }
  }
}
```

Because every resource now iterates over the same map, the instance keys line up across `aws_s3_bucket.my_bucket`, `aws_glue_catalog_table.my_glue_table`, and the delivery stream, so `each.key` can index directly into the corresponding instances. Then, for the Kinesis Firehose, you would just use the following:

```hcl

resource "aws_kinesis_firehose_delivery_stream" "my_kinesis_fh" {
  for_each    = var.my_bucket_map
  name        = "${lower(each.value.name)}.${lower(each.value.suffix)}"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn        = aws_iam_role.my_firehose_role.arn
    bucket_arn      = aws_s3_bucket.my_bucket[each.key].arn
    buffer_size     = 128
    buffer_interval = 60

    data_format_conversion_configuration {
      input_format_configuration {
        deserializer {
          open_x_json_ser_de {
          }
        }
      }

      output_format_configuration {
        serializer {
          parquet_ser_de {
          }
        }
      }

      schema_configuration {
        database_name = aws_glue_catalog_database.my_glue_db.name
        role_arn      = aws_iam_role.my_firehose_role.arn
        table_name    = aws_glue_catalog_table.my_glue_table[each.key].name
      }
    }
  }

}
```
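
To verify that the per-key chaining works end to end, you could add an output that maps each key of `my_bucket_map` to the resources created for it (a minimal sketch; the output name is made up):

```hcl
# Sketch: collect the ARNs/names produced for each map key.
# The output name "per_key_resources" is hypothetical.
output "per_key_resources" {
  value = {
    for k in keys(var.my_bucket_map) : k => {
      bucket_arn = aws_s3_bucket.my_bucket[k].arn
      table_name = aws_glue_catalog_table.my_glue_table[k].name
      stream_arn = aws_kinesis_firehose_delivery_stream.my_kinesis_fh[k].arn
    }
  }
}
```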

