Integrating DBT with Airflow using an MWAA instance

Question


I was recently working on the DBT and Airflow integration on an MWAA instance. I followed this official link and still struggled, so I thought of posting this.

AIRFLOW VERSION: 2.2.2
  
DBT VERSION: 0.21.1

Answer 1

Score: 2


Below are some hints that will help you understand the link better:

  1. First of all, don't use their sample project. Create your own DBT project and upload it to the S3 bucket. To do that, create a sample directory, open it in your IDE and terminal, and start with pip3 install dbt-snowflake (if your sink is supposed to be Snowflake) in case you don't have the dbt CLI.
    The second step should be dbt init, which prompts for a few inputs such as the project name and sink details and creates a standard DBT project structure for you. No matter how hard you try, without a profiles.yml you won't be able to execute anything on the MWAA instance (a sketch of this workflow and a sample profiles.yml follow this list).
  2. Make sure this command runs fine in your project terminal before uploading the project to S3: dbt run --project-dir . --profiles-dir .
  3. Remember that the S3 bucket URI you provide to the instance acts as the /usr/local/airflow path inside the instance, so keep that mapping in mind while making path changes in the DAG code (see the DAG sketch at the end of this answer).
  4. Based on your sink, you can decide whether you need dbt-postgres==0.21.1 or dbt-redshift==0.21.1. Try to avoid unnecessary dependencies and remove anything from requirements.txt that won't be required for your use case.
  5. Also, you need to add config-version: 2 below version in the dbt_project.yml file, and make sure you comment out the models and seeds paths if all you need is the dbt run command (see the dbt_project.yml sketch below this list).
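
To tie points 1 and 2 together, here is a minimal sketch of the local workflow, assuming Snowflake as the sink and my_dbt_project as a placeholder project name:

    # Install the dbt adapter for your sink if you don't have the dbt CLI yet
    # (swap in dbt-postgres==0.21.1 or dbt-redshift==0.21.1 per point 4).
    pip3 install dbt-snowflake==0.21.1

    # Scaffold a standard DBT project structure; dbt asks for inputs
    # such as the project name and sink details.
    dbt init my_dbt_project

    # From inside the project, verify everything runs before uploading to S3.
    # Both flags point at the current directory, so profiles.yml must sit
    # next to dbt_project.yml.
    cd my_dbt_project
    dbt run --project-dir . --profiles-dir .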
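
Since nothing runs on MWAA without profiles.yml (point 1), here is a sketch of one for a Snowflake sink. Every value is a placeholder to replace with your own connection details, and the top-level key must match the profile name in dbt_project.yml:

    # profiles.yml (placeholder values -- replace with your own)
    my_dbt_project:
      target: dev
      outputs:
        dev:
          type: snowflake
          account: your_account_identifier
          user: your_user
          password: your_password
          role: your_role
          database: your_database
          warehouse: your_warehouse
          schema: your_schema
          threads: 4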
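
And here is what point 5 looks like in dbt_project.yml, again with placeholder names. Note that in dbt 0.x the model and seed path keys are spelled source-paths and data-paths (they were renamed to model-paths and seed-paths in dbt 1.0):

    # dbt_project.yml (trimmed to the keys relevant to point 5)
    name: my_dbt_project
    version: '1.0.0'
    config-version: 2        # add this right below version
    profile: my_dbt_project  # must match the top-level key in profiles.yml

    # Comment these out when all you need is dbt run:
    # source-paths: ["models"]
    # data-paths: ["data"]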

You can use their DAG code; it works as-is. Just make sure your paths are correct. These are the points that cost me a lot of time, so I wanted to share them with the community.
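
Their DAG is the one to use, but as a reference for the path mapping in point 3, here is a minimal sketch of a DAG that runs the project, assuming it was uploaded under a placeholder folder my_dbt_project in the dags/ prefix of the bucket:

    # dbt_dag.py -- a minimal sketch, not the official DAG from the link.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # The S3 bucket attached to the MWAA environment surfaces as
    # /usr/local/airflow, so s3://<bucket>/dags/my_dbt_project lands here:
    DBT_PROJECT_DIR = "/usr/local/airflow/dags/my_dbt_project"

    with DAG(
        dag_id="dbt_run_example",
        start_date=datetime(2023, 1, 1),
        schedule_interval=None,  # trigger manually while validating paths
        catchup=False,
    ) as dag:
        # The same command verified locally in point 2, with "." replaced
        # by the path the project gets inside the instance.
        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command=(
                f"dbt run --project-dir {DBT_PROJECT_DIR} "
                f"--profiles-dir {DBT_PROJECT_DIR}"
            ),
        )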
