Integrating DBT with Airflow on an MWAA instance
Question
I was recently working on integrating DBT with Airflow on an MWAA instance. I followed this official link and still struggled, so I decided to post this.
Airflow version: 2.2.2
DBT version: 0.21.1
Answer 1
Score: 2
Below are some hints that should make this link easier to follow:
- First of all, don't use their sample project. Create your own DBT project and upload it to the S3 bucket. To do that, create a directory, open it in your IDE and in a terminal, and, if you don't have the dbt CLI yet, start with `pip3 install dbt-snowflake` (assuming your target is Snowflake). The second step should be `dbt init`, which prompts for a few inputs such as the project name and target details and creates a standard DBT project structure for you. Without a `profiles.yml`, no matter how hard you try, you won't be able to execute anything on the MWAA instance.
- Make sure this command works in your project's terminal before uploading the project to S3: `dbt run --project-dir . --profiles-dir .`
- Remember that the S3 bucket URI you provide to the instance acts as the `/usr/local/airflow` path inside the instance. Keep this in mind when you change paths in the DAG code.
- Depending on your target, decide whether you need `dbt-postgres==0.21.1` and `dbt-redshift==0.21.1`. Remove from `requirements.txt` any dependencies your use case doesn't need.
- Also, add `config-version: 2` below `version` in the `dbt_project.yml` file, and comment out the `models` and `seeds` paths if you only need the `dbt run` command.
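Several of the checks in the list above can be collected into a small pre-upload script. The following is only a sketch: it assumes the standard dbt file names `dbt_project.yml` and `profiles.yml`, assumes that `profiles.yml` is kept inside the project folder (so that `--profiles-dir .` works), and uses the hypothetical helper names `check_dbt_project` and `drop_packages`. It does a plain text check and does not replace running `dbt run` locally.

```python
import re
from pathlib import Path


def check_dbt_project(project_dir):
    """Return a list of problems found in a dbt project folder (empty if none)."""
    root = Path(project_dir)
    problems = []

    project_file = root / "dbt_project.yml"
    if not project_file.is_file():
        problems.append("dbt_project.yml is missing")
    elif not re.search(r"^config-version:\s*2\s*$", project_file.read_text(), re.MULTILINE):
        problems.append("dbt_project.yml has no 'config-version: 2' line")

    # Assumption: profiles.yml sits next to dbt_project.yml.
    if not (root / "profiles.yml").is_file():
        problems.append("profiles.yml is missing")

    return problems


def drop_packages(requirement_lines, unwanted):
    """Remove the pins for the named packages; every other line is kept."""
    kept = []
    for line in requirement_lines:
        # The package name is whatever precedes the first version operator.
        name = re.split(r"[=<>~!\s]", line.strip(), maxsplit=1)[0]
        if name not in unwanted:
            kept.append(line)
    return kept
```

For example, `drop_packages(lines, {"dbt-postgres", "dbt-redshift"})` removes exactly those two pins. The packages to drop are passed explicitly rather than guessed from a `dbt-` prefix, because some pins with that prefix may be ordinary dependencies rather than adapters.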
You can use their DAG code as is; it works well. Just make sure your paths are correct. These are the points that cost me a lot of time, so I wanted to share them with the community.
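Since most of the trouble comes from paths, it may help to compute them instead of typing them by hand. The sketch below takes the description above at face value, namely that the S3 bucket URI given to the instance shows up as `/usr/local/airflow`. The helper names `worker_path` and `dbt_run_command`, the bucket name, and the folder layout are all hypothetical.

```python
import posixpath
import shlex

# Assumption from the answer above: the bucket root maps to this folder.
WORKER_ROOT = "/usr/local/airflow"


def worker_path(bucket_uri, object_uri):
    """Translate an S3 URI inside the bucket into its assumed path on the worker."""
    bucket = bucket_uri.rstrip("/") + "/"
    target = object_uri.rstrip("/") + "/"
    if not target.startswith(bucket):
        raise ValueError(f"{object_uri} is not inside {bucket_uri}")
    relative = target[len(bucket):].strip("/")
    return posixpath.join(WORKER_ROOT, relative) if relative else WORKER_ROOT


def dbt_run_command(project_dir, profiles_dir=None, dbt_bin="dbt"):
    """Build the `dbt run` call that was tested locally, using absolute paths."""
    profiles_dir = profiles_dir or project_dir
    return shlex.join(
        [dbt_bin, "run", "--project-dir", project_dir, "--profiles-dir", profiles_dir]
    )


# Hypothetical bucket and layout, for illustration only.
project = worker_path("s3://my-mwaa-bucket", "s3://my-mwaa-bucket/dags/dbt/my_project")
command = dbt_run_command(project)
```

The resulting string could then be passed as the `bash_command` of an Airflow `BashOperator` in the DAG. The executable name is a parameter because `dbt` may not be on the worker's `PATH`, in which case a full path would be needed.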