Install Airflow on Compute Engine

Question

I would like to ask if any of you have experience installing Airflow on Compute Engine. I have been searching for instructions on Google and asking ChatGPT, but I have not been successful so far. I would like to know the correct sequence of commands for installing Airflow on Compute Engine. I believe my installation issues may be due to my lack of understanding of the proper installation steps for Airflow. Thank you for your response.

Answer 1

Score: 0

Airflow lets users define workflows as directed acyclic graphs (DAGs) of tasks. It can connect to multiple data sources and send alerts about a job's status by email or other notifications.

Setting up Airflow on Google Compute Engine takes five steps:

Step 1: Create a Compute Engine instance

  • Log in to the Cloud Console and search for “Create an Instance” in the search box.
  • Click New VM Instance, give the instance a name, and choose a configuration that fits your needs. For example: machine type e2-standard-2 (2 vCPUs, 8 GB memory), a Debian 10 image, and a 50 GB disk.
  • Click Create to create the Compute Engine VM instance.

The instance can also be created with the gcloud command line; see the Compute Engine documentation on creating VM instances.
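
For reference, the console steps for creating the VM could be approximated from the command line roughly as follows. This is a minimal sketch, assuming the gcloud CLI is installed and authenticated against your project. The instance name `airflow-vm`, the zone `us-central1-a`, and the `debian-12` image family are placeholder assumptions, not values from the answer; only the machine type and disk size are taken from it.

```shell
#!/usr/bin/env bash
# Sketch: create a VM comparable to the one described in Step 1.
# INSTANCE_NAME, ZONE, and IMAGE_FAMILY are assumed placeholder values.
INSTANCE_NAME="${INSTANCE_NAME:-airflow-vm}"
ZONE="${ZONE:-us-central1-a}"
IMAGE_FAMILY="${IMAGE_FAMILY:-debian-12}"

create_airflow_vm() {
  gcloud compute instances create "$INSTANCE_NAME" \
    --zone="$ZONE" \
    --machine-type=e2-standard-2 \
    --image-family="$IMAGE_FAMILY" \
    --image-project=debian-cloud \
    --boot-disk-size=50GB
}
```

Nothing is executed until the function is called, so after sourcing the file you can review the values and then run `create_airflow_vm`.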

Step 2: Install Apache Airflow

  • Once the instance is created, click SSH to open a terminal.

  • In the terminal, update the machine and install wget and pip for Python 3 by running the following commands.

    sudo apt update
    sudo apt -y upgrade
    sudo apt-get install -y wget
    sudo apt install -y python3-pip

Create a virtual environment for Airflow with conda (for example, from a Miniconda install). The document written by Vishal Agarwal lists the commands for creating the virtual environment and installing Airflow in it.
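
Since the install commands themselves are not reproduced in this answer, here is a hedged sketch of what they might look like. It assumes conda is already installed on the VM and follows the PyPI install pattern with a constraints file. The Airflow version `2.6.1`, Python `3.10`, and the environment name `airflow_demo` (borrowed from Step 5) are example assumptions, so adjust them to the release you want.

```shell
#!/usr/bin/env bash
# Sketch: create a conda environment and install Airflow from PyPI.
# AIRFLOW_VERSION and PYTHON_VERSION are assumed example values.
AIRFLOW_VERSION="${AIRFLOW_VERSION:-2.6.1}"
PYTHON_VERSION="${PYTHON_VERSION:-3.10}"
ENV_NAME="${ENV_NAME:-airflow_demo}"

# Constraints files pin dependency versions tested with a given release.
constraint_url() {
  echo "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
}

install_airflow() {
  conda create -y -n "$ENV_NAME" "python=${PYTHON_VERSION}"
  # Run pip inside the new environment without needing "conda activate".
  conda run -n "$ENV_NAME" pip install \
    "apache-airflow==${AIRFLOW_VERSION}" \
    --constraint "$(constraint_url)"
}
```

Calling `install_airflow` performs both steps. Afterwards, `conda activate airflow_demo` makes the `airflow` command available in the current shell.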

Step 3: Set up Airflow

After installing Airflow, initialize its metadata database and then create an admin user. Run the following two commands in order, in the environment where Airflow is installed:

airflow db init
airflow users create -r Admin -u <username> -p <password> -e <email> -f <first-name> -l <last-name>

Step 4: Open the firewall

The webserver in this guide listens on port 8080, which is also Airflow's default. In the GCP console, go to VPC Network -> Firewall and create a firewall rule that allows TCP port 8080. Make sure the rule applies to the Compute Engine instance (for example through a matching network tag), so that port 8080 on the VM is reachable.
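
The same firewall setup could be sketched with gcloud as shown below. The rule name `allow-airflow-8080`, the network tag `airflow`, the `default` network, and the source range `203.0.113.7/32` are all placeholder assumptions. Restricting the source range to your own IP address is a suggestion here, not something the answer specifies.

```shell
#!/usr/bin/env bash
# Sketch: allow inbound TCP 8080 to VMs carrying a given network tag.
# RULE_NAME, NETWORK_TAG, and SOURCE_RANGE are assumed placeholder values.
RULE_NAME="${RULE_NAME:-allow-airflow-8080}"
NETWORK_TAG="${NETWORK_TAG:-airflow}"
SOURCE_RANGE="${SOURCE_RANGE:-203.0.113.7/32}"  # replace with your own IP

open_airflow_port() {
  gcloud compute firewall-rules create "$RULE_NAME" \
    --network=default \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:8080 \
    --source-ranges="$SOURCE_RANGE" \
    --target-tags="$NETWORK_TAG"
}

# Attach the tag to the VM so that the rule applies to it.
# Usage: tag_airflow_vm <instance-name> <zone>
tag_airflow_vm() {
  gcloud compute instances add-tags "$1" --zone="$2" --tags="$NETWORK_TAG"
}
```

Run `open_airflow_port` once per project, then `tag_airflow_vm` with your instance name and zone.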

Step 5: Start Airflow

Once the firewall is set up, start the Airflow webserver with the following command:

airflow webserver -p 8080

Open another terminal and start the Airflow Scheduler:

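export AIRFLOW_HOME=/home/user/airflow_demo  # use the same AIRFLOW_HOME in every terminal, including the webserver's
cd "$AIRFLOW_HOME"
conda activate airflow_demo
airflow db init  # safe to re-run if the database was already initialized in Step 3
airflow scheduler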

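Once the scheduler is running, open the Airflow console in a browser at http://<vm-IP-address>:8080. Use http rather than https unless you have configured TLS for the webserver, and use the VM's external IP address.

Log in with the username and password created in Step 3. Your DAGs can now be viewed and triggered from the Airflow UI.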
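
If the page does not load, it can help to check from the VM's own terminal whether the webserver is responding before looking at the firewall. The sketch below assumes curl is available on the VM and that the webserver runs on port 8080 as in Step 5. Airflow 2.x serves a `/health` endpoint that reports the status of the metadata database and the scheduler.

```shell
#!/usr/bin/env bash
# Sketch: query the webserver's /health endpoint from the VM itself.
# AIRFLOW_URL is an assumed default matching "airflow webserver -p 8080".
AIRFLOW_URL="${AIRFLOW_URL:-http://localhost:8080}"

check_airflow_health() {
  # -f: fail on HTTP errors, -sS: quiet but still show errors.
  curl -fsS "${AIRFLOW_URL}/health"
}
```

If `check_airflow_health` succeeds locally but the browser still cannot connect, the firewall rule or the external IP address is the more likely culprit.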

huangapple
  • Posted on June 13, 2023 at 09:53:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76461263.html