install airflow on compute engine
Question
I would like to ask if any of you have experience installing Airflow on Compute Engine. I have been searching for instructions on Google and asking ChatGPT, but I have not been successful so far. I would like to know the correct sequence of commands for installing Airflow on Compute Engine. I believe my installation issues may be due to my lack of understanding of the proper installation steps for Airflow. Thank you for your response.
Answer 1
Score: 0
Airflow lets users build workflows as Directed Acyclic Graphs (DAGs) of tasks. It can connect to multiple data sources and send alerts about a job's status via email or other notifications.
Installing Airflow on Google Compute Engine can be done in five steps:
Step 1: Create a Compute Engine instance to set up Airflow
- Log in to the Cloud Console and search for "Create an Instance" in the search box.
- Click New VM Instance, provide the instance's name, and size the instance to your requirements. For example: machine type e2-standard-2 (2 vCPU, 8 GB memory), a recent Debian image, and a 50 GB boot disk.
- Click Create to create the Compute Engine VM instance.
You can also create a Compute Engine VM instance using the gcloud command line.
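As a sketch, the same VM could be created from the gcloud CLI. The instance name, zone, and image family below are placeholders (assumptions, not from the original answer); adjust them to your project:

```shell
# Hypothetical example: create a Debian VM for Airflow with gcloud.
# Instance name, zone, and image family are placeholders.
gcloud compute instances create airflow-vm \
    --zone=us-central1-a \
    --machine-type=e2-standard-2 \
    --image-family=debian-11 \
    --image-project=debian-cloud \
    --boot-disk-size=50GB
```

This creates the same shape of machine as the console steps above (e2-standard-2, 50 GB boot disk).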
Step 2: Install Apache Airflow
- Once the instance is created, click SSH to open a terminal.
- Once the terminal is up and running, update the machine and install Python 3 by running the following commands.
sudo apt update
sudo apt -y upgrade
sudo apt-get install wget
sudo apt install -y python3-pip
You can use either conda or miniconda to create a virtual environment for the Airflow installation. Refer to the doc written by Vishal Agarwal for the commands to create a virtual environment and install Airflow.
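If you prefer not to use conda, a plain Python venv works as well. A minimal sketch follows; the Airflow version is an example, and the constraints URL is the pattern the Airflow project publishes so that dependency versions stay consistent:

```shell
# Create and activate a virtual environment for Airflow
python3 -m venv ~/airflow_venv
source ~/airflow_venv/bin/activate

# Install Airflow pinned against the official constraints file.
# The version number is an example; pick the release you actually want.
AIRFLOW_VERSION=2.6.3
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
pip install "apache-airflow==${AIRFLOW_VERSION}" \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
```

Installing without a constraints file often pulls incompatible dependency versions, which is a common cause of the installation failures described in the question.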
Step 3: Set Up Airflow
After Airflow is installed successfully, you need to set it up and initialize it. Run the following commands to initialize the metadata database and create an Airflow admin user.
airflow db init
airflow users create -r Admin -u <username> -p <password> -e <email> -f <first name> -l <last name>
Step 4: Open the Firewall
The Airflow webserver listens on port 8080 by default. In the GCP console, navigate to VPC Network -> Firewall and create a firewall rule: add port 8080 under TCP and click Create.
Make sure the rule applies to your compute instance so that port 8080 is reachable.
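The same firewall rule can be sketched with gcloud. The rule name is a placeholder, and the wide-open source range is only for illustration; restricting --source-ranges to your own IP is safer:

```shell
# Hypothetical example: allow inbound TCP 8080 for the Airflow webserver.
gcloud compute firewall-rules create allow-airflow-8080 \
    --network=default \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:8080 \
    --source-ranges=0.0.0.0/0
```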
Step 5: Start Airflow
Once the firewall is set up correctly, start the Airflow webserver with the following command:
airflow webserver -p 8080
Open another terminal and start the Airflow Scheduler:
export AIRFLOW_HOME=/home/user/airflow_demo
cd airflow_demo
conda activate airflow_demo
airflow db init
airflow scheduler
Once the scheduler is started, open the Airflow console from a browser. Go to http://<vm-IP-address>:8080 (the webserver serves plain HTTP unless SSL is configured).
Log in with the username and password created in Step 3. Now you can manage your DAGs in the Airflow UI.
Comments