Airflow installation guide for running locally on M1 Mac

Ryukato
BackEnd Software Developer

Env

  • python version: 3.10.14

Install Python 3.10.14

pyenv install 3.10.14
pyenv local 3.10.14

Create python venv

python3.10 -m venv .venv
source .venv/bin/activate

Note: Check that the Python path points into .venv and that the version is 3.10.14.

which python
python --version
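The same check can also be made from inside Python, which is handy when the shell's `python` and the interpreter actually running a script might differ. This is only a minimal sketch; the function name `describe_interpreter` is made up for illustration.

```python
import sys


def describe_interpreter() -> dict:
    """Report which interpreter is running and whether it lives in a venv."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix
        # still points at the base installation.
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If `in_venv` is False, the virtual environment was probably not activated in the current shell.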

Installation

Set AIRFLOW_HOME

Create a directory to serve as the Airflow home, which holds the config file (e.g. airflow.cfg) and the logs. Then set that directory as AIRFLOW_HOME.

export AIRFLOW_HOME=[path of the directory]

Example

export AIRFLOW_HOME=$(pwd)/airflow-home

Set up the constraint URL

AIRFLOW_VERSION=2.9.0
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
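To make the shell pipeline above easier to follow, here is a rough Python equivalent of how the URL is assembled. The helper name `constraint_url` is hypothetical; the URL pattern is the one used in the shell snippet.

```python
import sys

CONSTRAINT_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/apache/airflow/"
    "constraints-{airflow_version}/constraints-{python_version}.txt"
)


def constraint_url(airflow_version: str, python_version: str) -> str:
    """Build the constraints file URL for an Airflow / Python version pair."""
    # Constraint files are keyed by major.minor only, e.g. "3.10.14" -> "3.10".
    major_minor = ".".join(python_version.split(".")[:2])
    return CONSTRAINT_URL_TEMPLATE.format(
        airflow_version=airflow_version, python_version=major_minor
    )


if __name__ == "__main__":
    running = "{}.{}.{}".format(*sys.version_info[:3])
    print(constraint_url("2.9.0", running))
```

The `cut -d "." -f 1-2` step in the shell version does the same job as the `split`/`join` here: it drops the patch number so that 3.10.14 maps to constraints-3.10.txt.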

Install airflow

pip install "apache-airflow[postgres,celery]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

Note: If the installation fails, install Airflow without any extras first, then install the extras.

pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
pip install "apache-airflow[postgres,celery]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

Remember to pass --constraint in the second installation as well; without it, pip may pull in dependency versions that conflict with the pinned Airflow release.

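Set up the Airflow database (PostgreSQL)

Run the following as a PostgreSQL superuser (for example in psql). The airflow database has to exist before privileges on it can be granted.

CREATE DATABASE airflow;
CREATE USER airflow WITH PASSWORD 'airflow';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;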

ALTER DATABASE airflow OWNER TO airflow;
ALTER ROLE airflow SET client_encoding TO 'utf8';
ALTER ROLE airflow SET default_transaction_isolation TO 'read committed';
ALTER ROLE airflow SET timezone TO 'Asia/Seoul';

Init airflow

First, check the version of the installed Airflow; it should print 2.9.0.

airflow version

Init airflow DB

airflow db init

After running db init, check that the following files and directories exist in $AIRFLOW_HOME.

  • airflow.cfg
  • webserver_config.py (optional; it is fine if this file is missing)
  • logs
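This check can be scripted if you prefer. The sketch below only looks for the two entries the list treats as required; the names `REQUIRED_ENTRIES` and `missing_entries` are invented for this example, and the fallback to ~/airflow assumes Airflow's default home when AIRFLOW_HOME is unset.

```python
import os
from pathlib import Path

# Entries the guide expects after `airflow db init`; webserver_config.py is
# optional, so it is deliberately left out of the required set.
REQUIRED_ENTRIES = ("airflow.cfg", "logs")


def missing_entries(airflow_home: Path) -> list:
    """Return the required entries that are absent from airflow_home."""
    return [name for name in REQUIRED_ENTRIES if not (airflow_home / name).exists()]


if __name__ == "__main__":
    home = Path(os.environ.get("AIRFLOW_HOME", Path.home() / "airflow"))
    missing = missing_entries(home)
    print("OK" if not missing else f"Missing in {home}: {', '.join(missing)}")
```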

Create admin user

airflow users create --username admin --role Admin --firstname admin --lastname user --email admin@example.com

Update airflow config

Modify airflow.cfg

[core]
executor = LocalExecutor

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

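Note: After updating airflow.cfg, run airflow db init again so that the tables are created in PostgreSQL. The admin user created earlier was stored in the default SQLite database, so create it again as well. (Since Airflow 2.7, airflow db migrate is the recommended replacement for the deprecated airflow db init.)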
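Before re-running the database step, it can be worth confirming that the two settings were actually picked up from the file. The following is a minimal sketch using only the standard library; `read_backend_settings` and `SAMPLE` are illustrative names, and in practice you would pass in the text of $AIRFLOW_HOME/airflow.cfg.

```python
import configparser


def read_backend_settings(cfg_text: str) -> dict:
    """Extract the executor and metadata DB URI from airflow.cfg text."""
    # interpolation=None: airflow.cfg values may contain "%" characters.
    # strict=False: tolerate a section that was accidentally added twice.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(cfg_text)
    return {
        "executor": parser.get("core", "executor", fallback=None),
        "sql_alchemy_conn": parser.get("database", "sql_alchemy_conn", fallback=None),
    }


# Stand-in for the real file contents (assumption: same keys as in this guide).
SAMPLE = """
[core]
executor = LocalExecutor

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
"""

if __name__ == "__main__":
    print(read_backend_settings(SAMPLE))
```

A value of None for either key suggests the setting was placed under the wrong section header.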

Run

Run the scheduler

airflow scheduler

Run the web server

airflow webserver --port [port number]

ETC

No sample DAGs

If you don't need the sample DAGs, update airflow.cfg as shown below.

[core]
load_examples = False
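Airflow also reads settings from environment variables named AIRFLOW__{SECTION}__{KEY}, which can be convenient for a local setup where you would rather not edit airflow.cfg. The helper below is a small sketch of that naming rule, applied to the example-DAG switch (`load_examples` under `[core]`); the function name `airflow_env_var` is hypothetical.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Name of the environment variable that overrides `key` in `[section]`."""
    # Note the double underscores around the section name.
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


if __name__ == "__main__":
    name = airflow_env_var("core", "load_examples")
    print(f"export {name}=False")
```

Environment variables take precedence over values in airflow.cfg, so an exported variable can silently mask a later edit to the file.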

Uninstall

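Reset airflow DB

Run this first, while the airflow command is still installed.

airflow db reset

Uninstall airflow

pip uninstall apache-airflow

Uninstall packages

pip freeze | grep apache-airflow | xargs pip uninstall -y

Remove airflow home directory

~/airflow is the default location. If you set a custom AIRFLOW_HOME as described above, remove that directory instead.

rm -rf ~/airflow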

Troubleshooting

WARNING: There was an error checking the latest version of pip.

certifi issue

pip install --upgrade certifi

Proxy issue

pip install --proxy=http://your.proxy:port --upgrade pip

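Operational environment setup guide (additional)

1. Network environment settings

When calling public data APIs on M1/M2 machines, a NO_PROXY-related issue in the requests library can occur.
Be sure to set the NO_PROXY environment variable.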

export NO_PROXY="*"

2. Handling API rate limits

For the public data portal API, it is safest to limit calls conservatively to around 10 to 20 TPS.
To do this, use an Airflow Pool to limit the number of concurrently running tasks.

Example:

# Airflow UI > Admin → Pools
# Pool Name: fetch_external_data_api
# Slots: 3 # adjust to match the TPS limit
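A pool caps concurrency rather than request rate, so choosing a slot count means estimating how fast each task calls the API. The sketch below is a back-of-the-envelope calculation only, assuming every running task issues one request at a time, back to back, with a known average latency; both function names and the 0.25 s latency are assumptions made for this example.

```python
import math


def estimated_tps(slots: int, avg_request_seconds: float) -> float:
    """Upper-bound request rate if every slot issues requests back to back."""
    return slots / avg_request_seconds


def max_slots_for_tps(target_tps: float, avg_request_seconds: float) -> int:
    """Slot count whose estimated rate fits target_tps (never fewer than one)."""
    return max(1, math.floor(target_tps * avg_request_seconds))


if __name__ == "__main__":
    # Assumed average latency of 0.25 s per request.
    print(estimated_tps(3, 0.25))        # rate reachable with 3 slots
    print(max_slots_for_tps(10, 0.25))   # slots for a 10 TPS ceiling
    print(max_slots_for_tps(20, 0.25))   # slots for a 20 TPS ceiling
```

Under these assumptions, 3 slots correspond to roughly 12 requests per second, which falls inside the 10 to 20 TPS range mentioned above. Measure the real latency before relying on the number.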

Example DAG settings:

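from datetime import timedelta
from airflow.decorators import task

@task(
    pool="fetch_external_data_api",  # the pool defined under Admin → Pools
    pool_slots=1,
    retries=5,
    retry_exponential_backoff=True,
    retry_delay=timedelta(seconds=10),
)
def fetch_external_data():  # hypothetical task name
    ...  # call the external API here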
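To get a feel for what retries=5 with a 10-second base delay and exponential backoff implies, here is a simplified illustration of plain doubling backoff. It is not Airflow's exact computation (Airflow adds jitter, and the task-level max_retry_delay setting can cap the wait); the `backoff_schedule` function and its `cap` parameter are hypothetical.

```python
from datetime import timedelta


def backoff_schedule(base: timedelta, retries: int, cap: timedelta) -> list:
    """Plain doubling backoff: base, 2*base, 4*base, ... limited to cap."""
    return [min(base * (2 ** attempt), cap) for attempt in range(retries)]


if __name__ == "__main__":
    schedule = backoff_schedule(timedelta(seconds=10), 5, timedelta(minutes=5))
    for attempt, delay in enumerate(schedule, start=1):
        print(f"retry {attempt}: wait about {int(delay.total_seconds())} s")
```

With plain doubling, the five waits add up to about five minutes, which is worth comparing against any execution or DAG-level timeout you set.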

3. MongoDB connection settings

In the long run, managing the MongoDB connection through Airflow Connections and MongoHook is better for maintainability and security.

  • Connection ID: mongo_default
  • URI example: mongodb://username:password@host:27017/dbname
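If the username or password contains characters such as "@", ":" or "/", they need to be percent-encoded before being placed in a MongoDB URI. The sketch below builds a URI in the shape shown above using only the standard library; the function name `mongo_uri` and the sample values are placeholders, not real credentials.

```python
from urllib.parse import quote_plus


def mongo_uri(username: str, password: str, host: str, port: int, dbname: str) -> str:
    """Build a mongodb:// URI with percent-encoded credentials."""
    # quote_plus escapes reserved characters so they cannot be mistaken
    # for URI delimiters.
    return (
        f"mongodb://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{dbname}"
    )


if __name__ == "__main__":
    print(mongo_uri("username", "p@ss/word", "host", 27017, "dbname"))
```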

Installation:

pip install "apache-airflow[mongo]" --constraint "${CONSTRAINT_URL}"

Example:

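from airflow.hooks.base import BaseHook
from pymongo import MongoClient

# Look up the connection stored in Airflow instead of hard-coding credentials.
mongo_conn = BaseHook.get_connection("mongo_default")
# get_uri() derives the URI scheme from the connection type, so this assumes
# the connection was defined with a mongodb:// URI as in the example above.
client = MongoClient(mongo_conn.get_uri())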

4. Time zone settings

Change the time zone in airflow.cfg to Asia/Seoul so that it matches local time.

[core]
default_timezone = Asia/Seoul

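5. Preventing long-running / zombie tasks

  • scheduler_zombie_task_threshold: adjust to the execution environment and API response speed (default 300 seconds → 600 to 900 seconds if needed)
  • execution_timeout: set at the task level to prevent tasks from waiting indefinitely

Example: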

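from datetime import timedelta
from airflow.decorators import task

@task(execution_timeout=timedelta(minutes=1))
def fetch_external_data():  # hypothetical task name
    ...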