Crawlab
Too many bugs; not recommended.
Crawlab is a visual management platform for web crawlers. It is written in Go and supports multiple programming languages and crawler frameworks.
Minimal configuration
crawlab/docker-compose.yml
services:
  master:
    image: crawlabteam/crawlab
    container_name: crawlab_master
    environment:
      CRAWLAB_NODE_MASTER: "Y"
      CRAWLAB_MONGO_HOST: "mongo"
    ports:
      - "8080:8080"
    depends_on:
      - mongo
  mongo:
    image: mongo
    ports:
      - "27017:27017"
Open a browser and navigate to http://localhost:8080 to start using Crawlab. The default username and password are admin / admin.
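The containers can take a few seconds to become ready after `docker compose up -d`. A small sketch that polls until the UI port accepts TCP connections (the host and port below assume the default 8080:8080 mapping from the compose file above; the function name is illustrative):

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP port accepts connections, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)
    return False


# Assumes the port mapping from the compose file above.
if wait_for_port("localhost", 8080, timeout=3):
    print("Crawlab UI is reachable at http://localhost:8080")
else:
    print("Crawlab UI is not reachable yet")
```

This only checks that the port is open, not that the application has finished booting, but it is usually enough to know when the login page will load.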
Multi-node
Master node
crawlab/mnd/master/docker-compose.yml
services:
  master:
    image: crawlabteam/crawlab
    container_name: crawlab_master
    restart: always
    environment:
      CRAWLAB_NODE_MASTER: "Y" # Y: master node
      CRAWLAB_MONGO_HOST: "host.docker.internal" # mongo host address
      CRAWLAB_MONGO_PORT: "27017" # mongo port
      CRAWLAB_MONGO_DB: "crawlab_master" # mongo database
      CRAWLAB_MONGO_USERNAME: "root" # mongo username
      CRAWLAB_MONGO_PASSWORD: "123456" # mongo password
    volumes:
      - crawlab-master-data:/root/.crawlab # persist crawlab metadata
      - crawlab-master-data:/data # persist crawlab data
      - crawlab-master-log:/var/log/crawlab # persist crawlab task logs
    ports:
      - 8080:8080 # expose the api port
      - 9666:9666 # expose the grpc port
volumes:
  crawlab-master-data:
  crawlab-master-log:
- The Crawlab master node connects to an external MongoDB.
- Since MongoDB runs on the same host machine, the host.docker.internal address can be used.
- Make sure the MongoDB configuration is correct, otherwise the master cannot connect to MongoDB.
- Persistent volumes keep Crawlab's metadata and task logs across container restarts.
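The CRAWLAB_MONGO_* variables above together describe a single MongoDB endpoint. As a sanity check before starting the master, the same values can be assembled into a standard mongodb:// connection string and tried with a MongoDB client. A sketch (the helper name and the authSource=admin suffix are assumptions for illustration, not part of Crawlab):

```python
from urllib.parse import quote_plus


def mongo_uri(host: str, port: int, db: str, user: str = "", password: str = "") -> str:
    """Build a standard mongodb:// URI from the CRAWLAB_MONGO_* settings."""
    auth = ""
    if user:
        # Credentials must be percent-encoded if they contain special characters.
        auth = f"{quote_plus(user)}:{quote_plus(password)}@"
    # authSource=admin assumes the user was created in the admin database.
    return f"mongodb://{auth}{host}:{port}/{db}?authSource=admin"


# The same values as the environment block above.
uri = mongo_uri("host.docker.internal", 27017, "crawlab_master", "root", "123456")
print(uri)
# mongodb://root:123456@host.docker.internal:27017/crawlab_master?authSource=admin
```

If this URI fails with a client such as mongosh, the master will fail to connect with the same credentials.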
Worker node
crawlab/mnd/worker/docker-compose.yml
services:
  worker:
    image: crawlabteam/crawlab
    container_name: crawlab_worker
    restart: always
    environment:
      CRAWLAB_NODE_MASTER: "N" # N: worker node
      CRAWLAB_GRPC_ADDRESS: "host.docker.internal:9666" # grpc address
      CRAWLAB_FS_FILER_URL: "http://host.docker.internal:8080/api/filer" # seaweedfs api
    volumes:
      - crawlab-worker-data:/root/.crawlab # persist crawlab metadata
      - crawlab-worker-data:/data # persist crawlab data
volumes:
  crawlab-worker-data:
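A worker depends on two master endpoints: the gRPC address and the filer API URL. Before starting the worker container, it can help to confirm both are reachable from the worker host. A minimal preflight sketch (endpoint values copied from the compose file above; the function names are illustrative):

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_worker_endpoints(grpc_address: str, filer_url: str) -> dict:
    """Check the two master endpoints a worker needs: gRPC and the filer API."""
    grpc_host, _, grpc_port = grpc_address.rpartition(":")
    filer = urlparse(filer_url)
    return {
        "grpc": endpoint_reachable(grpc_host, int(grpc_port)),
        "filer": endpoint_reachable(filer.hostname, filer.port or 80),
    }


# Values from the worker environment block above.
status = check_worker_endpoints(
    "host.docker.internal:9666",
    "http://host.docker.internal:8080/api/filer",
)
print(status)
```

If either entry is False, fix connectivity (firewall, published ports, host address) before starting the worker; otherwise it will register with the master but fail to sync spider files.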