Distributed Proving

Generating a Venus proof can be computationally intensive, especially for large programs. The distributed proving system splits the workload across multiple machines, reducing wall-clock time by parallelizing the work.

This page covers how to set up and run a distributed Venus proving cluster, from launching a coordinator to connecting workers and submitting proof requests.

How It Works

A distributed Venus cluster consists of two roles:

  • A Coordinator that receives proof requests and orchestrates the work.
  • One or more Workers that execute the actual proof computation.

When a proof request is submitted, the cluster proceeds in three phases:

  1. Partial Contributions -- the coordinator assigns segments of the work to available workers based on their compute capacity. Each worker computes its partial challenges independently.
  2. Prove -- workers compute the global challenge and generate their respective partial proofs.
  3. Aggregation -- the first worker to finish is selected as the aggregator. It collects all partial proofs and produces the final proof.

The coordinator returns the final proof to the client once aggregation completes. Workers report their compute capacity at registration; the coordinator selects workers sequentially from the available pool until the requested capacity is met. For example, if workers with capacities 8, 8, and 4 are available, a request for 10 compute units is covered by the first two. While assigned to a job, a worker is marked as busy and receives no new tasks.

Getting Started

Building

From the Venus repository root, build the coordinator and worker binaries:

cargo build --release --bin zisk-coordinator --bin zisk-worker

Running Locally

1. Start the coordinator:

cargo run --release --bin zisk-coordinator

2. Start a worker (in a separate terminal):

cargo run --release --bin zisk-worker -- --elf <elf-file-path> --inputs-folder <inputs-folder>

3. Submit a proof request (in a separate terminal):

cargo run --release --bin zisk-coordinator -- prove --inputs-uri <input-filename> --compute-capacity 10

The --compute-capacity flag specifies how many compute units the proof requires. The coordinator assigns workers until this capacity is covered.
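
For example, a request for 10 compute units can be split across two workers that each register 5 units via the worker's --compute-capacity flag. A sketch, with the paths left as placeholders:

# Terminals 2 and 3: two workers, each advertising 5 compute units
cargo run --release --bin zisk-worker -- --elf <elf-file-path> --inputs-folder <inputs-folder> --compute-capacity 5

# Terminal 4: the request for 10 units is covered by both workers together
cargo run --release --bin zisk-coordinator -- prove --inputs-uri <input-filename> --compute-capacity 10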

Docker Deployment

For multi-machine setups, Docker simplifies deployment:

# Build the image (CPU-only)
docker build -t venus-distributed:latest -f distributed/Dockerfile .

# For GPU support
docker build --build-arg GPU=true -t venus-distributed:gpu -f distributed/Dockerfile .

# Create a network for container DNS resolution
docker network create venus-net || true

Start the coordinator:

LOGS_DIR="<logs-folder>"
docker run -d --rm --name venus-coordinator \
  --network venus-net \
  -v "$LOGS_DIR:/var/log/distributed" \
  -e RUST_LOG=info \
  venus-distributed:latest \
  zisk-coordinator --config /app/config/coordinator/dev.toml

Start a worker:

LOGS_DIR="<logs-folder>"
PROVING_KEY_DIR="<provingKey-folder>"
ELF_DIR="<elf-folder>"
INPUTS_DIR="<inputs-folder>"
docker run -d --rm --name venus-worker-1 \
  --network venus-net --shm-size=20g \
  -v "$LOGS_DIR:/var/log/distributed" \
  -v "$HOME/.zisk/cache:/app/.zisk/cache:ro" \
  -v "$PROVING_KEY_DIR:/app/proving-keys:ro" \
  -v "$ELF_DIR:/app/elf:ro" \
  -v "$INPUTS_DIR:/app/inputs:ro" \
  -e RUST_LOG=info \
  venus-distributed:latest zisk-worker --coordinator-url http://venus-coordinator:50051 \
    --elf /app/elf/zec.elf --proving-key /app/proving-keys --inputs-folder /app/inputs

Submit a proof:

docker exec -it venus-coordinator \
  zisk-coordinator prove --inputs-uri <input-filename> --compute-capacity 10

Note: Use the filename only when submitting proofs, not the full path. Workers resolve files relative to their --inputs-folder.
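
Concretely, with the worker above mounting its inputs at /app/inputs (input.bin is a placeholder filename):

# Wrong: a host path -- the worker resolves it against /app/inputs and the file is not found
docker exec -it venus-coordinator \
  zisk-coordinator prove --inputs-uri /data/inputs/input.bin --compute-capacity 10

# Right: filename only -- the worker resolves /app/inputs/input.bin
docker exec -it venus-coordinator \
  zisk-coordinator prove --inputs-uri input.bin --compute-capacity 10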

Container paths reference:

| Path | Purpose |
|------|---------|
| /app/config/{coordinator,worker}/ | Configuration files |
| /app/bin/ | Binaries |
| /app/.zisk/cache/ | Cache (mount from host $HOME/.zisk/cache) |
| /var/log/distributed/ | Log files |
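
To verify the containers came up cleanly, standard Docker tooling applies (container names as in the run commands above):

# Follow the coordinator's console output
docker logs -f venus-coordinator

# File logs from both roles land in the mounted $LOGS_DIR on the host
ls "$LOGS_DIR"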

Coordinator

The coordinator manages the distributed proof generation process. It receives proof requests from clients and assigns work to available workers.

To start a coordinator with default settings:

cargo run --release --bin zisk-coordinator

Coordinator Configuration

The coordinator can be configured via either a TOML configuration file or command-line arguments. If no configuration file is explicitly provided, the system falls back to the ZISK_COORDINATOR_CONFIG_PATH environment variable. If neither is set, built-in defaults are used.

# Specify the configuration file path on the command line
cargo run --release --bin zisk-coordinator -- --config /path/to/my-config.toml

# Or via environment variable
export ZISK_COORDINATOR_CONFIG_PATH="/path/to/my-config.toml"
cargo run --release --bin zisk-coordinator

| TOML Key | CLI Argument | Environment Variable | Type | Default | Description |
|----------|--------------|----------------------|------|---------|-------------|
| service.name | - | - | String | Venus Distributed Coordinator | Service name |
| service.environment | - | - | String | development | Service environment (development, staging, production) |
| server.host | - | - | String | 0.0.0.0 | Server host |
| server.port | --port | - | Number | 50051 | Server port |
| server.proofs_dir | --proofs-dir | - | String | proofs | Directory to save generated proofs (conflicts with --no-save-proofs) |
| - | --no-save-proofs | - | Boolean | false | Disable saving proofs (conflicts with --proofs-dir) |
| - | -c, --compressed-proofs | - | Boolean | false | Generate compressed proofs |
| server.shutdown_timeout_seconds | - | - | Number | 30 | Graceful shutdown timeout in seconds |
| logging.level | - | RUST_LOG | String | debug | Logging level (error, warn, info, debug, trace) |
| logging.format | - | - | String | pretty | Logging format (pretty, json, compact) |
| logging.file_path | - | - | String | - | Optional. Log file path (enables file logging) |
| coordinator.max_workers_per_job | - | - | Number | 10 | Maximum workers per proof job |
| coordinator.max_total_workers | - | - | Number | 1000 | Maximum total registered workers |
| coordinator.phase1_timeout_seconds | - | - | Number | 300 | Phase 1 timeout in seconds |
| coordinator.phase2_timeout_seconds | - | - | Number | 600 | Phase 2 timeout in seconds |
| coordinator.webhook_url | --webhook-url | - | String | - | Optional. Webhook URL to notify on job completion |

Configuration File Examples

Development:

[service]
name = "Venus Distributed Coordinator"
environment = "development"

[logging]
level = "debug"
format = "pretty"

Production:

[service]
name = "Venus Distributed Coordinator"
environment = "production"

[server]
host = "0.0.0.0"
port = 50051
proofs_dir = "proofs"

[logging]
level = "info"
format = "json"
file_path = "/var/log/distributed/coordinator.log"

[coordinator]
max_workers_per_job = 20
max_total_workers = 5000
phase1_timeout_seconds = 600
phase2_timeout_seconds = 1200
webhook_url = "http://webhook.example.com/notify?job_id={$job_id}"

Webhook URL

The coordinator can notify an external service when a job finishes by sending a request to a configured webhook URL. The placeholder {$job_id} can be included in the URL and will be replaced with the finished job's ID. If no placeholder is provided, the coordinator automatically appends /{job_id} to the end of the URL.
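
Both substitution behaviors, using an illustrative URL:

# Placeholder form: {$job_id} is replaced in place
webhook_url = "http://webhook.example.com/notify?job_id={$job_id}"
# -> POST http://webhook.example.com/notify?job_id=job_12345

# No placeholder: the job ID is appended as a path segment
webhook_url = "http://webhook.example.com/notify"
# -> POST http://webhook.example.com/notify/job_12345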

All webhook notifications are sent as JSON POST requests with the following structure:

{
  "job_id": "job_12345",
  "success": true,
  "duration_ms": 45000,
  "proof": [],
  "timestamp": "2025-10-03T14:30:00Z",
  "error": null
}

Field Descriptions

| Field | Type | Description |
|-------|------|-------------|
| job_id | string | Unique identifier for the proof generation job |
| success | boolean | true on success, false on failure |
| duration_ms | number | Total execution time in milliseconds |
| proof | array<u64> or null | Final proof data (only on success) |
| timestamp | string | ISO 8601 timestamp |
| error | object or null | Error details (only on failure) |
Error Object Structure

{
  "code": "WORKER_FAILURE",
  "message": "Worker node-003 failed during proof generation: Out of memory"
}

Webhook Implementation Guidelines

HTTP Requirements:

  • Method: POST
  • Content-Type: application/json
  • Timeout: 10 seconds (configurable)
  • Retry: currently no automatic retries (implement idempotency)

Recommended Response:

  • Success: HTTP 200-299 status code
  • Body: any valid response (ignored by coordinator)

HTTP/1.1 200 OK
Content-Type: application/json

{"received": true, "job_id": "job_abc123"}

If your webhook endpoint is unavailable or returns an error, the coordinator logs the failure but continues operation. No automatic retries are performed -- consider implementing your own retry mechanism or message queue.
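
To test an endpoint before wiring it up, you can replay the documented payload by hand (a sketch; the URL and port are placeholders):

# Simulate the coordinator's completion notification
curl -X POST http://127.0.0.1:8080/notify/job_abc123 \
  -H "Content-Type: application/json" \
  -d '{"job_id":"job_abc123","success":true,"duration_ms":45000,"proof":[],"timestamp":"2025-10-03T14:30:00Z","error":null}'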

Command-Line Arguments

# Show help
cargo run --release --bin zisk-coordinator -- --help

# Run coordinator with custom port
cargo run --release --bin zisk-coordinator -- --port 50051

# Run with specific configuration
cargo run --release --bin zisk-coordinator -- --config production.toml

# Run with webhook URL
cargo run --release --bin zisk-coordinator -- --webhook-url http://webhook.example.com/notify --port 50051

Worker

The worker executes proof generation tasks assigned by the coordinator. It registers with the coordinator, reports its compute capacity, and waits for tasks.

To start a worker with default settings:

cargo run --release --bin zisk-worker -- --elf <elf-file-path> --inputs-folder <inputs-folder>

Worker Configuration

The worker follows the same conventions as the coordinator: a TOML configuration file (an explicit --config path, else the ZISK_WORKER_CONFIG_PATH environment variable, else built-in defaults), or command-line arguments.

cargo run --release --bin zisk-worker -- --config /path/to/my-config.toml

export ZISK_WORKER_CONFIG_PATH="/path/to/my-config.toml"
cargo run --release --bin zisk-worker

Input Files Handling

Workers need to know where to find input files. The --inputs-folder parameter specifies the base directory:

  • Default: current working directory (.).
  • Usage: when the coordinator sends a prove command with an input filename, the worker combines --inputs-folder + filename to locate the file.
  • Benefits: input files can be organized in a dedicated directory, separate from the worker executable.

# Worker with inputs in a specific folder
cargo run --release --bin zisk-worker -- --elf program.elf --inputs-folder /data/inputs/

# Coordinator requests proof for "input.bin" -> worker looks for "/data/inputs/input.bin"
cargo run --release --bin zisk-coordinator -- prove --inputs-uri input.bin --compute-capacity 10

| TOML Key | CLI Argument | Environment Variable | Type | Default | Description |
|----------|--------------|----------------------|------|---------|-------------|
| worker.worker_id | --worker-id | - | String | Auto-generated UUID | Unique worker identifier |
| worker.compute_capacity.compute_units | --compute-capacity | - | Number | 10 | Worker compute capacity (compute units) |
| worker.environment | - | - | String | development | Service environment |
| worker.inputs_folder | --inputs-folder | - | String | . | Path to folder containing input files |
| coordinator.url | --coordinator-url | - | String | http://127.0.0.1:50051 | Coordinator server URL |
| connection.reconnect_interval_seconds | - | - | Number | 5 | Reconnection interval in seconds |
| connection.heartbeat_timeout_seconds | - | - | Number | 30 | Heartbeat timeout in seconds |
| logging.level | - | RUST_LOG | String | debug | Logging level |
| logging.format | - | - | String | pretty | Logging format |
| logging.file_path | - | - | String | - | Optional. Log file path |
| - | --proving-key | - | String | ~/.zisk/provingKey | Path to setup folder |
| - | --elf | - | String | - | Path to ELF file |
| - | --asm | - | String | ~/.zisk/cache | Path to ASM file (mutually exclusive with --emulator) |
| - | --emulator | - | Boolean | false | Use prebuilt emulator (mutually exclusive with --asm) |
| - | --asm-port | - | Number | 23115 | Base port for assembly microservices |
| - | --shared-tables | - | Boolean | false | Share tables when running in a cluster |
| - | -v, -vv, -vvv | - | Number | 0 | Verbosity level (0=error, 1=warn, 2=info, 3=debug, 4=trace) |
| - | -d, --debug | - | String | - | Enable debug mode with optional component filter |
| - | --verify-constraints | - | Boolean | false | Verify constraints |
| - | --unlock-mapped-memory | - | Boolean | false | Unlock memory map for the ROM file (mutually exclusive with --emulator) |
| - | --hints | - | Boolean | false | Enable precompile hints processing |
| - | -m, --minimal-memory | - | Boolean | false | Use minimal memory mode |
| - | -r, --rma | - | Boolean | false | Enable RMA mode |
| - | -z, --preallocate | - | Boolean | false | GPU preallocation flag |
| - | -t, --max-streams | - | Number | - | Maximum number of GPU streams |
| - | -n, --number-threads-witness | - | Number | - | Threads for witness computation |
| - | -x, --max-witness-stored | - | Number | - | Maximum number of witnesses to store in memory |

Configuration File Examples

Development:

[worker]
compute_capacity.compute_units = 10
environment = "development"

[logging]
level = "debug"
format = "pretty"

Production:

[worker]
worker_id = "my-worker-001"
compute_capacity.compute_units = 10
environment = "production"
inputs_folder = "/app/inputs"

[coordinator]
url = "http://127.0.0.1:50051"

[connection]
reconnect_interval_seconds = 5
heartbeat_timeout_seconds = 30

[logging]
level = "info"
format = "pretty"
file_path = "/var/log/distributed/worker-001.log"

Launching a Proof

Use the prove subcommand of zisk-coordinator to send an RPC request to a running coordinator:

cargo run --release --bin zisk-coordinator -- prove --inputs-uri <input_filename> --compute-capacity 10

The --compute-capacity flag indicates the total compute units required to generate a proof. The coordinator assigns one or more workers to meet this capacity, distributing the workload if multiple workers are needed. Requests exceeding the combined capacity of available workers will not be processed.
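
When the cluster may be partially busy, --minimal-compute-capacity (see the table below) lets the coordinator proceed with a smaller allocation; a sketch:

# Ask for 20 compute units, but accept as few as 10
cargo run --release --bin zisk-coordinator -- prove --inputs-uri input.bin \
  --compute-capacity 20 --minimal-compute-capacity 10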

prove Subcommand Arguments

| CLI Argument | Short | Type | Default | Description |
|--------------|-------|------|---------|-------------|
| --inputs-uri | - | String | - | Path to the input file for proof generation |
| --compute-capacity | -c | Number | required | Total compute units required for the proof |
| --coordinator-url | - | String | http://127.0.0.1:50051 | URL of the coordinator |
| --data-id | - | String | Auto (filename or UUID) | Custom identifier for the proof job |
| --hints-uri | - | String | - | Path/URI to the precompile hints source |
| --stream-hints | - | Boolean | false | Stream hints from coordinator to workers via gRPC (see Hints Stream) |
| --direct-inputs | -x | Boolean | false | Send input data inline via gRPC instead of as a file path |
| --minimal-compute-capacity | -m | Number | Same as --compute-capacity | Minimum acceptable capacity (allows partial allocation) |
| --simulated-node | - | Number | - | Simulated node ID (for testing) |

Input and Hints Modes

Input modes (controlled by --inputs-uri and --direct-inputs):

  • Path mode (default): the coordinator sends the input file path to workers. Workers must have access to the file at the specified path.
  • Data mode (--direct-inputs): the coordinator reads the input file and sends its contents inline via gRPC. Workers do not need local access to the file.

Hints modes (controlled by --hints-uri and --stream-hints):

  • Path mode (default): the coordinator sends the hints URI to workers. Each worker loads hints from the specified path independently.
  • Streaming mode (--stream-hints): the coordinator reads hints from the URI and broadcasts them to all workers in real-time via gRPC. See the Hints Stream page for details.

# Basic proof with file-path inputs
zisk-coordinator prove --inputs-uri /data/inputs/my_input.bin --compute-capacity 10

# Send input data inline (workers don't need local file access)
zisk-coordinator prove --inputs-uri /data/inputs/my_input.bin -x --compute-capacity 10

# With precompile hints in path mode (workers load hints locally)
zisk-coordinator prove --inputs-uri input.bin --hints-uri /data/hints/hints.bin --compute-capacity 10

# With precompile hints in streaming mode (coordinator broadcasts to workers)
zisk-coordinator prove --inputs-uri input.bin --hints-uri unix:///tmp/hints.sock --stream-hints --compute-capacity 10

Administrative Operations

Health Checks and Monitoring

The coordinator exposes administrative gRPC endpoints for monitoring:

# Basic health check
grpcurl -plaintext 127.0.0.1:50051 zisk.distributed.api.v1.ZiskDistributedApi/HealthCheck

# System status
grpcurl -plaintext 127.0.0.1:50051 zisk.distributed.api.v1.ZiskDistributedApi/SystemStatus

# List active jobs
grpcurl -plaintext -d '{"active_only": true}' \
  127.0.0.1:50051 zisk.distributed.api.v1.ZiskDistributedApi/JobsList

# List connected workers
grpcurl -plaintext -d '{"available_only": true}' \
  127.0.0.1:50051 zisk.distributed.api.v1.ZiskDistributedApi/WorkersList

Troubleshooting

Worker can't connect to coordinator:

  • Verify the coordinator is running and accessible on the specified port.
  • Check firewall settings if coordinator and worker are on different machines.
  • Ensure correct URL format: http://host:port (not https:// for the default setup).

Configuration not loading:

  • Verify TOML syntax with a TOML validator.
  • Check file permissions on configuration files.
  • Use CLI overrides to test specific values.

Worker not receiving tasks:

  • Check worker registration in coordinator logs.
  • Verify the workers' combined compute capacity covers the requested --compute-capacity.
  • Ensure worker IDs are unique if running multiple workers.
  • Confirm the coordinator has active jobs to distribute.

Input file not found errors:

  • Verify the input file exists in the worker's --inputs-folder directory.
  • Check file permissions; the worker needs read access.
  • Ensure you are using the filename only (not full path) when launching proofs.
  • Confirm --inputs-folder path is correct and accessible.

Port conflicts:

  • Use the --port flag or update the configuration file.
  • Check for other services using the same ports.

Debug Mode

cargo run --release --bin zisk-coordinator -- --config debug-coordinator.toml
cargo run --release --bin zisk-worker      -- --config debug-worker.toml

Where debug-*.toml contains:

[logging]
level = "debug"
format = "pretty"

Log Files

When file logging is enabled, logs are written to the paths specified in the configuration files. Ensure the application has write permissions:

[logging]
file_path = "/var/log/distributed/coordinator.log"
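
If file logging fails to start, create the directory and grant write access first (a sketch, assuming the services run as your current user):

# Create the log directory and make it writable by the service user
sudo mkdir -p /var/log/distributed
sudo chown "$USER" /var/log/distributed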