Distributed Proving
Generating a Venus proof can be computationally intensive, especially for large programs. The distributed proving system splits the workload across multiple machines, reducing wall-clock time by parallelizing the work.
This page covers how to set up and run a distributed Venus proving cluster, from launching a coordinator to connecting workers and submitting proof requests.
How It Works
A distributed Venus cluster consists of two roles:
- A Coordinator that receives proof requests and orchestrates the work.
- One or more Workers that execute the actual proof computation.
When a proof request is submitted, the cluster proceeds in three phases:
- Partial Contributions -- the coordinator assigns segments of the work to available workers based on their compute capacity. Each worker computes its partial challenges independently.
- Prove -- workers compute the global challenge and generate their respective partial proofs.
- Aggregation -- the first worker to finish is selected as the aggregator. It collects all partial proofs and produces the final proof.
The coordinator returns the final proof to the client once aggregation completes. Workers report their compute capacity at registration; the coordinator selects them sequentially from the available pool until the requested capacity is met. While assigned to a job, a worker is marked as busy and will not receive new tasks.
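The sequential selection described above can be sketched in a few lines of shell. The capacities and the requested amount below are illustrative values, not real defaults; the actual coordinator tracks workers over gRPC.

```shell
# Illustrative sketch of the coordinator's sequential worker selection.
REQUESTED=10
CAPACITIES=(4 4 4 4)   # compute units reported by each registered worker

covered=0
selected=0
for cap in "${CAPACITIES[@]}"; do
  if (( covered >= REQUESTED )); then
    break   # requested capacity already met
  fi
  covered=$(( covered + cap ))
  selected=$(( selected + 1 ))   # this worker is now marked busy
done

echo "selected=$selected covered=$covered"
```

With these values, three of the four workers are assigned and the fourth stays available for other jobs.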
Getting Started
Building
From the Venus repository root, build the coordinator and worker binaries:
cargo build --release --bin zisk-coordinator --bin zisk-worker
Running Locally
1. Start the coordinator:
cargo run --release --bin zisk-coordinator
2. Start a worker (in a separate terminal):
cargo run --release --bin zisk-worker -- --elf <elf-path>
3. Submit a proof request (in a separate terminal):
cargo run --release --bin zisk-coordinator -- prove --inputs-uri <input-filename> --compute-capacity 10
The --compute-capacity flag specifies how many compute units the proof requires. The coordinator assigns workers until this capacity is covered.
Docker Deployment
For multi-machine setups, Docker simplifies deployment:
# Build the image (CPU-only)
docker build -t venus-distributed:latest -f distributed/Dockerfile .
# For GPU support
docker build --build-arg GPU=true -t venus-distributed:gpu -f distributed/Dockerfile .
# Create a network for container DNS resolution
docker network create venus-net || true
Start the coordinator:
LOGS_DIR="<logs-folder>"
docker run -d --rm --name venus-coordinator \
--network venus-net \
-v "$LOGS_DIR:/var/log/distributed" \
-e RUST_LOG=info \
venus-distributed:latest \
zisk-coordinator --config /app/config/coordinator/dev.toml
Start a worker:
LOGS_DIR="<logs-folder>"
PROVING_KEY_DIR="<provingKey-folder>"
ELF_DIR="<elf-folder>"
INPUTS_DIR="<inputs-folder>"
docker run -d --rm --name venus-worker-1 \
--network venus-net --shm-size=20g \
-v "$LOGS_DIR:/var/log/distributed" \
-v "$HOME/.zisk/cache:/app/.zisk/cache:ro" \
-v "$PROVING_KEY_DIR:/app/proving-keys:ro" \
-v "$ELF_DIR:/app/elf:ro" \
-v "$INPUTS_DIR:/app/inputs:ro" \
-e RUST_LOG=info \
venus-distributed:latest zisk-worker --coordinator-url http://venus-coordinator:50051 \
--elf /app/elf/zec.elf --proving-key /app/proving-keys --inputs-folder /app/inputs
Submit a proof:
docker exec -it venus-coordinator \
zisk-coordinator prove --inputs-uri <input-filename> --compute-capacity 10
Note: Use the filename only when submitting proofs, not the full path. Workers resolve files relative to their --inputs-folder.
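The note is easiest to see from the worker's resolution rule: it joins its --inputs-folder with the submitted name. A minimal sketch (paths are illustrative):

```shell
# Illustrative: how a worker resolves a submitted input name.
INPUTS_FOLDER="/app/inputs"        # the worker's --inputs-folder
SUBMITTED="input.bin"              # correct: filename only
RESOLVED="${INPUTS_FOLDER%/}/${SUBMITTED}"
echo "$RESOLVED"                   # -> /app/inputs/input.bin
```

Submitting a full host path instead of a bare filename would be joined the same way and fail to resolve inside the container.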
Container paths reference:
| Path | Purpose |
|---|---|
| /app/config/{coordinator,worker}/ | Configuration files |
| /app/bin/ | Binaries |
| /app/.zisk/cache/ | Cache (mount from host $HOME/.zisk/cache) |
| /var/log/distributed/ | Log files |
Coordinator
The coordinator manages the distributed proof generation process. It receives proof requests from clients and assigns work to available workers.
To start a coordinator with default settings:
cargo run --release --bin zisk-coordinator
Coordinator Configuration
The coordinator can be configured via either a TOML configuration file or command-line arguments.
If no configuration file is explicitly provided, the system falls back to the ZISK_COORDINATOR_CONFIG_PATH environment variable. If neither is set, built-in defaults are used.
# Specify the configuration file path on the command line
cargo run --release --bin zisk-coordinator -- --config /path/to/my-config.toml
# Or via environment variable
export ZISK_COORDINATOR_CONFIG_PATH="/path/to/my-config.toml"
cargo run --release --bin zisk-coordinator
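Taken together, the lookup order is a simple fallback chain, sketched below (the path is a made-up example):

```shell
# Illustrative fallback chain: --config beats the environment variable,
# which beats the built-in defaults.
CLI_CONFIG=""   # value of --config if provided (empty here)
export ZISK_COORDINATOR_CONFIG_PATH="/etc/venus/coordinator.toml"   # made-up path

EFFECTIVE="${CLI_CONFIG:-${ZISK_COORDINATOR_CONFIG_PATH:-built-in defaults}}"
echo "$EFFECTIVE"   # -> /etc/venus/coordinator.toml
```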
| TOML Key | CLI Argument | Environment Variable | Type | Default | Description |
|---|---|---|---|---|---|
| service.name | - | - | String | Venus Distributed Coordinator | Service name |
| service.environment | - | - | String | development | Service environment (development, staging, production) |
| server.host | - | - | String | 0.0.0.0 | Server host |
| server.port | --port | - | Number | 50051 | Server port |
| server.proofs_dir | --proofs-dir | - | String | proofs | Directory to save generated proofs (conflicts with --no-save-proofs) |
| - | --no-save-proofs | - | Boolean | false | Disable saving proofs (conflicts with --proofs-dir) |
| - | -c, --compressed-proofs | - | Boolean | false | Generate compressed proofs |
| server.shutdown_timeout_seconds | - | - | Number | 30 | Graceful shutdown timeout in seconds |
| logging.level | - | RUST_LOG | String | debug | Logging level (error, warn, info, debug, trace) |
| logging.format | - | - | String | pretty | Logging format (pretty, json, compact) |
| logging.file_path | - | - | String | - | Optional. Log file path (enables file logging) |
| coordinator.max_workers_per_job | - | - | Number | 10 | Maximum workers per proof job |
| coordinator.max_total_workers | - | - | Number | 1000 | Maximum total registered workers |
| coordinator.phase1_timeout_seconds | - | - | Number | 300 | Phase 1 timeout in seconds |
| coordinator.phase2_timeout_seconds | - | - | Number | 600 | Phase 2 timeout in seconds |
| coordinator.webhook_url | --webhook-url | - | String | - | Optional. Webhook URL to notify on job completion |
Configuration File Examples
Development:
[service]
name = "Venus Distributed Coordinator"
environment = "development"
[logging]
level = "debug"
format = "pretty"
Production:
[service]
name = "Venus Distributed Coordinator"
environment = "production"
[server]
host = "0.0.0.0"
port = 50051
proofs_dir = "proofs"
[logging]
level = "info"
format = "json"
file_path = "/var/log/distributed/coordinator.log"
[coordinator]
max_workers_per_job = 20
max_total_workers = 5000
phase1_timeout_seconds = 600
phase2_timeout_seconds = 1200
webhook_url = "http://webhook.example.com/notify?job_id={$job_id}"
Webhook URL
The coordinator can notify an external service when a job finishes by sending a request to a configured webhook URL.
The placeholder {$job_id} can be included in the URL and will be replaced with the finished job's ID.
If no placeholder is provided, the coordinator automatically appends /{job_id} to the end of the URL.
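Both behaviors can be sketched with plain string operations (the URL and job ID are illustrative):

```shell
# Illustrative string operations mirroring the two webhook URL behaviors.
JOB_ID="job_12345"

# 1. Placeholder present: {$job_id} is replaced with the finished job's ID.
WITH_PLACEHOLDER='http://webhook.example.com/notify?job_id={$job_id}'
NOTIFY1="${WITH_PLACEHOLDER//'{$job_id}'/$JOB_ID}"
echo "$NOTIFY1"   # -> http://webhook.example.com/notify?job_id=job_12345

# 2. No placeholder: /{job_id} is appended to the end of the URL.
PLAIN='http://webhook.example.com/notify'
NOTIFY2="${PLAIN%/}/$JOB_ID"
echo "$NOTIFY2"   # -> http://webhook.example.com/notify/job_12345
```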
All webhook notifications are sent as JSON POST requests with the following structure:
{
"job_id": "job_12345",
"success": true,
"duration_ms": 45000,
"proof": [],
"timestamp": "2025-10-03T14:30:00Z",
"error": null
}
Field Descriptions
| Field | Type | Description |
|---|---|---|
| job_id | string | Unique identifier for the proof generation job |
| success | boolean | true on success, false on failure |
| duration_ms | number | Total execution time in milliseconds |
| proof | array<u64> \| null | Final proof data (only on success) |
| timestamp | string | ISO 8601 timestamp |
| error | object \| null | Error details (only on failure) |
Error Object Structure
{
"code": "WORKER_FAILURE",
"message": "Worker node-003 failed during proof generation: Out of memory"
}
Webhook Implementation Guidelines
HTTP Requirements:
- Method: POST
- Content-Type: application/json
- Timeout: 10 seconds (configurable)
- Retry: currently no automatic retries (implement idempotency)
Recommended Response:
- Success: HTTP 200-299 status code
- Body: any valid response (ignored by coordinator)
If your webhook endpoint is unavailable or returns an error, the coordinator logs the failure but continues operation. No automatic retries are performed -- consider implementing your own retry mechanism or message queue.
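Because deliveries are fire-and-forget, a receiver should treat redeliveries defensively. A minimal idempotency sketch that deduplicates by job_id (entirely illustrative, not part of the coordinator):

```shell
# Illustrative receiver-side dedup: process each job_id at most once.
SEEN_FILE="$(mktemp)"

handle_webhook() {
  local job_id="$1"
  if grep -qx "$job_id" "$SEEN_FILE"; then
    echo "duplicate:$job_id"        # already processed; ignore redelivery
  else
    echo "$job_id" >> "$SEEN_FILE"
    echo "processed:$job_id"
  fi
}

FIRST="$(handle_webhook job_12345)"
SECOND="$(handle_webhook job_12345)"   # a retried delivery of the same job
echo "$FIRST $SECOND"
```

A real deployment would typically back this with a message queue or database rather than a temp file.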
Command-Line Arguments
# Show help
cargo run --release --bin zisk-coordinator -- --help
# Run coordinator with custom port
cargo run --release --bin zisk-coordinator -- --port 50051
# Run with specific configuration
cargo run --release --bin zisk-coordinator -- --config production.toml
# Run with webhook URL
cargo run --release --bin zisk-coordinator -- --webhook-url http://webhook.example.com/notify --port 50051
Worker
The worker executes proof generation tasks assigned by the coordinator. It registers with the coordinator, reports its compute capacity, and waits for tasks.
To start a worker with default settings:
cargo run --release --bin zisk-worker
Worker Configuration
Same conventions as the coordinator: TOML config file, the ZISK_WORKER_CONFIG_PATH environment variable, or CLI arguments.
cargo run --release --bin zisk-worker -- --config /path/to/my-config.toml
export ZISK_WORKER_CONFIG_PATH="/path/to/my-config.toml"
cargo run --release --bin zisk-worker
Input Files Handling
Workers need to know where to find input files. The --inputs-folder parameter specifies the base directory:
- Default: current working directory (.).
- Usage: when the coordinator sends a prove command with an input filename, the worker combines --inputs-folder + filename to locate the file.
- Benefits: input files can be organized in a dedicated directory, separate from the worker executable.
# Worker with inputs in a specific folder
cargo run --release --bin zisk-worker -- --elf program.elf --inputs-folder /data/inputs/
# Coordinator requests proof for "input.bin" -> worker looks for "/data/inputs/input.bin"
cargo run --release --bin zisk-coordinator -- prove --inputs-uri input.bin --compute-capacity 10
| TOML Key | CLI Argument | Environment Variable | Type | Default | Description |
|---|---|---|---|---|---|
| worker.worker_id | --worker-id | - | String | Auto-generated UUID | Unique worker identifier |
| worker.compute_capacity.compute_units | --compute-capacity | - | Number | 10 | Worker compute capacity (compute units) |
| worker.environment | - | - | String | development | Service environment |
| worker.inputs_folder | --inputs-folder | - | String | . | Path to folder containing input files |
| coordinator.url | --coordinator-url | - | String | http://127.0.0.1:50051 | Coordinator server URL |
| connection.reconnect_interval_seconds | - | - | Number | 5 | Reconnection interval in seconds |
| connection.heartbeat_timeout_seconds | - | - | Number | 30 | Heartbeat timeout in seconds |
| logging.level | - | RUST_LOG | String | debug | Logging level |
| logging.format | - | - | String | pretty | Logging format |
| logging.file_path | - | - | String | - | Optional. Log file path |
| - | --proving-key | - | String | ~/.zisk/provingKey | Path to setup folder |
| - | --elf | - | String | - | Path to ELF file |
| - | --asm | - | String | ~/.zisk/cache | Path to ASM file (mutually exclusive with --emulator) |
| - | --emulator | - | Boolean | false | Use prebuilt emulator (mutually exclusive with --asm) |
| - | --asm-port | - | Number | 23115 | Base port for assembly microservices |
| - | --shared-tables | - | Boolean | false | Share tables when running in a cluster |
| - | -v, -vv, -vvv | - | Number | 0 | Verbosity level (0=error, 1=warn, 2=info, 3=debug, 4=trace) |
| - | -d, --debug | - | String | - | Enable debug mode with optional component filter |
| - | --verify-constraints | - | Boolean | false | Verify constraints |
| - | --unlock-mapped-memory | - | Boolean | false | Unlock memory map for the ROM file (mutually exclusive with --emulator) |
| - | --hints | - | Boolean | false | Enable precompile hints processing |
| - | -m, --minimal-memory | - | Boolean | false | Use minimal memory mode |
| - | -r, --rma | - | Boolean | false | Enable RMA mode |
| - | -z, --preallocate | - | Boolean | false | GPU preallocation flag |
| - | -t, --max-streams | - | Number | - | Maximum number of GPU streams |
| - | -n, --number-threads-witness | - | Number | - | Threads for witness computation |
| - | -x, --max-witness-stored | - | Number | - | Maximum number of witnesses to store in memory |
Configuration File Examples
Development:
[worker]
compute_capacity.compute_units = 10
environment = "development"
[logging]
level = "debug"
format = "pretty"
Production:
[worker]
worker_id = "my-worker-001"
compute_capacity.compute_units = 10
environment = "production"
inputs_folder = "/app/inputs"
[coordinator]
url = "http://127.0.0.1:50051"
[connection]
reconnect_interval_seconds = 5
heartbeat_timeout_seconds = 30
[logging]
level = "info"
format = "pretty"
file_path = "/var/log/distributed/worker-001.log"
Launching a Proof
Use the prove subcommand of zisk-coordinator to send an RPC request to a running coordinator:
cargo run --release --bin zisk-coordinator -- prove --inputs-uri <input_filename> --compute-capacity 10
The --compute-capacity flag indicates the total compute units required to generate a proof. The coordinator assigns one or more workers to meet this capacity, distributing the workload if multiple workers are needed. Requests exceeding the combined capacity of available workers will not be processed.
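The admission rule can be sketched as follows (worker capacities and the request are illustrative):

```shell
# Illustrative admission check: reject when the request exceeds the combined
# capacity of currently available workers.
REQUESTED=25
AVAILABLE=(10 10)   # two idle workers, 10 compute units each

total=0
for cap in "${AVAILABLE[@]}"; do
  total=$(( total + cap ))
done

if (( REQUESTED > total )); then RESULT="rejected"; else RESULT="accepted"; fi
echo "$RESULT"   # -> rejected
```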
prove Subcommand Arguments
| CLI Argument | Short | Type | Default | Description |
|---|---|---|---|---|
| --inputs-uri | - | String | - | Path to the input file for proof generation |
| --compute-capacity | -c | Number | required | Total compute units required for the proof |
| --coordinator-url | - | String | http://127.0.0.1:50051 | URL of the coordinator |
| --data-id | - | String | Auto (filename or UUID) | Custom identifier for the proof job |
| --hints-uri | - | String | - | Path/URI to the precompile hints source |
| --stream-hints | - | Boolean | false | Stream hints from coordinator to workers via gRPC (see Hints Stream) |
| --direct-inputs | -x | Boolean | false | Send input data inline via gRPC instead of as a file path |
| --minimal-compute-capacity | -m | Number | Same as --compute-capacity | Minimum acceptable capacity (allows partial allocation) |
| --simulated-node | - | Number | - | Simulated node ID (for testing) |
Input and Hints Modes
Input modes (controlled by --inputs-uri and --direct-inputs):
- Path mode (default): the coordinator sends the input file path to workers. Workers must have access to the file at the specified path.
- Data mode (--direct-inputs): the coordinator reads the input file and sends its contents inline via gRPC. Workers do not need local access to the file.
Hints modes (controlled by --hints-uri and --stream-hints):
- Path mode (default): the coordinator sends the hints URI to workers. Each worker loads hints from the specified path independently.
- Streaming mode (--stream-hints): the coordinator reads hints from the URI and broadcasts them to all workers in real time via gRPC. See the Hints Stream page for details.
# Basic proof with file-path inputs
zisk-coordinator prove --inputs-uri /data/inputs/my_input.bin --compute-capacity 10
# Send input data inline (workers don't need local file access)
zisk-coordinator prove --inputs-uri /data/inputs/my_input.bin -x --compute-capacity 10
# With precompile hints in path mode (workers load hints locally)
zisk-coordinator prove --inputs-uri input.bin --hints-uri /data/hints/hints.bin --compute-capacity 10
# With precompile hints in streaming mode (coordinator broadcasts to workers)
zisk-coordinator prove --inputs-uri input.bin --hints-uri unix:///tmp/hints.sock --stream-hints --compute-capacity 10
Administrative Operations
Health Checks and Monitoring
The coordinator exposes administrative gRPC endpoints for monitoring:
# Basic health check
grpcurl -plaintext 127.0.0.1:50051 zisk.distributed.api.v1.ZiskDistributedApi/HealthCheck
# System status
grpcurl -plaintext 127.0.0.1:50051 zisk.distributed.api.v1.ZiskDistributedApi/SystemStatus
# List active jobs
grpcurl -plaintext -d '{"active_only": true}' \
127.0.0.1:50051 zisk.distributed.api.v1.ZiskDistributedApi/JobsList
# List connected workers
grpcurl -plaintext -d '{"available_only": true}' \
127.0.0.1:50051 zisk.distributed.api.v1.ZiskDistributedApi/WorkersList
Troubleshooting
Worker can't connect to coordinator:
- Verify the coordinator is running and accessible on the specified port.
- Check firewall settings if coordinator and worker are on different machines.
- Ensure correct URL format: http://host:port (not https:// for the default setup).
Configuration not loading:
- Verify TOML syntax with a TOML validator.
- Check file permissions on configuration files.
- Use CLI overrides to test specific values.
Worker not receiving tasks:
- Check worker registration in coordinator logs.
- Verify compute capacity is appropriate for the available tasks.
- Ensure worker IDs are unique if running multiple workers.
- Confirm the coordinator has active jobs to distribute.
Input file not found errors:
- Verify the input file exists in the worker's --inputs-folder directory.
- Check file permissions; the worker needs read access.
- Ensure you are using the filename only (not the full path) when launching proofs.
- Confirm the --inputs-folder path is correct and accessible.
Port conflicts:
- Use the --port flag or update the configuration file.
- Check for other services using the same ports.
Debug Mode
cargo run --release --bin zisk-coordinator -- --config debug-coordinator.toml
cargo run --release --bin zisk-worker -- --config debug-worker.toml
Where debug-*.toml contains:
[logging]
level = "debug"
format = "pretty"
Log Files
When file logging is enabled, logs are written to the paths specified in the configuration files (for example, /var/log/distributed/coordinator.log in the production example above). Ensure the application has write permission to that directory.