Writing Programs

This page explains how to author Rust programs that execute inside the Venus zkVM and how to drive them with Venus's cargo-zisk CLI.

Project Skeleton

The fastest way to scaffold a new guest is the SDK helper:

cargo-zisk sdk new sha_hasher
cd sha_hasher

This produces:

.
├── build.rs
├── Cargo.toml
├── .gitignore
├── guest
│   ├── src
│   │   └── main.rs
│   └── Cargo.toml
└── host
    ├── src
    │   └── main.rs
    ├── bin
    │   ├── compressed.rs
    │   ├── execute.rs
    │   ├── prove.rs
    │   ├── plonk.rs
    │   ├── verify-constraints.rs
    │   └── ziskemu.rs
    ├── build.rs
    └── Cargo.toml

The example program takes a number n as input and computes n chained SHA-256 hashes, feeding each digest back in as the next round's message.

Authoring a Guest

A Venus guest is a `#![no_main]` Rust binary whose entrypoint is declared through the ziskos runtime crate's `entrypoint!` macro.

main.rs

// Compute SHA-256 hash `n` times sequentially.

#![no_main]
ziskos::entrypoint!(main);

use sha2::{Digest, Sha256};

fn main() {
    // Read the iteration count from the input stream.
    let n: u32 = ziskos::io::read();

    // Chain the hash: each round digests the previous round's output.
    let mut hash = [0u8; 32];
    for _ in 0..n {
        let mut hasher = Sha256::new();
        hasher.update(hash);
        hash = hasher.finalize().into();
    }

    // Commit the final digest as a public output of the proof.
    ziskos::io::commit(&hash);
}

Cargo.toml

[package]
name = "guest"
version = "0.1.0"
edition = "2021"

[dependencies]
sha2 = "0.10.8"
ziskos = { git = "https://github.com/cysic-labs/venus.git" }
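
The scaffolded host crate drives the guest: among other things, it produces the input file that the emulator and prover commands below read (host/tmp/input.bin in this guide). As a minimal sketch only — the raw little-endian byte encoding here is an assumption, so consult the generated host/src/main.rs for the serialization Venus actually expects:

// Hypothetical host-side input writer (sketch; the byte encoding is an
// assumption, not Venus's documented format).
use std::fs;

fn main() -> std::io::Result<()> {
    let n: u32 = 20; // iteration count consumed by the guest's `read()`
    fs::create_dir_all("host/tmp")?;
    fs::write("host/tmp/input.bin", n.to_le_bytes())?;
    Ok(())
}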

Input / Output

Reading inputs:

let n: u32 = ziskos::io::read();
let my_data: MyStruct = ziskos::io::read();   // any `Deserialize` type

Committing public outputs:

let hash: [u8; 32] = compute_hash();
ziskos::io::commit(&hash);                    // any `Serialize` type

Committed values become public outputs that anyone verifying the proof can inspect.
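
Since `read` and `commit` are generic over the serde traits, custom types work too, provided they derive `Deserialize`/`Serialize` (add serde with the derive feature to the guest's dependencies). A sketch with a hypothetical HashRequest type, omitting the `#![no_main]`/`entrypoint!` boilerplate shown earlier:

use serde::Deserialize;
use sha2::{Digest, Sha256};

#[derive(Deserialize)]
struct HashRequest {
    rounds: u32,    // number of hash iterations to perform
    seed: [u8; 32], // starting digest
}

fn main() {
    let req: HashRequest = ziskos::io::read();

    // Iterate the hash starting from the caller-supplied seed.
    let mut digest = req.seed;
    for _ in 0..req.rounds {
        digest = Sha256::digest(digest).into();
    }

    ziskos::io::commit(&digest);
}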

Building

You can run the guest natively (just like any Rust program) for development:

cargo build --release

When you are ready to target the Venus zkVM, build with cargo-zisk:

cargo-zisk build --release

The resulting ELF lands in target/elf/riscv64ima-zisk-zkvm-elf/release/<name> (or target/elf/riscv64ima-zisk-zkvm-elf/debug/<name> without --release).

Executing in the Emulator

ziskemu runs a guest ELF without generating a proof. Use it to validate behavior before spending time on a full prove run:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i host/tmp/input.bin

If you hit the step limit, raise it with -n:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i host/tmp/input.bin -n 10000000000

Performance Metrics (-m)

Pass -m to report execution throughput:

ziskemu -e target/.../guest -i input.bin -m
process_rom() steps=85309 duration=0.0009 tp=89.8565 Msteps/s freq=3051.0000 33.9542 clocks/step

Here the guest executed 85,309 steps at roughly 90 Msteps/s, i.e. about 34 clocks per emulated step on a ~3.05 GHz core.

Execution Statistics (-X)

Pass -X to break the run down under a cost model that weights main-loop steps, memory accesses, and individual opcodes:

ziskemu -e target/.../guest -i input.bin -X
Cost definitions:
    AREA_PER_SEC: 1000000 steps
    COST_MEMA_R1: 0.00002 sec
    ...

Total Cost: 12.81 sec
    Main Cost: 4.27 sec 85308 steps
    Mem Cost: 2.22 sec 222052 steps
    ...

Opcodes:
    add: 1.12 sec (77 steps/op) (14569 ops)
    xor: 1.06 sec (77 steps/op) (13774 ops)
    ...

Generating a Proof

Once the guest runs correctly in the emulator, you can produce a real proof. The repo's Makefile shows the canonical end-to-end flow; the CLI invocations below are what those targets call under the hood.

Step 1 -- ROM Setup

Required once after the guest ELF is built (and any time it changes):

cargo-zisk rom-setup -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -k ./build/provingKey
  • -e -- ELF path.
  • -k -- proving key directory.

ROM setup files are generated in ./build/provingKey (or $HOME/.zisk/cache if you installed the binaries to ~/.zisk/bin). Use cargo-zisk clean to drop the cache.

Step 2 -- Verify Constraints (Optional)

A fast sanity check that all circuit constraints are satisfied, without producing a full proof:

cargo-zisk verify-constraints \
  -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest \
  -i host/tmp/input.bin \
  -k ./build/provingKey

If everything is correct, you will see:

[INFO ] CstrVrfy: All global constraints were successfully verified
[INFO ] CstrVrfy: All constraints were verified

Step 3 -- Generate the Proof

cargo-zisk prove \
  -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest \
  -i host/tmp/input.bin \
  -k ./build/provingKey \
  -o proof -a -y
  • -e -- ELF path.
  • -i -- input file.
  • -k -- proving key directory.
  • -o -- output directory.
  • -a -- produce a final aggregated proof.
  • -y -- verify the proof immediately after generation.

Successful output ends with:

[INFO ] ProofMan: Vadcop Final proof was verified
[INFO ] ProofMan: Proofs generated successfully

Step 4 -- Verify the Proof

cargo-zisk verify -p ./proof/vadcop_final_proof.bin -k ./build/provingKey

Concurrent Proof Generation (MPI)

Venus proofs can be generated using multiple processes concurrently to cut wall-clock time. Processes are launched via standard MPI (Message Passing Interface) and may run on the same server or across machines:

mpirun --bind-to none \
  -np <num_processes> \
  -x OMP_NUM_THREADS=<num_threads_per_process> \
  -x RAYON_NUM_THREADS=<num_threads_per_process> \
  target/release/cargo-zisk <args>
  • <num_processes> -- how many processes to launch.
  • <num_threads_per_process> -- threads per process via OMP_NUM_THREADS / RAYON_NUM_THREADS.
  • --bind-to none -- let the OS schedule processes across cores for better load balancing.

Rule of thumb: <num_processes> * <num_threads_per_process> should match the number of available CPU cores (or 2x with hyperthreading). Memory usage scales linearly with <num_processes> (~25 GB per process).
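
As an illustrative sizing only (the flags are the ones introduced above), a 32-core server with ~100 GB of free RAM might run 4 processes of 8 threads each:

mpirun --bind-to none \
  -np 4 \
  -x OMP_NUM_THREADS=8 \
  -x RAYON_NUM_THREADS=8 \
  target/release/cargo-zisk prove \
  -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest \
  -i host/tmp/input.bin \
  -k ./build/provingKey \
  -o proof -a -y

Here 4 * 8 = 32 matches the core count, and four processes at ~25 GB each stay within the ~100 GB budget.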

GPU Proof Generation

Venus's GPU backend ships with several Cysic-contributed optimizations: CUDA Graphs integration, expression-evaluation kernel tuning, and shared-memory optimizations for intermediate buffers.

The default make build already enables GPU support. If you build manually, enable the gpu feature:

cargo build --release --features gpu

Notes:

  • GPU support is only available for NVIDIA GPUs.
  • The CUDA Toolkit must be installed.
  • Compile Venus directly on the server where it will run; the binary is optimized for the local GPU architecture.
  • GPU memory is typically more limited than system memory. When combining GPU proving with MPI concurrency, ensure each process has enough VRAM headroom.