Writing Programs

This page explains how to author Rust programs that execute inside the Venus zkVM and how to drive them with Venus's cargo-zisk CLI.

Project Skeleton

The fastest way to scaffold a new guest is the SDK helper:

cargo-zisk sdk new sha_hasher
cd sha_hasher

This produces:

.
├── build.rs
├── Cargo.toml
├── .gitignore
├── guest
│   ├── src
│   │   └── main.rs
│   └── Cargo.toml
└── host
    ├── src
    │   └── main.rs
    ├── bin
    │   ├── compressed.rs
    │   ├── execute.rs
    │   ├── prove.rs
    │   ├── plonk.rs
    │   ├── verify-constraints.rs
    │   └── ziskemu.rs
    ├── build.rs
    └── Cargo.toml

The example program takes a number n as input and computes n chained SHA-256 hashes, feeding each digest back in as the next round's message.

Authoring a Guest

A Venus guest is a `#![no_main]` Rust binary whose entrypoint is declared through the ziskos runtime crate's `entrypoint!` macro.

main.rs

// Compute SHA-256 hash `n` times sequentially.

#![no_main]
ziskos::entrypoint!(main);

use sha2::{Digest, Sha256};

fn main() {
    // Read the iteration count from the input stream.
    let n: u32 = ziskos::io::read();

    // Chain the hash: each round digests the previous round's output.
    let mut hash = [0u8; 32];
    for _ in 0..n {
        let mut hasher = Sha256::new();
        hasher.update(hash);
        hash = hasher.finalize().into();
    }

    // Commit the final digest as a public output of the proof.
    ziskos::io::commit(&hash);
}

Cargo.toml

[package]
name = "guest"
version = "0.1.0"
edition = "2021"

[dependencies]
sha2 = "0.10.8"
ziskos = { git = "https://github.com/cysic-labs/venus.git" }
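
The scaffolded host crate drives the guest: among other things, it produces the input file that the emulator and prover commands below read (host/tmp/input.bin in this guide). As a minimal sketch only — the raw little-endian byte encoding here is an assumption, so consult the generated host/src/main.rs for the serialization Venus actually expects:

// Hypothetical host-side input writer (sketch; the byte encoding is an
// assumption, not Venus's documented format).
use std::fs;

fn main() -> std::io::Result<()> {
    let n: u32 = 20; // iteration count consumed by the guest's `read()`
    fs::create_dir_all("host/tmp")?;
    fs::write("host/tmp/input.bin", n.to_le_bytes())?;
    Ok(())
}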

Input / Output

Reading inputs:

let n: u32 = ziskos::io::read();
let my_data: MyStruct = ziskos::io::read();   // any `Deserialize` type

Committing public outputs:

let hash: [u8; 32] = compute_hash();
ziskos::io::commit(&hash);                    // any `Serialize` type

Committed values become public outputs that anyone verifying the proof can inspect.
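
Since `read` and `commit` are generic over the serde traits, custom types work too, provided they derive `Deserialize`/`Serialize` (add serde with the derive feature to the guest's dependencies). A sketch with a hypothetical HashRequest type, omitting the `#![no_main]`/`entrypoint!` boilerplate shown earlier:

use serde::Deserialize;
use sha2::{Digest, Sha256};

#[derive(Deserialize)]
struct HashRequest {
    rounds: u32,    // number of hash iterations to perform
    seed: [u8; 32], // starting digest
}

fn main() {
    let req: HashRequest = ziskos::io::read();

    // Iterate the hash starting from the caller-supplied seed.
    let mut digest = req.seed;
    for _ in 0..req.rounds {
        digest = Sha256::digest(digest).into();
    }

    ziskos::io::commit(&digest);
}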

Building

You can run the guest natively (just like any Rust program) for development:

cargo build --release

When you are ready to target the Venus zkVM, build with cargo-zisk:

cargo-zisk build --release

The resulting ELF lands in target/elf/riscv64ima-zisk-zkvm-elf/release/<name> (or target/elf/riscv64ima-zisk-zkvm-elf/debug/<name> without --release).

Executing in the Emulator

ziskemu runs a guest ELF without generating a proof. Use it to validate behavior before spending time on a full prove run:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i host/tmp/input.bin

If you hit the step limit, raise it with -n:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i host/tmp/input.bin -n 10000000000

Performance Metrics (-m)

Pass -m to report execution throughput:

ziskemu -e target/.../guest -i input.bin -m
process_rom() steps=85309 duration=0.0009 tp=89.8565 Msteps/s freq=3051.0000 33.9542 clocks/step

Here the guest executed 85,309 steps at roughly 90 Msteps/s, i.e. about 34 clocks per emulated step on a ~3.05 GHz core.

Execution Statistics (-X)

Pass -X to break the run down under a cost model that weights main-loop steps, memory accesses, and individual opcodes:

ziskemu -e target/.../guest -i input.bin -X
Cost definitions:
    AREA_PER_SEC: 1000000 steps
    COST_MEMA_R1: 0.00002 sec
    ...

Total Cost: 12.81 sec
    Main Cost: 4.27 sec 85308 steps
    Mem Cost: 2.22 sec 222052 steps
    ...

Opcodes:
    add: 1.12 sec (77 steps/op) (14569 ops)
    xor: 1.06 sec (77 steps/op) (13774 ops)
    ...

Generating a Proof

Once the guest runs correctly in the emulator, you can produce a real proof. The repo's Makefile shows the canonical end-to-end flow; the CLI invocations below are what those targets call under the hood.

Step 1 -- ROM Setup

Required once after the guest ELF is built (and any time it changes):

cargo-zisk rom-setup -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -k ./build/provingKey
  • -e -- ELF path.
  • -k -- proving key directory.

ROM setup files are generated in ./build/provingKey (or $HOME/.zisk/cache if you installed the binaries to ~/.zisk/bin). Use cargo-zisk clean to drop the cache.

Step 2 -- Verify Constraints (Optional)

A fast sanity check that all circuit constraints are satisfied, without producing a full proof:

cargo-zisk verify-constraints \
  -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest \
  -i host/tmp/input.bin \
  -k ./build/provingKey

If everything is correct, you will see:

[INFO ] CstrVrfy: All global constraints were successfully verified
[INFO ] CstrVrfy: All constraints were verified

Step 3 -- Generate the Proof

cargo-zisk prove \
  -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest \
  -i host/tmp/input.bin \
  -k ./build/provingKey \
  -o proof -a -y
  • -e -- ELF path.
  • -i -- input file.
  • -k -- proving key directory.
  • -o -- output directory.
  • -a -- produce a final aggregated proof.
  • -y -- verify the proof immediately after generation.

Successful output ends with:

[INFO ] ProofMan: Vadcop Final proof was verified
[INFO ] ProofMan: Proofs generated successfully

Step 4 -- Verify the Proof

cargo-zisk verify -p ./proof/vadcop_final_proof.bin -k ./build/provingKey

Concurrent Proof Generation (MPI)

Venus proofs can be generated using multiple processes concurrently to cut wall-clock time. Processes are launched via standard MPI (Message Passing Interface) and may run on the same server or across machines:

mpirun --bind-to none \
  -np <num_processes> \
  -x OMP_NUM_THREADS=<num_threads_per_process> \
  -x RAYON_NUM_THREADS=<num_threads_per_process> \
  target/release/cargo-zisk <args>
  • <num_processes> -- how many processes to launch.
  • <num_threads_per_process> -- threads per process via OMP_NUM_THREADS / RAYON_NUM_THREADS.
  • --bind-to none -- let the OS schedule processes across cores for better load balancing.

Rule of thumb: <num_processes> * <num_threads_per_process> should match the number of available CPU cores (or 2x with hyperthreading). Memory usage scales linearly with <num_processes> (~25 GB per process).
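
As an illustrative sizing only (the flags are the ones introduced above), a 32-core server with ~100 GB of free RAM might run 4 processes of 8 threads each:

mpirun --bind-to none \
  -np 4 \
  -x OMP_NUM_THREADS=8 \
  -x RAYON_NUM_THREADS=8 \
  target/release/cargo-zisk prove \
  -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest \
  -i host/tmp/input.bin \
  -k ./build/provingKey \
  -o proof -a -y

Here 4 * 8 = 32 matches the core count, and four processes at ~25 GB each stay within the ~100 GB budget.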

GPU Proof Generation

Venus's GPU backend ships with several Cysic-contributed optimizations: CUDA Graphs integration, expression-evaluation kernel tuning, and shared-memory optimizations for intermediate buffers.

The default make build already enables GPU support. If you build manually, enable the gpu feature:

cargo build --release --features gpu

Notes:

  • GPU support is only available for NVIDIA GPUs.
  • The CUDA Toolkit must be installed.
  • Compile Venus directly on the server where it will run; the binary is optimized for the local GPU architecture.
  • GPU memory is typically more limited than system memory. When combining GPU proving with MPI concurrency, ensure each process has enough VRAM headroom.