Become an Operator

Run a Tangle Blueprint and earn rewards for serving AI inference to the network. Operators set their own pricing and compete on quality.

Revenue Model

80% of inference revenue goes to you

You set pricing

Per-token rates you control, set per model

Instant payouts

On-chain settlement, no invoicing
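As a sketch of the revenue math, assuming a hypothetical price of $0.50 per million tokens (the rate and volume here are illustrative, not network figures): an operator serving 20M tokens grosses $10.00 and keeps $8.00 after the 80% split.

```shell
# Illustrative only: the rate and volume below are hypothetical.
rate_per_million=0.50      # your per-token price, expressed per 1M tokens (USD)
tokens_served_millions=20  # tokens served in the period, in millions

gross=$(awk "BEGIN { printf \"%.2f\", $rate_per_million * $tokens_served_millions }")
payout=$(awk "BEGIN { printf \"%.2f\", $gross * 0.80 }")  # 80% operator share
echo "gross=\$$gross operator_payout=\$$payout"
```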

Requirements

Hardware

GPU-capable machine (NVIDIA A100/H100 recommended) or a Modal/cloud account for serverless deployments.

Stake

Minimum 10,000 TNT staked on Tangle. Your stake backs your SLA commitment and is slashable for sustained downtime.

Blueprint

Deploy a Tangle Blueprint that serves inference endpoints. Use a template or build your own.

Connectivity

Publicly accessible HTTPS endpoint with sub-200 ms response times. A health check endpoint is required at /health.
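One way to sanity-check both requirements from the command line is a curl probe that fails on HTTP errors and compares the measured latency against the 200 ms budget. The URL below is a placeholder for your own endpoint.

```shell
# Returns 0 if the endpoint answers /health within the latency budget.
# curl: -f fails on HTTP errors, -m caps total time, %{time_total} is seconds.
probe() {
  t=$(curl -sf -m 5 -o /dev/null -w '%{time_total}' "$1/health") || return 1
  awk "BEGIN { exit !($t < 0.200) }"
}

# Usage (placeholder endpoint):
# probe "https://your-operator.example.com" && echo healthy
```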

Blueprint Templates

vLLM Blueprint

Run open-weight models locally with vLLM. Best for operators with dedicated GPU hardware.

Models: Llama, Mistral, Qwen, DeepSeek, etc.

View on GitHub

Modal Blueprint

Serverless GPU inference via Modal. Auto-scales, no hardware management. Great for getting started.

Models: Any model deployable on Modal

View on GitHub

Voice Inference Blueprint

TTS and STT with multi-provider routing (ElevenLabs, Cartesia, OpenAI). Powers ph0ny voice platform.

Models: TTS-1, ElevenLabs, Cartesia, Whisper

View on GitHub

Image Generation Blueprint

Serve image generation models (Stable Diffusion, FLUX, SDXL) via ComfyUI or diffusers.

Models: Stable Diffusion, FLUX, SDXL

View on GitHub

Embedding Blueprint

Text embeddings via HuggingFace TEI. High-volume, low-cost. OpenAI-compatible API.

Models: BGE, E5, Jina, GTE

View on GitHub
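Because the embedding Blueprint exposes an OpenAI-compatible API, a request is a standard POST to /v1/embeddings. The endpoint and model name below are placeholders; check the Blueprint README for the values your deployment uses.

```shell
# Placeholder endpoint and model name; substitute your own deployment's values.
payload='{"model": "BAAI/bge-base-en-v1.5", "input": ["hello world"]}'
echo "$payload"

# curl -s https://your-operator.example.com/v1/embeddings \
#   -H "Content-Type: application/json" \
#   -d "$payload"
#
# A successful response follows the OpenAI embeddings shape:
# {"object": "list", "data": [{"embedding": [...], "index": 0}], ...}
```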

Video Generation Blueprint

Video generation, avatars, lipsync, dubbing. Async job model for GPU-intensive workloads.

Models: Hunyuan Video, LTX-Video

View on GitHub

Training Blueprint

Single- or multi-operator training. LoRA, QLoRA, DPO, GRPO, pre-training. Automatic DeMo sync when multiple operators join a service instance.

Models: Any open model (Llama, Mistral, Qwen, etc.)

View on GitHub

Distributed Inference Blueprint

Pipeline parallelism for 400B+ models, split across operators by layer range.

Models: Llama 405B, Mixtral 8x22B, DeepSeek-V2

View on GitHub
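The layer-range split can be pictured with a hypothetical config fragment; the field names below are illustrative, not the Blueprint's actual schema. Each operator serves a contiguous slice of the model's transformer layers (Llama 3.1 405B has 126 decoder layers).

```toml
# Hypothetical sketch: three operators covering a 126-layer model.
[[operators]]
id = "op-1"
layer_range = { start = 0, end = 41 }

[[operators]]
id = "op-2"
layer_range = { start = 42, end = 83 }

[[operators]]
id = "op-3"
layer_range = { start = 84, end = 125 }
```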

Custom Blueprint

Build your own Blueprint with the Blueprint SDK. Full control over inference, billing, and TEE.

Models: Anything you can serve

View on GitHub

How It Works

1. Choose a Blueprint

Pick a Blueprint template based on your infrastructure. Fork the repo and configure your models.

2. Deploy & Test

Build and deploy your operator. Verify health checks pass and inference works on your local setup.

3. Register On-Chain

Stake TNT and register your operator on the Tangle network. Your Blueprint ID and endpoint URL go on-chain.

4. Start Serving

The gateway discovers your operator automatically. Requests start flowing based on your models, pricing, and reputation.

Quick Start

# Clone a Blueprint template
git clone https://github.com/tangle-network/llm-inference-blueprint
cd llm-inference-blueprint

# Configure your operator
cp operator/config.example.toml operator/config.toml
# Edit config.toml: set model, GPU count, pricing, endpoint URL

# Build the operator
cargo build --release

# Run locally for testing
./target/release/operator --config operator/config.toml

# Register on Tangle (requires TNT stake)
tangle operator register \
  --blueprint-id <your-blueprint-id> \
  --endpoint https://your-operator.example.com \
  --stake 10000
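The config.toml fields referenced above might look like the following. The exact keys are defined by the Blueprint template you fork; this is an illustrative sketch, not the real schema.

```toml
# Illustrative sketch; consult the template's config.example.toml for real keys.
[model]
name = "meta-llama/Llama-3.1-8B-Instruct"
gpu_count = 1

[pricing]
# Per-token rates you control, quoted per million tokens (USD).
input_per_million_tokens = "0.10"
output_per_million_tokens = "0.40"

[network]
endpoint = "https://your-operator.example.com"
```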