Now in private beta

Ship AI models that actually run anywhere

Upload your model. Define your hardware constraints. Get back a production-ready, optimized model with benchmarks — in minutes, not weeks.

Request Early Access · See How It Works
edgeforge-cli — optimize
# Compress a Hugging Face model for Android deployment
$ edgeforge optimize --model microsoft/resnet-50 \
    --target android-midrange --strategy auto

▸ Analyzing model architecture...
▸ Running INT8 quantization + channel pruning...
▸ Recovering accuracy via knowledge distillation...
▸ Benchmarking against device profile...

✓ Optimization complete
Size: 97.4 MB → 12.1 MB (87.6% reduction)
Latency: 142ms → 23ms (6.2x faster)
Accuracy: 76.1% → 75.4% (0.7% drop)
Output: ./optimized/resnet50-android.onnx

Three steps. Any model. Any device.

EdgeForge automates the entire model optimization pipeline — from analysis to export — so you can focus on building products, not compressing models.

01 — UPLOAD

Upload Your Model

Drag and drop a PyTorch checkpoint, ONNX file, or just paste a Hugging Face model ID. We handle the rest.

02 — CONFIGURE

Pick Your Target

Select from 10+ pre-built device profiles — Raspberry Pi, Android, Jetson, iOS, browser — or define custom hardware constraints.

03 — DEPLOY

Download & Ship

Get your optimized model with a full benchmark report: size, latency, and accuracy tradeoffs — ready for production deployment.
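In script form, the three steps collapse into a single CLI invocation. The command and flags below come straight from the terminal demo above; the wrapper itself is a hypothetical sketch, not an official SDK:

```python
import subprocess

def optimize_cmd(model_id: str, target: str, strategy: str = "auto") -> list[str]:
    """Assemble the `edgeforge optimize` invocation shown in the demo above.

    The subcommand and flags mirror the terminal demo; this helper is an
    illustrative sketch, not part of the product.
    """
    return [
        "edgeforge", "optimize",
        "--model", model_id,
        "--target", target,
        "--strategy", strategy,
    ]

def optimize(model_id: str, target: str) -> str:
    """Run the CLI and return its report text (requires the beta CLI installed)."""
    return subprocess.run(
        optimize_cmd(model_id, target),
        capture_output=True, text=True, check=True,
    ).stdout
```

Dropping the same command into a Makefile or CI step works identically — the pipeline is just one shell command per target.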

Built for the edge. Not the cloud.

Every optimization technique in EdgeForge is designed for real-world deployment where compute is scarce and connectivity isn't guaranteed.

Mixed-Precision Quantization

INT8, INT4, and per-layer mixed precision, including GPTQ/AWQ for LLMs. Automatic sensitivity analysis picks the right strategy for each layer.
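The idea behind sensitivity analysis is simple: quantize each layer in isolation, measure how much it degrades, and keep the fragile layers at higher precision. A minimal sketch (symmetric per-tensor INT8, toy weights — EdgeForge's actual analysis is internal to the service):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8: scale to [-127, 127], round, scale back."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -127, 127) * scale

def sensitivity(layers):
    """Rank layers by INT8 reconstruction error (MSE). Layers at the top of
    the ranking are candidates for a wider format in a mixed-precision plan."""
    errs = {name: float(np.mean((w - quantize_int8(w)) ** 2))
            for name, w in layers.items()}
    return sorted(errs.items(), key=lambda kv: -kv[1])

rng = np.random.default_rng(0)
layers = {
    "conv1": rng.normal(size=(16, 16)),             # narrow weight distribution
    "head": rng.normal(scale=4.0, size=(16, 16)),   # wide distribution, coarser steps
}
# The wide-spread "head" layer incurs the larger quantization error.
```

Real pipelines measure the error on task accuracy rather than raw weight MSE, but the ranking step looks the same.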

✂️

Structured Pruning

Remove entire channels and attention heads for real speedup on any hardware — no sparse runtime needed.
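Why structured pruning needs no sparse runtime: removing whole output channels leaves a smaller but still dense tensor, so any standard runtime runs it faster. A toy sketch using L1-norm channel scoring (one common criterion; EdgeForge's selection logic is not public):

```python
import numpy as np

def prune_channels(w, keep_ratio=0.5):
    """Drop the lowest-L1-norm output channels of a conv weight shaped
    (out_ch, in_ch, kH, kW). Returns a smaller dense tensor plus the kept
    channel indices — no sparse kernels required to realize the speedup."""
    norms = np.abs(w).sum(axis=(1, 2, 3))              # one score per output channel
    n_keep = int(len(norms) * keep_ratio)
    keep = np.sort(np.argsort(norms)[-n_keep:])        # keep the strongest channels
    return w[keep], keep

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 4, 3, 3))
pruned, kept = prune_channels(w, keep_ratio=0.5)
# pruned has shape (4, 4, 3, 3): half the output channels, still dense.
```

In a full network the next layer's input channels must be sliced to match, which is exactly the bookkeeping an automated pipeline handles for you.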

🧬

Geometry-Aware Distillation

Proprietary technique using optimal transport theory. 10–15% better accuracy retention than standard knowledge distillation.
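The optimal-transport variant is proprietary, but the standard knowledge-distillation baseline it is measured against fits in a few lines: a temperature-scaled KL divergence between teacher and student logits (Hinton-style KD; everything below is an illustrative sketch):

```python
import numpy as np

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard knowledge-distillation loss: T^2 * KL(teacher || student)
    over temperature-softened distributions. This is the baseline the
    10-15% retention figure above is compared against, not the
    proprietary geometry-aware method itself."""
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    log_pt = log_softmax(teacher_logits / T)
    log_ps = log_softmax(student_logits / T)
    return float(T * T * np.mean((np.exp(log_pt) * (log_pt - log_ps)).sum(axis=-1)))

z = np.array([[2.0, 0.5, -1.0]])
# A student matching the teacher exactly gets (near-)zero loss.
```

Distillation recovers accuracy after aggressive quantization or pruning by training the compressed model against the original's soft predictions rather than hard labels.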

📊

Automated Benchmarking

Every job produces a detailed comparison: size, latency (p50/p95/p99), accuracy, and resource usage on your target hardware.
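The p50/p95/p99 figures are just percentiles over many timed inference runs — p50 is the typical latency, p95/p99 capture tail behavior that matters on throttled edge devices. An illustrative computation (the report format itself is produced by the service):

```python
import numpy as np

def latency_report(samples_ms):
    """Summarize per-inference latencies (ms) into the p50/p95/p99 figures
    a benchmark report quotes. Tail percentiles expose thermal-throttling
    spikes that a plain average would hide."""
    s = np.asarray(samples_ms, dtype=float)
    return {f"p{p}": float(np.percentile(s, p)) for p in (50, 95, 99)}

# A run with one slow outlier: the median stays low, the tail shows it.
report = latency_report([20, 21, 22, 23, 25, 30, 45])
```

Comparing the same percentiles before and after optimization, on the actual target device profile, is what makes a speedup claim like "6.2x" meaningful.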

🔧

10+ Device Profiles

Pre-built targets for Android, Raspberry Pi, Jetson, iOS, browser (WASM), and TinyML. Create custom profiles in seconds.

🔌

REST API + CLI

Integrate optimization into your CI/CD pipeline. Programmatic access to everything — upload, optimize, benchmark, download.
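A typical CI integration runs the optimization and fails the build if the quality regression exceeds a budget. The accuracy line parsed below is the one from the terminal demo at the top of this page; the gating helper itself is a hypothetical sketch, not an official API:

```python
import re

def accuracy_drop(report_text):
    """Extract the drop in accuracy points from a CLI report line like
    'Accuracy: 76.1% -> 75.4% (0.7% drop)'. A CI step can compare this
    against a budget and fail the pipeline when it is exceeded.
    (Illustrative parser over the demo output, not a supported interface.)"""
    m = re.search(r"Accuracy:\s*([\d.]+)%\s*(?:→|->)\s*([\d.]+)%", report_text)
    if not m:
        raise ValueError("no accuracy line found in report")
    return round(float(m.group(1)) - float(m.group(2)), 2)

demo = "Accuracy: 76.1% → 75.4% (0.7% drop)"
budget_exceeded = accuracy_drop(demo) > 1.0   # e.g. allow at most 1.0 point
```

With the REST API the same gate would read the benchmark JSON instead of parsing CLI text, but the pattern — optimize, measure, gate, ship — is identical.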

Numbers that speak for themselves

Real optimizations on popular models targeting common edge devices. No cherry-picked results.

ResNet-50 → Android
87.6%
SIZE REDUCTION
97.4 MB → 12.1 MB · 0.7% accuracy drop
DistilBERT → Raspberry Pi
4.8x
INFERENCE SPEEDUP
267 MB → 68 MB · 1.2% F1 drop
YOLOv8-S → Jetson Nano
6.2x
INFERENCE SPEEDUP
22.5 MB → 5.8 MB · 0.9% mAP drop

Deploy to any hardware

Pre-built profiles for the most popular edge targets. Custom profiles for everything else.

Android Low-End
Android Mid-Range
Raspberry Pi 4
Raspberry Pi 5
NVIDIA Jetson Nano
Jetson Orin Nano
iOS (CoreML)
Browser (WASM)
Edge Server (x86)
TinyML (Cortex-M)
+ Custom Profile

Start free. Scale when ready.

No credit card required. Upgrade when you need more power.

Free
$0
For experimentation and small projects
  • 3 optimizations / month
  • Models up to 100M parameters
  • 5 device profiles
  • Community support
Enterprise
Custom
For organizations with advanced needs
  • Everything in Pro
  • Self-hosted deployment
  • Custom pipeline configuration
  • SLA & dedicated support
  • SSO & audit logging

Ready to ship AI to the edge?

Join the private beta. Be among the first to optimize your models with EdgeForge.

No spam. We'll reach out when your spot is ready.