CodeRocket AI
Tools tagged with "Inference"
Popular Categories:
- MLOps & Infrastructure (28)
- ML Frameworks & Libraries (28)
- Large Language Models (26)
- AI Agents & Automation (19)
- Generative AI & Creative Tools (15)
- AI Code & Dev Tools (11)
- RAG & Knowledge Management (11)
- Audio & Speech Processing (8)
- Data Processing & ETL (6)
- Natural Language Processing (5)
- Vector Databases (4)
- Computer Vision (3)
Mistral
AI Solutions Tailored for Enterprise Success
LLM
Customize and deploy AI assistants and agents with open models, ensuring privacy and control.
KodeAgent
Build AI Agents with Minimal Complexity
AI Agents
+4 more
Create efficient AI agents using a lightweight, frameworkless approach. Supports ReAct and CodeAct paradigms for versatile applications.
Qwen3
Advanced Language Models for Complex Tasks
AI Agents
+4 more
Explore Qwen3's powerful language models for deep reasoning and multilingual capabilities.
MLC LLM
Universal LLM deployment engine
LLM
MLOps
MLC LLM is a universal solution that enables native deployment of any large language model, with native APIs and compiler acceleration.
LMDeploy
Toolkit for compressing, deploying, and serving LLMs
LLM
MLOps
LMDeploy is a toolkit for compressing, deploying, and serving LLMs. It provides efficient inference through its TurboMind engine.
Insanely Fast Whisper
Lightning-Fast Audio Transcription
Audio & Speech
Transcribe audio in seconds with a powerful CLI tool using Whisper and Flash Attention.
Mosec
Optimize ML Serving with Dynamic Batching
MLOps
Boost ML model serving with dynamic batching and CPU/GPU pipelines for maximum efficiency.
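Dynamic batching means grouping requests that arrive close together so the model runs once per batch instead of once per request. A minimal stdlib sketch of the idea (this is a conceptual illustration, not Mosec's actual API; `dynamic_batch` is a hypothetical helper):

```python
import time
from queue import Queue, Empty

def dynamic_batch(queue, max_batch=8, max_wait=0.01):
    """Collect up to max_batch items, waiting at most max_wait seconds
    after the first item arrives. Hypothetical helper illustrating
    dynamic batching; Mosec implements this internally."""
    batch = [queue.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(queue.get(timeout=remaining))
        except Empty:
            break  # timed out waiting for more requests
    return batch

q = Queue()
for i in range(5):
    q.put(i)
print(dynamic_batch(q, max_batch=3))  # → [0, 1, 2]
```

The trade-off is latency versus throughput: a larger `max_wait` fills bigger batches but delays the first request in each batch.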
SGLang
Accelerate AI with High-Performance Model Serving
LLM
MLOps
Deliver low-latency, high-throughput inference for AI models. Supports diverse hardware and model types with extensive community backing.
GGML
Efficient Tensor Operations Simplified
Frameworks
Cross-platform tensor library with no dependencies, supporting quantization and diverse hardware.
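Quantization here means storing weights in fewer bits plus a per-block scale factor. A rough sketch of symmetric 8-bit quantization, the general idea behind quantized tensor formats (a conceptual illustration only, not GGML's exact on-disk layout):

```python
def quantize_q8(values):
    """Symmetric 8-bit quantization: one float scale per block,
    values stored as small integers in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid zero scale
    quants = [round(v / scale) for v in values]
    return scale, quants

def dequantize_q8(scale, quants):
    """Recover approximate floats from scale + integer values."""
    return [q * scale for q in quants]

scale, q = quantize_q8([0.5, -1.0, 0.25])
approx = dequantize_q8(scale, q)  # each entry within scale/2 of the original
```

Each element costs one byte instead of four, at the price of a small rounding error bounded by half the scale.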
ExLlamaV2
Run LLMs Locally on Consumer GPUs
LLM
Fast inference library for local LLMs on consumer GPUs. Supports dynamic batching, smart caching, and more.
TensorRT-LLM
Optimize LLM Inference on NVIDIA GPUs
LLM
MLOps
Enhance LLM performance with Python API and NVIDIA GPU optimizations for efficient inference.
Triton Inference Server
Deploy AI Models Seamlessly Across Platforms
MLOps
Deploy AI models across frameworks with dynamic batching, real-time support, and cloud integration.
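Triton speaks the open KServe v2 inference protocol over HTTP. A stdlib-only sketch of building a request body for a 1-D input (field names follow the v2 spec; the input name, data, and endpoint are placeholders):

```python
import json

def build_infer_request(input_name, data, datatype="FP32"):
    """Build a KServe/Triton v2 inference request body for a 1-D input.
    The v2 protocol expects "inputs" with "name", "shape", "datatype",
    and "data" fields."""
    return json.dumps({
        "inputs": [{
            "name": input_name,
            "shape": [len(data)],
            "datatype": datatype,
            "data": data,
        }]
    })

body = build_infer_request("INPUT0", [1.0, 2.0, 3.0])
# POST this to http://<host>:8000/v2/models/<model>/infer
```

Because the protocol is a shared standard, the same client code works against other v2-compatible servers.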
LM Studio
Local AI Models, Private and Free
LLM
Run AI models like GPT-OSS and Llama privately on your computer. Free for home and work.
Text Generation WebUI
A Gradio web UI for running Large Language Models
LLM
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp, and many other backends.
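Local servers like LM Studio and Text Generation WebUI expose OpenAI-compatible endpoints, so a plain HTTP client can talk to them. A stdlib sketch of building such a request (the base URL, port, and model name are placeholders; check your server's settings for the actual values):

```python
import json
from urllib import request

def chat_request(base_url, model, prompt):
    """Build an OpenAI-style /v1/chat/completions request for a
    locally running server. base_url and model are placeholders."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("http://localhost:1234", "local-model", "Hello!")
# resp = request.urlopen(req)  # run only with a server listening
```

Because the request shape matches the OpenAI API, existing client libraries can also be pointed at the local server by overriding their base URL.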
Faster Whisper
Accelerate Transcription with CTranslate2
Audio & Speech
Experience rapid, efficient transcription using CTranslate2 for faster results and reduced memory usage.
Llama.cpp
Efficient LLM Inference in C/C++
LLM
Achieve high-performance LLM inference with minimal setup using C/C++. Supports diverse hardware and quantization for optimal efficiency.