CodeRocket AI
Tools tagged with "Inference"
Popular Categories:
- MLOps & Infrastructure (28)
- ML Frameworks & Libraries (28)
- Large Language Models (26)
- AI Agents & Automation (19)
- Generative AI & Creative Tools (15)
- AI Code & Dev Tools (11)
- RAG & Knowledge Management (11)
- Audio & Speech Processing (8)
- Data Processing & ETL (6)
- Natural Language Processing (5)
- Vector Databases (4)
- Computer Vision (3)
Mistral
AI Solutions Tailored for Enterprise Success
LLM
Customize and deploy AI assistants and agents with open models, ensuring privacy and control.
KodeAgent
Build AI Agents with Minimal Complexity
AI Agents
+4 more
Create efficient AI agents using a lightweight, frameworkless approach. Supports ReAct and CodeAct paradigms for versatile applications.
Qwen3
Advanced Language Models for Complex Tasks
AI Agents
+4 more
Explore Qwen3's powerful language models for deep reasoning and multilingual capabilities.
MLC LLM
Universal LLM deployment engine
LLM
MLOps
MLC LLM is a universal solution that enables native deployment of any large language model, with native APIs and compiler acceleration.
LMDeploy
Toolkit for compressing, deploying, and serving LLMs
LLM
MLOps
LMDeploy is a toolkit for compressing, deploying, and serving LLMs. It provides efficient inference through its TurboMind engine.
Insanely Fast Whisper
Lightning-Fast Audio Transcription
Audio & Speech
Transcribe audio in seconds with a powerful CLI tool using Whisper and Flash Attention.
Mosec
Optimize ML Serving with Dynamic Batching
MLOps
Boost ML model serving with dynamic batching and CPU/GPU pipelines for maximum efficiency.
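Dynamic batching means grouping requests that arrive close together so the model runs once per batch instead of once per request. A minimal stdlib sketch of the idea (this is a conceptual illustration, not Mosec's actual API; `dynamic_batch` is a hypothetical helper):

```python
import time
from queue import Queue, Empty

def dynamic_batch(queue, max_batch=8, max_wait=0.01):
    """Collect up to max_batch items, waiting at most max_wait seconds
    after the first item arrives. Hypothetical helper illustrating
    dynamic batching; Mosec implements this internally."""
    batch = [queue.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(queue.get(timeout=remaining))
        except Empty:
            break  # timed out waiting for more requests
    return batch

q = Queue()
for i in range(5):
    q.put(i)
print(dynamic_batch(q, max_batch=3))  # → [0, 1, 2]
```

The trade-off is latency versus throughput: a larger `max_wait` fills bigger batches but delays the first request in each batch.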
SGLang
Accelerate AI with High-Performance Model Serving
LLM
MLOps
Deliver low-latency, high-throughput inference for AI models. Supports diverse hardware and model types with extensive community backing.
GGML
Efficient Tensor Operations Simplified
Frameworks
Cross-platform tensor library with no dependencies, supporting quantization and diverse hardware.
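Quantization here means storing weights in fewer bits plus a per-block scale factor. A rough sketch of symmetric 8-bit quantization, the general idea behind quantized tensor formats (a conceptual illustration only, not GGML's exact on-disk layout):

```python
def quantize_q8(values):
    """Symmetric 8-bit quantization: one float scale per block,
    values stored as small integers in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid zero scale
    quants = [round(v / scale) for v in values]
    return scale, quants

def dequantize_q8(scale, quants):
    """Recover approximate floats from scale + integer values."""
    return [q * scale for q in quants]

scale, q = quantize_q8([0.5, -1.0, 0.25])
approx = dequantize_q8(scale, q)  # each entry within scale/2 of the original
```

Each element costs one byte instead of four, at the price of a small rounding error bounded by half the scale.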
ExLlamaV2
Run LLMs Locally on Consumer GPUs
LLM
Fast inference library for local LLMs on consumer GPUs. Supports dynamic batching, smart caching, and more.
TensorRT-LLM
Optimize LLM Inference on NVIDIA GPUs
LLM
MLOps
Enhance LLM performance with Python API and NVIDIA GPU optimizations for efficient inference.
Triton Inference Server
Deploy AI Models Seamlessly Across Platforms
MLOps
Deploy AI models across frameworks with dynamic batching, real-time support, and cloud integration.
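Triton speaks the open KServe v2 inference protocol over HTTP. A stdlib-only sketch of building a request body for a 1-D input (field names follow the v2 spec; the input name, data, and endpoint are placeholders):

```python
import json

def build_infer_request(input_name, data, datatype="FP32"):
    """Build a KServe/Triton v2 inference request body for a 1-D input.
    The v2 protocol expects "inputs" with "name", "shape", "datatype",
    and "data" fields."""
    return json.dumps({
        "inputs": [{
            "name": input_name,
            "shape": [len(data)],
            "datatype": datatype,
            "data": data,
        }]
    })

body = build_infer_request("INPUT0", [1.0, 2.0, 3.0])
# POST this to http://<host>:8000/v2/models/<model>/infer
```

Because the protocol is a shared standard, the same client code works against other v2-compatible servers.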
LM Studio
Local AI Models, Private and Free
LLM
Run AI models like GPT-OSS and Llama privately on your computer. Free for home and work.
Text Generation WebUI
A Gradio web UI for running Large Language Models
LLM
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp, and many other backends.
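Local servers like LM Studio and Text Generation WebUI expose OpenAI-compatible endpoints, so a plain HTTP client can talk to them. A stdlib sketch of building such a request (the base URL, port, and model name are placeholders; check your server's settings for the actual values):

```python
import json
from urllib import request

def chat_request(base_url, model, prompt):
    """Build an OpenAI-style /v1/chat/completions request for a
    locally running server. base_url and model are placeholders."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("http://localhost:1234", "local-model", "Hello!")
# resp = request.urlopen(req)  # run only with a server listening
```

Because the request shape matches the OpenAI API, existing client libraries can also be pointed at the local server by overriding their base URL.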
Faster Whisper
Accelerate Transcription with CTranslate2
Audio & Speech
Experience rapid, efficient transcription using CTranslate2 for faster results and reduced memory usage.
Llama.cpp
Efficient LLM Inference in C/C++
LLM
Achieve high-performance LLM inference with minimal setup using C/C++. Supports diverse hardware and quantization for optimal efficiency.