
Models API Endpoint

Overview

This endpoint provides information about available AI models and their configurations.

Request Details

HTTP Method

GET

Route

/models

Route Parameters

This endpoint does not have any route parameters.

Query Parameters

This endpoint does not accept any query parameters.

Headers

| Header | Value | Required | Description |
| --- | --- | --- | --- |
| Content-Type | application/json | Yes | Indicates JSON request body |

Request Body

This endpoint does not require a request body.

Response Format

Response Status Codes

| Status Code | Description |
| --- | --- |
| 200 | Success |
| 500 | Internal Server Error |

Success Response

The response is a JSON object containing AI model configurations grouped by publisher.

Response Headers

| Header | Value | Description |
| --- | --- | --- |
| Content-Type | application/json | Indicates JSON response |

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| configurations | Object | Contains model configurations grouped by publisher |
| configurations.[publisher] | Object | Object containing models for a specific publisher |
| configurations.[publisher].[modelName] | Object | Configuration details for a specific model |
| configurations.[publisher].[modelName].name | String | Name of the model |
| configurations.[publisher].[modelName].publisher | String | Publisher of the model |
| configurations.[publisher].[modelName].description | String | Description of the model |
| configurations.[publisher].[modelName].icon | String | Icon identifier for the model |
| configurations.[publisher].[modelName].type | String | Type of model ('instruct' or 'chat') |
| configurations.[publisher].[modelName].configuration | Object | Model configuration parameters |
| configurations.[publisher].[modelName].configuration.model | String | The model identifier |
| configurations.[publisher].[modelName].configuration.parameters | Object | Model-specific parameters |
| configurations.[publisher].[modelName].configuration.parameters.max_new_tokens | Integer | Maximum number of tokens to generate |
| configurations.[publisher].[modelName].configuration.parameters.typical_p | Number | Typical probability (-1 means the provider defaults are used) |
| configurations.[publisher].[modelName].configuration.parameters.repetition_penalty | Number | Penalty for repetition (-1 means the provider defaults are used) |
| configurations.[publisher].[modelName].configuration.parameters.truncate | Integer | Maximum context length for truncation |
| configurations.[publisher].[modelName].configuration.parameters.return_full_text | Boolean | Whether to return the full text including the prompt |
| configurations.[publisher].[modelName].configuration.parameters.temperature | Number | Optional: temperature for sampling (higher = more random) |
| models | Array | List of all allowed models in "publisher/modelName" format |
| endpoints | Object | Available API endpoints grouped by category |
| endpoints.huggingface | Array | List of huggingface API endpoints |
| endpoints.cloudflare | Array | List of cloudflare API endpoints |
| endpoints.openrouter | Array | List of openrouter API endpoints |
| endpoints.ollama | Array | List of ollama API endpoints |
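Because configurations is keyed first by publisher and then by model name, reading it requires two nested loops. The following minimal TypeScript sketch (using the ModelsResponse interface defined in the TypeScript Interface section below; summarizeConfigurations is an illustrative helper, not part of the API) shows how the grouped structure can be traversed to list each model together with its max_new_tokens setting.

function summarizeConfigurations(data: ModelsResponse): string[] {
  const lines: string[] = []
  // The outer keys are publishers (e.g. "qwen"); the inner keys are model names.
  for (const [publisher, models] of Object.entries(data.configurations)) {
    for (const [modelName, config] of Object.entries(models)) {
      const { max_new_tokens } = config.configuration.parameters
      lines.push(`${publisher}/${modelName}: max_new_tokens=${max_new_tokens}`)
    }
  }
  return lines
}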

Error Responses

Internal Server Error (500)

{
  "statusCode": 500,
  "statusMessage": "Internal Server Error"
}
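When the request fails, the body carries the status code and message shown above. The sketch below (the ErrorResponse interface and getModelsOrFail function are illustrative, not part of the API; ModelsResponse is defined in the TypeScript Interface section below) assumes the error body is returned as JSON in that shape and surfaces the message to the caller.

interface ErrorResponse {
  statusCode: number
  statusMessage: string
}

async function getModelsOrFail(): Promise<ModelsResponse> {
  const response = await fetch('https://neptun-webui.vercel.app/api/models')
  if (!response.ok) {
    // Assumes the error body is JSON in the documented { statusCode, statusMessage } shape.
    const error = (await response.json()) as ErrorResponse
    throw new Error(`Request failed with ${error.statusCode}: ${error.statusMessage}`)
  }
  return (await response.json()) as ModelsResponse
}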

TypeScript Interface

export interface ModelParameters {
  max_new_tokens: number
  typical_p: number // Can be -1 for some models, meaning the provider's defaults are used
  repetition_penalty: number // Can be -1 for some models, meaning the provider's defaults are used
  truncate: number
  return_full_text: boolean
  temperature?: number
}

export interface ModelConfiguration {
  publisher: string
  name: string
  description: string
  icon: string
  type: 'instruct' | 'chat'
  configuration: {
    model: string
    parameters: ModelParameters
  }
}

export interface ModelsResponse {
  configurations: {
    [publisher: string]: {
      [modelName: string]: ModelConfiguration
    }
  }
  models: string[]
  endpoints: {
    huggingface: string[]
    cloudflare: string[]
    openrouter: string[]
    ollama: string[]
  }
}
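The models array uses the same "publisher/modelName" strings as the nested keys of configurations, so a model identifier from that list can be resolved back to its full configuration. A minimal sketch (resolveModel is an illustrative helper, not part of the API):

function resolveModel(
  data: ModelsResponse,
  modelId: string
): ModelConfiguration | undefined {
  // Split "publisher/modelName" at the first slash and look it up in configurations.
  const separatorIndex = modelId.indexOf('/')
  if (separatorIndex === -1) return undefined
  const publisher = modelId.slice(0, separatorIndex)
  const modelName = modelId.slice(separatorIndex + 1)
  return data.configurations[publisher]?.[modelName]
}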

Python Model

from typing import Dict, List, Optional, Literal, TypedDict
from pydantic import BaseModel, Field, ConfigDict

class ModelParameters(BaseModel):
    model_config = ConfigDict(extra='forbid')

    max_new_tokens: int
    typical_p: float = Field(description="Can be -1 for some models, meaning the provider's defaults are used")
    repetition_penalty: float = Field(description="Can be -1 for some models, meaning the provider's defaults are used")
    truncate: int
    return_full_text: bool
    temperature: Optional[float] = None

class ModelConfigurationData(BaseModel):
    model_config = ConfigDict(extra='forbid')

    model: str
    parameters: ModelParameters

class ModelConfiguration(BaseModel):
    model_config = ConfigDict(extra='forbid')

    publisher: str
    name: str
    description: str
    icon: str
    type: Literal['instruct', 'chat']
    configuration: ModelConfigurationData

class Endpoints(TypedDict):
    huggingface: List[str]
    cloudflare: List[str]
    openrouter: List[str]
    ollama: List[str]

class ModelsResponse(BaseModel):
    model_config = ConfigDict(extra='forbid')

    configurations: Dict[str, Dict[str, ModelConfiguration]]
    models: List[str]
    endpoints: Endpoints

Example JSON Response

{
"configurations": {
"qwen": {
"Qwen2.5-72B-Instruct": {
"publisher": "qwen",
"name": "Qwen2.5-72B-Instruct",
"description": "<strong>Qwen2.5 is a state-of-the-art large language model from Alibaba Cloud</strong><br>\n72.7B parameters with 80 layers and 64 attention heads for queries.<br>\nExcels at instruction following, long text generation, and structured data handling.<br>\n(Qwen License)",
"icon": "simple-icons:alibabadotcom",
"type": "instruct",
"configuration": {
"model": "Qwen/Qwen2.5-72B-Instruct",
"parameters": {
"max_new_tokens": 512,
"typical_p": 0.2,
"repetition_penalty": 1.1,
"truncate": 32267,
"return_full_text": false
}
}
},
"Qwen2.5-Coder-32B-Instruct": {
"publisher": "qwen",
"name": "Qwen2.5-Coder-32B-Instruct",
"description": "<strong>Qwen2.5-Coder is a state-of-the-art code-specific model from Alibaba Cloud</strong><br>\nBuilt on Qwen2.5 with 5.5 trillion tokens of code-focused training data.<br>\nExcels in code generation, reasoning, and fixing with 131K token context support.<br>\n(Apache-2.0 License)",
"icon": "simple-icons:alibabadotcom",
"type": "instruct",
"configuration": {
"model": "Qwen/Qwen2.5-Coder-32B-Instruct",
"parameters": {
"max_new_tokens": 512,
"typical_p": 0.2,
"repetition_penalty": 1.1,
"truncate": 15499,
"return_full_text": false
}
}
}
},
"deepseek-ai": {
"DeepSeek-R1-Distill-Qwen-32B": {
"publisher": "deepseek-ai",
"name": "DeepSeek-R1-Distill-Qwen-32B",
"description": "SLOWEST MODEL! (high demand)<br><strong>DeepSeek R1 Distill is a powerful model distilled from DeepSeek-R1</strong><br>\nExcels at math, code, and reasoning tasks with performance close to OpenAI o1-mini.<br>\nBased on Qwen2.5 with enhanced reasoning capabilities through distillation.<br>\n(MIT License)",
"icon": "game-icons:angler-fish",
"type": "instruct",
"configuration": {
"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
"parameters": {
"max_new_tokens": 512,
"typical_p": 0.2,
"repetition_penalty": 1.1,
"temperature": 0.6,
"truncate": 12499,
"return_full_text": false
}
}
}
},
"mistralai": {
"Mistral-Nemo-Instruct-2407": {
"publisher": "mistralai",
"name": "Mistral-Nemo-Instruct-2407",
"description": "<strong>Mistral Nemo is a powerful model jointly trained by Mistral AI and NVIDIA</strong><br>\n40-layer transformer with 5,120 hidden dimensions and 32 attention heads.<br>\nExcels at multilingual tasks with strong performance across 9 languages.<br>\n(Apache-2.0 License)",
"icon": "game-icons:hummingbird",
"type": "instruct",
"configuration": {
"model": "mistralai/Mistral-Nemo-Instruct-2407",
"parameters": {
"max_new_tokens": 500,
"typical_p": 0.2,
"repetition_penalty": 1.1,
"temperature": 0.35,
"truncate": 127500,
"return_full_text": false
}
}
},
"Mistral-7B-Instruct-v0.3": {
"publisher": "mistralai",
"name": "Mistral-7B-Instruct-v0.3",
"description": "<strong>Latest version of Mistral's 7B instruction model with enhanced capabilities</strong><br>\nExtended 32K vocabulary, supports function calling, and uses v3 Tokenizer.<br>\nHighly efficient open-weights model with strong instruction following.<br>\n(Apache-2.0 License)",
"icon": "game-icons:hummingbird",
"type": "instruct",
"configuration": {
"model": "mistralai/Mistral-7B-Instruct-v0.3",
"parameters": {
"max_new_tokens": 500,
"typical_p": 0.2,
"repetition_penalty": 1.1,
"truncate": 32268,
"return_full_text": false
}
}
}
},
"google": {
"gemma-2-27b-it": {
"publisher": "google",
"name": "gemma-2-27b-it",
"description": "<strong>Gemma is a family of lightweight, state-of-the-art open models from Google</strong><br>\nBuilt from the same research and technology used to create the Gemini models.<br>\nWell-suited for text generation tasks including question answering, summarization, and reasoning.<br>\n(Apache-2.0 License)",
"icon": "simple-icons:google",
"type": "instruct",
"configuration": {
"model": "google/gemma-2-27b-it",
"parameters": {
"max_new_tokens": 500,
"typical_p": 0.2,
"repetition_penalty": 1.1,
"truncate": 7692,
"return_full_text": false
}
}
}
},
"openrouter": {
"gemini-2.0-pro-exp-02-05": {
"publisher": "openrouter",
"name": "gemini-2.0-pro-exp-02-05",
"description": "<strong>Google's Gemini 2.0 Pro Experimental Model</strong><br>\nLatest experimental version of Gemini with enhanced capabilities.<br>\nExcellent at reasoning, coding, and creative tasks.",
"icon": "simple-icons:google",
"type": "chat",
"configuration": {
"model": "google/gemini-2.0-pro-exp-02-05:free",
"parameters": {
"max_new_tokens": 8192,
"typical_p": -1,
"repetition_penalty": -1,
"truncate": 1991808,
"return_full_text": false
}
}
},
"deepseek-chat": {
"publisher": "openrouter",
"name": "deepseek-chat",
"description": "<strong>DeepSeek's Chat Model</strong><br>\nPowerful model optimized for natural conversations and reasoning.<br>\nStrong performance across various tasks including coding.",
"icon": "game-icons:angler-fish",
"type": "chat",
"configuration": {
"model": "deepseek/deepseek-chat:free",
"parameters": {
"max_new_tokens": 128000,
"typical_p": -1,
"repetition_penalty": -1,
"truncate": 128000,
"return_full_text": false
}
}
},
"llama-3.3-70b-instruct": {
"publisher": "openrouter",
"name": "llama-3.3-70b-instruct",
"description": "<strong>Meta's Llama 3.3 70B Instruct Model</strong><br>\nLatest version of Llama optimized for instruction following.<br>\nExcellent performance across multiple languages and tasks.",
"icon": "simple-icons:meta",
"type": "chat",
"configuration": {
"model": "meta-llama/llama-3.3-70b-instruct:free",
"parameters": {
"max_new_tokens": 2048,
"typical_p": -1,
"repetition_penalty": -1,
"truncate": 129024,
"return_full_text": false
}
}
}
},
"cloudflare": {
"llama-3.3-70b-instruct-fp8-fast": {
"publisher": "cloudflare",
"name": "llama-3.3-70b-instruct-fp8-fast",
"description": "<strong>Llama 3.3 70B quantized to fp8 precision, optimized to be faster</strong><br>\nPowerful model with 70B parameters optimized for instruction following.<br>\nExcels at text generation, reasoning, and structured data handling.<br>\n(Llama 3.3 Community License)",
"icon": "simple-icons:cloudflare",
"type": "instruct",
"configuration": {
"model": "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
"parameters": {
"max_new_tokens": 256,
"typical_p": 0.2,
"repetition_penalty": 1.1,
"truncate": 130816,
"return_full_text": false
}
}
}
},
"microsoft": {
"Phi-3-mini-4k-instruct": {
"publisher": "microsoft",
"name": "Phi-3-mini-4k-instruct",
"description": "<strong>Phi-3 Mini is a 3.8B parameter, lightweight, state-of-the-art open model from Microsoft</strong><br>\nTrained with high-quality datasets focused on reasoning and instruction following.<br>\nExcellent performance for math, coding, and logical reasoning tasks.<br>\n(MIT License)",
"icon": "simple-icons:microsoft",
"type": "instruct",
"configuration": {
"model": "microsoft/Phi-3-mini-4k-instruct",
"parameters": {
"max_new_tokens": 500,
"typical_p": 0.2,
"repetition_penalty": 1.1,
"truncate": 3596,
"return_full_text": false
}
}
}
},
"ollama": {
"rwkv-6-world": {
"publisher": "ollama",
"name": "rwkv-6-world",
"description": "<strong>RWKV-6-World is an efficient 1.6B parameter model</strong><br>\nTrained on diverse datasets with strong performance in 12 languages.<br>\nEfficient architecture combining RNN and transformer-like capabilities.<br>\n(Apache-2.0 License)",
"icon": "simple-icons:ollama",
"type": "instruct",
"configuration": {
"model": "mollysama/rwkv-6-world:1.6b",
"parameters": {
"max_new_tokens": 500,
"typical_p": -1,
"repetition_penalty": -1,
"truncate": 3596,
"return_full_text": false
}
}
}
}
},
"models": [
"google/gemma-2-27b-it",
"qwen/Qwen2.5-72B-Instruct",
"qwen/Qwen2.5-Coder-32B-Instruct",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
"mistralai/Mistral-Nemo-Instruct-2407",
"mistralai/Mistral-7B-Instruct-v0.3",
"microsoft/Phi-3-mini-4k-instruct",
"cloudflare/llama-3.3-70b-instruct-fp8-fast",
"openrouter/gemini-2.0-pro-exp-02-05",
"openrouter/deepseek-chat",
"openrouter/llama-3.3-70b-instruct",
"ollama/rwkv-6-world"
],
"endpoints": {
"huggingface": [
"/api/ai/huggingface/gemma-2-27b-it/chat",
"/api/ai/huggingface/Qwen2.5-72B-Instruct/chat",
"/api/ai/huggingface/Qwen2.5-Coder-32B-Instruct/chat",
"/api/ai/huggingface/DeepSeek-R1-Distill-Qwen-32B/chat",
"/api/ai/huggingface/Mistral-Nemo-Instruct-2407/chat",
"/api/ai/huggingface/Mistral-7B-Instruct-v0.3/chat",
"/api/ai/huggingface/Phi-3-mini-4k-instruct/chat"
],
"cloudflare": [
"/api/ai/cloudflare/llama-3.3-70b-instruct-fp8-fast/chat"
],
"openrouter": [
"/api/ai/openrouter/gemini-2.0-pro-exp-02-05/chat",
"/api/ai/openrouter/deepseek-chat/chat",
"/api/ai/openrouter/llama-3.3-70b-instruct/chat"
],
"ollama": [
"/api/ai/ollama/rwkv-6-world/chat"
]
}
}

Code Examples

cURL Example

curl -X GET \
https://neptun-webui.vercel.app/api/models \
-H 'Content-Type: application/json'

Python Example

import httpx
from typing import List

# ModelsResponse is the Pydantic model defined in the "Python Model" section above.

async def get_models() -> ModelsResponse:
    """
    Fetches AI model configurations, allowed models, and API endpoints.

    Returns:
        ModelsResponse object containing:
        - configurations: Model configurations grouped by publisher
        - models: List of all allowed models
        - endpoints: Available API endpoints grouped by category
    """
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://neptun-webui.vercel.app/api/models",
            headers={"Content-Type": "application/json"}
        )
        response.raise_for_status()
        return ModelsResponse.model_validate(response.json())

async def list_openrouter_endpoints() -> List[str]:
    models_data = await get_models()
    return models_data.endpoints["openrouter"]

TypeScript Example

async function getModels(): Promise<ModelsResponse> {
  const response = await fetch('https://neptun-webui.vercel.app/api/models', {
    method: 'GET',
    headers: {
      'Content-Type': 'application/json',
    },
  })

  if (!response.ok) {
    throw new Error(`HTTP error! status: ${response.status}`)
  }

  return await response.json()
}

async function listOpenRouterEndpoints(): Promise<string[]> {
  const modelsData = await getModels()
  return modelsData.endpoints.openrouter
}
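As a usage sketch (the main wrapper is illustrative, not part of the API), the helpers above can be combined to print the allowed models and the OpenRouter endpoints; the commented values come from the example response above.

async function main(): Promise<void> {
  const modelsData = await getModels()
  // e.g. "google/gemma-2-27b-it", "qwen/Qwen2.5-72B-Instruct", ...
  console.log(modelsData.models.join('\n'))

  // e.g. "/api/ai/openrouter/deepseek-chat/chat"
  const openRouterEndpoints = await listOpenRouterEndpoints()
  console.log(openRouterEndpoints.join('\n'))
}

main().catch(console.error)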