OpenAI Chat
NestJS AI supports the AI language models from OpenAI, the company behind ChatGPT, whose text generation and embedding models helped spark broad interest in AI-driven text generation.
@nestjs-ai/model-openai uses the official openai Node.js SDK under the hood.
Prerequisites
You will need to create an API key with OpenAI to access ChatGPT models.
Create an account on the OpenAI signup page and generate the key on the API Keys page.
The OpenAiChatModelModule accepts an apiKey property that should be set to the value of the API Key obtained from openai.com.
You can pass the API key directly to forFeature():
OpenAiChatModelModule.forFeature({
apiKey: process.env.OPENAI_API_KEY,
});
For enhanced security when handling sensitive information like API keys, use forFeatureAsync() together with @nestjs/config so that the value is resolved at module initialization time:
import { ConfigModule, ConfigService } from '@nestjs/config';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';
OpenAiChatModelModule.forFeatureAsync({
imports: [ConfigModule.forRoot()],
inject: [ConfigService],
useFactory: (config: ConfigService) => ({
apiKey: config.getOrThrow<string>('OPENAI_API_KEY'),
}),
});
You can also retrieve the API key programmatically from any secure source:
const apiKey = process.env.OPENAI_API_KEY;
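Whichever source the key comes from, it helps to fail fast when it is missing. The sketch below is a hypothetical helper (`requireApiKey` is not part of NestJS AI) that reads the key from an environment-like object and throws a descriptive error if the value is absent or blank.

```typescript
// Hypothetical helper, not part of @nestjs-ai: fail fast on a missing key.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env, name = 'OPENAI_API_KEY'): string {
  // Treat whitespace-only values the same as a missing variable.
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage (in a Node.js process): const apiKey = requireApiKey(process.env);
```

Taking the environment as a parameter keeps the helper easy to test and lets the same check be reused with other key sources.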
Module Configuration
Installation
Install the platform module together with the OpenAI model package:
pnpm add @nestjs-ai/platform @nestjs-ai/model-openai @nestjs-ai/model @nestjs-ai/commons openai
@nestjs-ai/model and @nestjs-ai/commons are peer dependencies of @nestjs-ai/model-openai. The openai Node.js SDK is also a required peer.
Basic Setup
NestAiModule.forRoot() must be imported before OpenAiChatModelModule.forFeature(). The chat module reads the connection-level properties (apiKey, baseUrl, organizationId, …) and any chat-specific overrides from the options object:
import { Module } from '@nestjs/common';
import { NestAiModule } from '@nestjs-ai/platform';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';
@Module({
imports: [
NestAiModule.forRoot(),
OpenAiChatModelModule.forFeature({
apiKey: process.env.OPENAI_API_KEY,
options: {
model: 'gpt-4o',
temperature: 0.7,
},
}),
],
})
export class AppModule {}
Async Configuration
For dynamic configuration (e.g., loading API keys from environment or a config service):
import { Module } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
import { NestAiModule } from '@nestjs-ai/platform';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';
@Module({
imports: [
ConfigModule.forRoot(),
NestAiModule.forRoot(),
OpenAiChatModelModule.forFeatureAsync({
inject: [ConfigService],
useFactory: (config: ConfigService) => ({
apiKey: config.getOrThrow<string>('OPENAI_API_KEY'),
options: {
model: 'gpt-4o',
temperature: 0.7,
},
}),
}),
],
})
export class AppModule {}
Module Options
Connection-level properties (top level of forFeature() input):
| Property | Type | Default | Description |
|---|---|---|---|
|
|
— |
OpenAI API key. |
|
|
Base URL of the OpenAI-compatible service. |
|
|
|
— |
Organization ID. Useful for users that belong to multiple organizations. |
|
|
— |
Deployment name (Microsoft Foundry). |
|
|
|
Set to |
|
|
|
Set to |
|
|
— |
Token provider for Azure AD authentication. |
|
|
— |
Default model name resolved when no |
|
|
|
Request timeout for the OpenAI client. |
|
|
|
Maximum retries for transient failures. |
|
|
|
Custom HTTP headers added to every request. |
|
|
— |
Underlying |
|
|
— |
Tool-calling observation settings (see Tool Calling). |
Chat-specific options (under the options key of forFeature() input):
| Property | Type | Default | Description |
|---|---|---|---|
|
|
|
Name of the OpenAI chat model. See the models page. |
|
|
— |
Sampling temperature. Higher values make output more random; lower values make results more focused. Not supported by GPT-5 reasoning models. |
|
|
— |
Number between -2.0 and 2.0. Positive values penalize tokens based on existing frequency. |
|
|
— |
Modifies the likelihood of specified tokens appearing in the completion. |
|
|
— |
Whether to return log probabilities of the output tokens. |
|
|
— |
Number of most likely tokens to return at each token position. Requires |
|
|
— |
Maximum tokens to generate. Use for non-reasoning models (e.g. gpt-4o). Mutually exclusive with |
|
|
— |
Upper bound for tokens generated, including reasoning tokens. Required for reasoning models (o1, o3, o4-mini series). Mutually exclusive with |
|
|
— |
Number of chat completion choices to generate for each input message. |
|
|
— |
Whether to store the output of this chat completion request. |
|
|
— |
Developer-defined tags and values used for filtering completions in the chat completion dashboard. |
|
|
— |
Output types the model should generate. The |
|
|
— |
Audio parameters for audio output. Required when |
|
|
— |
Number between -2.0 and 2.0. Positive values penalize tokens based on whether they appear in the text so far. |
|
|
— |
OpenAI response format. Use |
|
|
— |
If specified, the system makes a best effort to sample deterministically. |
|
|
— |
Up to 4 sequences where the API will stop generating further tokens. |
|
|
— |
Nucleus sampling parameter. Generally recommended to alter this or |
|
|
— |
Controls which (if any) tool is called by the model. |
|
|
— |
Unique identifier representing your end-user. Deprecated; prefer |
|
|
— |
Streaming options. Set |
|
|
|
Whether to enable parallel function calling during tool use. |
|
|
— |
Reasoning effort level for reasoning models. Values: |
|
|
— |
Output verbosity for compatible models. |
|
|
— |
Specifies the processing type used for serving the request. |
|
|
— |
Additional parameters merged into the top level of the JSON request. Intended for OpenAI-compatible servers (vLLM, Ollama, etc.). See Using Extra Parameters with OpenAI-Compatible Servers. |
|
|
— |
List of tools, identified by their names, to enable for function calling. Tools with those names must exist in the resolver registry. |
|
|
— |
Tool callbacks to register with the chat model. |
|
|
|
If |
|
|
|
Context map forwarded to tool callbacks. |
|
When using GPT-5 models such as |
All options under options can be overridden at runtime by adding request-specific Runtime Options to the Prompt call.
Token Limit Parameters: Model-Specific Usage
OpenAI provides two mutually exclusive parameters for controlling token generation limits:
| Parameter | Use Case | Compatible Models |
|---|---|---|
| maxTokens | Non-reasoning models | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo |
| maxCompletionTokens | Reasoning models | o1, o1-mini, o1-preview, o3, o4-mini series |

These parameters are mutually exclusive. Setting both will result in an API error from OpenAI.
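The table above can be turned into a small guard in application code. The following is a minimal sketch, assuming a hypothetical `tokenLimitOption` helper and a simple model-name prefix check. The prefix list is an assumption drawn from the table, not an exhaustive or official classification of OpenAI models.

```typescript
// Hypothetical helper: pick the token-limit parameter that matches the model family.
type TokenLimit = { maxTokens: number } | { maxCompletionTokens: number };

// Assumption: reasoning models are recognised by these name prefixes (taken from the table above).
const REASONING_PREFIXES = ['o1', 'o3', 'o4-mini'];

function isReasoningModel(model: string): boolean {
  return REASONING_PREFIXES.some(
    (prefix) => model === prefix || model.startsWith(`${prefix}-`),
  );
}

function tokenLimitOption(model: string, limit: number): TokenLimit {
  // Only one of the two parameters is ever returned, so they cannot be combined by accident.
  return isReasoningModel(model)
    ? { maxCompletionTokens: limit }
    : { maxTokens: limit };
}
```

The returned object contains exactly one of the two keys, which avoids sending both in the same request.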
Usage Examples
For non-reasoning models (gpt-4o, gpt-3.5-turbo):
const response = await chatModel.call(
new Prompt(
'Explain quantum computing in simple terms.',
OpenAiChatOptions.builder()
.model('gpt-4o')
.maxTokens(150) // Use maxTokens for non-reasoning models
.build(),
),
);
For reasoning models (o1, o3 series):
const response = await chatModel.call(
new Prompt(
'Solve this complex math problem step by step: ...',
OpenAiChatOptions.builder()
.model('o1-preview')
.maxCompletionTokens(1000) // Use maxCompletionTokens for reasoning models
.build(),
),
);
Builder Validation:
The OpenAiChatOptions builder enforces mutual exclusivity with a "last-set-wins" approach:
// This will automatically clear maxTokens and use maxCompletionTokens
const options = OpenAiChatOptions.builder()
.maxTokens(100) // Set first
.maxCompletionTokens(200) // This clears maxTokens and logs a warning
.build();
// Result: maxTokens = null, maxCompletionTokens = 200
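To make the last-set-wins behaviour concrete, here is a standalone sketch of how such a builder could be written. It is not the actual `OpenAiChatOptions` implementation: the class name `TokenLimitBuilder` and the use of `undefined` for a cleared value are assumptions for illustration, and the warning log mentioned above is omitted.

```typescript
// Illustrative only: a tiny builder where the last token-limit setter wins.
type TokenLimits = {
  maxTokens?: number;
  maxCompletionTokens?: number;
};

class TokenLimitBuilder {
  private limits: TokenLimits = {};

  maxTokens(value: number): this {
    // Replacing the whole object drops any previously set maxCompletionTokens.
    this.limits = { maxTokens: value };
    return this;
  }

  maxCompletionTokens(value: number): this {
    // Replacing the whole object drops any previously set maxTokens.
    this.limits = { maxCompletionTokens: value };
    return this;
  }

  build(): TokenLimits {
    // Return a copy so later builder calls cannot mutate the built result.
    return { ...this.limits };
  }
}

const limits = new TokenLimitBuilder()
  .maxTokens(100)
  .maxCompletionTokens(200)
  .build();
// limits.maxTokens is undefined and limits.maxCompletionTokens is 200
```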
Runtime Options
The OpenAiChatOptions class provides model configurations such as the model to use, the temperature, the frequency penalty, etc.
On start-up, the default options can be configured with the OpenAiChatModel constructor or the OpenAiChatModelModule.forFeature({ options: {…} }) properties.
At run-time, you can override the default options by adding new, request-specific options to the Prompt call.
For example, to override the default model and temperature for a specific request:
const response = await chatModel.call(
new Prompt(
'Generate the names of 5 famous pirates.',
OpenAiChatOptions.builder()
.model('gpt-4o')
.temperature(0.4)
.build(),
),
);
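Conceptually, request-specific options are layered on top of the start-up defaults. The sketch below illustrates that idea with a hypothetical `mergeOptions` function and a reduced option type. The real merge logic inside NestJS AI may differ, for example in how it treats nested objects.

```typescript
// Conceptual sketch: a runtime value overrides the default; a missing value falls back to it.
type ChatOptionsLike = {
  model?: string;
  temperature?: number;
  maxTokens?: number;
};

function mergeOptions(
  defaults: ChatOptionsLike,
  runtime: ChatOptionsLike = {},
): ChatOptionsLike {
  return {
    // `??` only falls back on null/undefined, so a temperature of 0 is preserved.
    model: runtime.model ?? defaults.model,
    temperature: runtime.temperature ?? defaults.temperature,
    maxTokens: runtime.maxTokens ?? defaults.maxTokens,
  };
}

const effective = mergeOptions(
  { model: 'gpt-4o-mini', temperature: 0.7 },
  { temperature: 0.4 },
);
// effective.model is 'gpt-4o-mini' and effective.temperature is 0.4
```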
In addition to the model-specific OpenAiChatOptions you can use a portable ChatOptions instance, created with ChatOptions.builder().
Function Calling
You can register custom tools with the OpenAiChatModel and have the OpenAI model intelligently choose to output a JSON object containing arguments to call one or many of the registered tools.
This is a powerful technique to connect the LLM capabilities with external tools and APIs.
Read more about Tool Calling.
Multimodal
Multimodality refers to a model’s ability to simultaneously understand and process information from various sources, including text, images, audio, and other data formats. OpenAI supports text, vision, and audio input modalities.
Vision
OpenAI models that offer vision multimodal support include gpt-4, gpt-4o, and gpt-4o-mini.
Refer to the Vision guide for more information.
The OpenAI User Message API can incorporate a list of base64-encoded images or image URLs with the message.
NestJS AI’s Message interface facilitates multimodal AI models by introducing the Media type from @nestjs-ai/commons.
This type encompasses data and metadata about media attachments in messages, including a MimeType (the MediaFormat enum is available for common types) and the raw media data (a Buffer/Uint8Array, a string, or a URL).
Below is an example using a local image with the gpt-4o model:
import { readFileSync } from 'node:fs';
import { Media, MediaFormat } from '@nestjs-ai/commons';
import { Prompt, UserMessage } from '@nestjs-ai/model';
import { OpenAiChatOptions } from '@nestjs-ai/model-openai';
const imageBytes = readFileSync(new URL('./multimodal.test.png', import.meta.url));
const userMessage = new UserMessage({
content: 'Explain what do you see on this picture?',
media: [new Media({ mimeType: MediaFormat.IMAGE_PNG, data: imageBytes })],
});
const response = await chatModel.call(
new Prompt(
[userMessage],
OpenAiChatOptions.builder().model('gpt-4o').build(),
),
);
GPT_4_VISION_PREVIEW is no longer available to new users. If you are not an existing user, use the gpt-4o or gpt-4-turbo models instead.
Or the image URL equivalent using the gpt-4o model:
const userMessage = new UserMessage({
content: 'Explain what do you see on this picture?',
media: [
new Media({
mimeType: MediaFormat.IMAGE_PNG,
data: new URL('https://example.com/multimodal.test.png'),
}),
],
});
const response = await chatModel.call(
new Prompt(
[userMessage],
OpenAiChatOptions.builder().model('gpt-4o').build(),
),
);
You can pass multiple images as well — just add more Media entries to the media array.
The example shows a model taking the multimodal.test.png image as input, along with the text message "Explain what do you see on this picture?", and generating a response like this:
This is an image of a fruit bowl with a simple design. The bowl is made of metal with curved wire edges that create an open structure, allowing the fruit to be visible from all angles. Inside the bowl, there are two yellow bananas resting on top of what appears to be a red apple. The bananas are slightly overripe, as indicated by the brown spots on their peels. The bowl has a metal ring at the top, likely to serve as a handle for carrying. The bowl is placed on a flat surface with a neutral-colored background that provides a clear view of the fruit inside.
Audio
OpenAI models that offer input audio multimodal support include gpt-4o-audio-preview.
Refer to the Audio guide for more information.
The OpenAI User Message API can incorporate a list of base64-encoded audio files with the message, attached via the media field of UserMessage.
Currently, OpenAI supports only the following media types: audio/mp3 and audio/wav (MediaFormat.AUDIO_MP3 and MediaFormat.AUDIO_WAV).
Below is an example, illustrating the fusion of user text with an audio file using the gpt-4o-audio-preview model:
import { readFileSync } from 'node:fs';
import { Media, MediaFormat } from '@nestjs-ai/commons';
import { Prompt, UserMessage } from '@nestjs-ai/model';
import { OpenAiChatOptions } from '@nestjs-ai/model-openai';
const audioBytes = readFileSync(new URL('./speech1.mp3', import.meta.url));
const userMessage = new UserMessage({
content: 'What is this recording about?',
media: [new Media({ mimeType: MediaFormat.AUDIO_MP3, data: audioBytes })],
});
const response = await chatModel.call(
new Prompt(
[userMessage],
OpenAiChatOptions.builder().model('gpt-4o-audio-preview').build(),
),
);
You can pass multiple audio files as well.
Output Audio
OpenAI models that offer output audio multimodal support include gpt-4o-audio-preview.
Refer to the Audio guide for more information.
The OpenAI Assistant Message API can return a list of base64-encoded audio files with the message.
Currently, OpenAI supports only the following audio types for output: audio/mp3 and audio/wav.
Below is an example, illustrating a response that combines a transcript and audio bytes, using the gpt-4o-audio-preview model:
import { Prompt, UserMessage } from '@nestjs-ai/model';
import { OpenAiChatOptions } from '@nestjs-ai/model-openai';
const userMessage = new UserMessage({ content: 'Tell me a short joke about Node.js' });
const response = await chatModel.call(
new Prompt(
[userMessage],
OpenAiChatOptions.builder()
.model('gpt-4o-audio-preview')
.outputModalities(['text', 'audio'])
.outputAudio({ voice: 'alloy', format: 'wav' })
.build(),
),
);
const text = response.result?.output.text; // audio transcript
const waveAudio = response.result?.output.media[0]?.dataAsByteArray; // audio data (Buffer)
You have to specify an 'audio' modality in the OpenAiChatOptions to generate audio output.
The outputAudio option takes an OpenAI ChatCompletionAudioParam object with the voice and audio format for the audio output.
Audio output is not supported in streaming mode for the wav format. Use pcm16 if you need streamed audio.
Structured Outputs
OpenAI provides custom Structured Outputs APIs that ensure your model generates responses conforming strictly to your provided JSON Schema. In addition to the existing model-agnostic Structured Output Converter, these APIs offer enhanced control and precision.
Currently, OpenAI supports a subset of the JSON Schema language format.
Configuration
NestJS AI lets you configure your response format programmatically using the OpenAiChatOptions builder, or globally through the module’s options.responseFormat.
Using the Chat Options Builder
You can set the response format programmatically with the OpenAiChatOptions builder as shown below:
const jsonSchema = {
type: 'object',
properties: {
steps: {
type: 'array',
items: {
type: 'object',
properties: {
explanation: { type: 'string' },
output: { type: 'string' },
},
required: ['explanation', 'output'],
additionalProperties: false,
},
},
final_answer: { type: 'string' },
},
required: ['steps', 'final_answer'],
additionalProperties: false,
} as const;
const prompt = new Prompt(
'how can I solve 8x + 7 = -23',
OpenAiChatOptions.builder()
.model('gpt-4o-mini')
.responseFormat({
type: 'json_schema',
json_schema: { name: 'math_reasoning', strict: true, schema: jsonSchema },
})
.build(),
);
const response = await chatModel.call(prompt);
Adhere to the OpenAI subset of the JSON Schema language format.
Integrating with StandardSchemaOutputConverter
You can leverage the existing StandardSchemaOutputConverter utility to drive the schema generation from a Standard Schema (Zod, Valibot, Arktype, etc.) and later convert the structured response into typed instances.
The OpenAI builder provides an outputSchema(jsonSchema) shortcut that automatically wraps the schema in { type: 'json_schema', strict: true, … }:
import { z } from 'zod';
import { Prompt, StandardSchemaOutputConverter } from '@nestjs-ai/model';
import { OpenAiChatOptions } from '@nestjs-ai/model-openai';
const MathReasoningSchema = z.object({
steps: z.array(
z.object({
explanation: z.string(),
output: z.string(),
}),
),
final_answer: z.string(),
});
const outputConverter = new StandardSchemaOutputConverter({
schema: MathReasoningSchema,
});
const prompt = new Prompt(
'how can I solve 8x + 7 = -23',
OpenAiChatOptions.builder()
.model('gpt-4o-mini')
.outputSchema(outputConverter.jsonSchema)
.build(),
);
const response = await chatModel.call(prompt);
const content = response.result?.output.text ?? '';
const mathReasoning = await outputConverter.convert(content);
Although marking fields as required is optional for JSON Schema, OpenAI mandates required fields for the structured response to function correctly. With Zod, every property is required by default unless you mark it .optional().
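A quick pre-flight check can catch schemas that violate this rule before they reach the API. The sketch below is a hypothetical validator (`findStrictSchemaIssues` is not part of NestJS AI) that walks a schema and reports properties missing from `required`, plus objects that do not set `additionalProperties: false` as the examples in this section do. It handles only the `object` and `array` shapes used here and is a simplification, not a full implementation of OpenAI's schema rules.

```typescript
// Hypothetical pre-flight check for the schema shapes used in this section.
type JsonSchema = {
  type?: string;
  properties?: Record<string, JsonSchema>;
  items?: JsonSchema;
  required?: readonly string[];
  additionalProperties?: boolean;
};

function findStrictSchemaIssues(schema: JsonSchema, path = '$'): string[] {
  const issues: string[] = [];
  if (schema.type === 'object') {
    const required = new Set(schema.required ?? []);
    if (schema.additionalProperties !== false) {
      issues.push(`${path}: additionalProperties must be false`);
    }
    for (const [name, child] of Object.entries(schema.properties ?? {})) {
      if (!required.has(name)) {
        issues.push(`${path}.${name}: not listed in required`);
      }
      // Recurse so nested objects are checked with the same rules.
      issues.push(...findStrictSchemaIssues(child, `${path}.${name}`));
    }
  } else if (schema.type === 'array' && schema.items) {
    issues.push(...findStrictSchemaIssues(schema.items, `${path}[]`));
  }
  return issues;
}

const issues = findStrictSchemaIssues({
  type: 'object',
  properties: {
    final_answer: { type: 'string' },
    note: { type: 'string' },
  },
  required: ['final_answer'],
});
// issues reports the missing additionalProperties flag and the non-required "note" property
```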
Configuring via Module Options
Alternatively, you can configure the desired response format through the module options. The configuration is applied to every request unless overridden:
import { Module } from '@nestjs/common';
import { NestAiModule } from '@nestjs-ai/platform';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';
@Module({
imports: [
NestAiModule.forRoot(),
OpenAiChatModelModule.forFeature({
apiKey: process.env.OPENAI_API_KEY,
options: {
model: 'gpt-4o-mini',
responseFormat: {
type: 'json_schema',
json_schema: {
name: 'MySchemaName',
strict: true,
schema: {
type: 'object',
properties: {
steps: {
type: 'array',
items: {
type: 'object',
properties: {
explanation: { type: 'string' },
output: { type: 'string' },
},
required: ['explanation', 'output'],
additionalProperties: false,
},
},
final_answer: { type: 'string' },
},
required: ['steps', 'final_answer'],
additionalProperties: false,
},
},
},
},
}),
],
})
export class AppModule {}
Sample Controller
Create a new NestJS application and add the @nestjs-ai/platform and @nestjs-ai/model-openai packages to your dependencies.
Configure the chat module in your AppModule to enable the OpenAI chat model:
import { Module } from '@nestjs/common';
import { NestAiModule } from '@nestjs-ai/platform';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';
import { ChatController } from './chat.controller.js';
@Module({
imports: [
NestAiModule.forRoot(),
OpenAiChatModelModule.forFeature({
apiKey: process.env.OPENAI_API_KEY,
options: {
model: 'gpt-4o',
temperature: 0.7,
},
}),
],
controllers: [ChatController],
})
export class AppModule {}
Set the OPENAI_API_KEY environment variable to your OpenAI API key, or use forFeatureAsync() with ConfigService for safer secret management.
This makes a ChatModel available for injection. Here is an example of a simple @Controller class that uses the chat model for text generation and streaming:
import { Controller, Get, Query, Sse } from '@nestjs/common';
import { InjectChatModel } from '@nestjs-ai/platform';
import { Prompt, UserMessage } from '@nestjs-ai/model';
import type { ChatModel, ChatResponse } from '@nestjs-ai/model';
import { map, type Observable } from 'rxjs';
@Controller()
export class ChatController {
constructor(@InjectChatModel() private readonly chatModel: ChatModel) {}
@Get('/ai/generate')
async generate(
@Query('message') message = 'Tell me a joke',
): Promise<{ generation: string | null }> {
const text = await this.chatModel.call(message);
return { generation: text };
}
@Sse('/ai/generateStream')
generateStream(
@Query('message') message = 'Tell me a joke',
): Observable<{ data: ChatResponse }> {
const prompt = new Prompt(new UserMessage({ content: message }));
return this.chatModel.stream(prompt).pipe(map((data) => ({ data })));
}
}
Manual Configuration
The OpenAiChatModel implements the ChatModel and StreamingChatModel abstractions and uses the underlying openai Node.js client to connect to the OpenAI service.
Add the @nestjs-ai/model-openai dependency to your project (along with its peers):
pnpm add @nestjs-ai/model-openai @nestjs-ai/model @nestjs-ai/commons openai
Next, instantiate OpenAiChatModel directly and use it for text generation:
import { lastValueFrom, toArray } from 'rxjs';
import { Prompt } from '@nestjs-ai/model';
import { OpenAiChatModel, OpenAiChatOptions } from '@nestjs-ai/model-openai';
const openAiChatOptions = OpenAiChatOptions.builder()
.apiKey(process.env.OPENAI_API_KEY)
.model('gpt-4o-mini')
.temperature(0.4)
.maxTokens(200)
.build();
const chatModel = new OpenAiChatModel({ options: openAiChatOptions });
const response = await chatModel.call(
new Prompt('Generate the names of 5 famous pirates.'),
);
// Or with streaming responses (RxJS Observable)
const responses = await lastValueFrom(
chatModel
.stream(new Prompt('Generate the names of 5 famous pirates.'))
.pipe(toArray()),
);
The OpenAiChatOptions provides the configuration information for the chat requests.
OpenAiChatOptions.builder() is the fluent options-builder for chat config; the underlying OpenAI client is configured from the same options object (apiKey, baseUrl, timeout, maxRetries, customHeaders, fetchOptions).
Low-level OpenAI Client
@nestjs-ai/model-openai re-exports the OpenAI Node.js SDK client type as OpenAiClient. To talk to the OpenAI Chat Completions API directly, use the official openai SDK:
import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Non-streaming request (resolves with the full completion)
const completion = await client.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: 'Hello world' }],
temperature: 0.8,
});
// Streaming request (async iterable)
const stream = await client.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: 'Hello world' }],
temperature: 0.8,
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
Refer to the openai Node.js SDK documentation for further information.
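When streaming, each chunk carries only a fragment of the reply, so callers usually accumulate the deltas. The sketch below shows that accumulation on simulated chunks so it can run without network access. The `ChunkLike` type is a reduced, hypothetical stand-in for the SDK's chunk type, and `appendDelta` is an illustrative helper, not an SDK function.

```typescript
// Reduced stand-in for a streamed chat completion chunk (assumption, not the SDK type).
type ChunkLike = {
  choices: { delta?: { content?: string | null } }[];
};

// Append the text carried by one chunk; chunks without content contribute nothing.
function appendDelta(text: string, chunk: ChunkLike): string {
  return text + (chunk.choices[0]?.delta?.content ?? '');
}

// Simulated chunks standing in for what the stream would yield.
const simulated: ChunkLike[] = [
  { choices: [{ delta: {} }] },
  { choices: [{ delta: { content: 'Hello' } }] },
  { choices: [{ delta: { content: ', world' } }] },
  { choices: [] },
];

const fullText = simulated.reduce(appendDelta, '');
// fullText === 'Hello, world'
```

Inside a `for await` loop over a real stream, the same helper can be applied once per chunk to build up the complete reply.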
Files API
|
The Files API client (a port of |
API Key Management
NestJS AI exposes API key configuration through the apiKey property on OpenAiChatProperties and OpenAiChatOptions. For most use cases, supplying the key directly via forFeature() (or forFeatureAsync() for dynamic resolution) is sufficient.
Default Configuration
By default, configure the API key through the chat module:
OpenAiChatModelModule.forFeature({
apiKey: process.env.OPENAI_API_KEY,
});
Custom API Key Configuration
Use forFeatureAsync() to retrieve the API key from a secure key store, rotate it dynamically, or implement custom selection logic:
import { Module } from '@nestjs/common';
import { NestAiModule } from '@nestjs-ai/platform';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';
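import { SecretsModule } from './secrets.module.js'; // assumed path of the module that provides SecretsService
import { SecretsService } from './secrets.service.js';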
@Module({
imports: [
NestAiModule.forRoot(),
OpenAiChatModelModule.forFeatureAsync({
imports: [SecretsModule],
inject: [SecretsService],
useFactory: async (secrets: SecretsService) => ({
apiKey: await secrets.getOpenAiKey(),
options: { model: 'gpt-4o' },
}),
}),
],
})
export class AppModule {}
This is useful when you need to:
- Retrieve the API key from a secure key store
- Rotate API keys dynamically at module initialization
- Implement custom API key selection logic (e.g., per-tenant keys)
For request-time key overrides, set apiKey on OpenAiChatOptions and pass it as a runtime option:
const requestOptions = OpenAiChatOptions.builder()
.apiKey(tenantApiKey)
.model('gpt-4o')
.build();
const response = await chatModel.call(new Prompt('Hello', requestOptions));
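How `tenantApiKey` is obtained is up to the application. One possible shape, shown purely as an assumption, is a lookup with an optional fallback. The `selectApiKey` function and the placeholder keys below are hypothetical, and in practice the map would be backed by a secret store rather than literals.

```typescript
// Hypothetical per-tenant lookup; not part of NestJS AI.
function selectApiKey(
  tenantId: string,
  tenantKeys: ReadonlyMap<string, string>,
  fallbackKey?: string,
): string {
  const key = tenantKeys.get(tenantId) ?? fallbackKey;
  if (!key) {
    // Refuse to continue rather than silently sending an empty key.
    throw new Error(`No API key configured for tenant "${tenantId}"`);
  }
  return key;
}

// Placeholder values for illustration only.
const tenantKeys = new Map<string, string>([
  ['acme', 'sk-acme-placeholder'],
  ['globex', 'sk-globex-placeholder'],
]);

const acmeKey = selectApiKey('acme', tenantKeys);
```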
Using Extra Parameters with OpenAI-Compatible Servers
OpenAI-compatible inference servers like vLLM, Ollama, and others often support additional parameters beyond those defined in OpenAI’s standard API.
For example, these servers may accept parameters such as top_k, repetition_penalty, or other sampling controls that the official OpenAI API does not recognize.
The extraBody option allows you to pass arbitrary parameters to these servers.
Any key-value pairs provided in extraBody are included at the top level of the JSON request, enabling you to leverage server-specific features while using NestJS AI’s OpenAI client.
The extraBody option is intended for OpenAI-compatible servers. If you are communicating with the official OpenAI API, you should never populate extraBody.
Configuration with Module Options
You can configure extra parameters using the module’s options.extraBody. Each entry becomes a top-level parameter in the request:
OpenAiChatModelModule.forFeature({
baseUrl: 'http://localhost:8000/v1',
options: {
model: 'meta-llama/Llama-3-8B-Instruct',
temperature: 0.7,
extraBody: {
top_k: 50,
repetition_penalty: 1.1,
},
},
});
This configuration would produce a JSON request like:
{
"model": "meta-llama/Llama-3-8B-Instruct",
"temperature": 0.7,
"top_k": 50,
"repetition_penalty": 1.1,
"messages": [...]
}
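The flattening shown above can be pictured as a plain object spread. The sketch below is an illustration under stated assumptions, not the library's actual request builder: the `buildRequestBody` function and its types are hypothetical, and letting standard fields take precedence over colliding `extraBody` keys is an assumption.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Reduced, hypothetical option shape for this sketch.
type SketchOptions = {
  model: string;
  temperature?: number;
  extraBody?: Record<string, unknown>;
};

function buildRequestBody(
  options: SketchOptions,
  messages: Message[],
): Record<string, unknown> {
  const { extraBody, ...standard } = options;
  // Assumption: standard fields win if extraBody reuses one of their names.
  return { ...extraBody, ...standard, messages };
}

const body = buildRequestBody(
  {
    model: 'meta-llama/Llama-3-8B-Instruct',
    temperature: 0.7,
    extraBody: { top_k: 50, repetition_penalty: 1.1 },
  },
  [{ role: 'user', content: 'Hello' }],
);
// body has model, temperature, top_k, repetition_penalty and messages at the top level
```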
Runtime Configuration with Builder
You can also specify extra parameters at runtime using the options builder:
const response = await chatModel.call(
new Prompt(
'Tell me a creative story',
OpenAiChatOptions.builder()
.model('meta-llama/Llama-3-8B-Instruct')
.temperature(0.7)
.extraBody({
top_k: 50,
repetition_penalty: 1.1,
frequency_penalty: 0.5,
})
.build(),
),
);
Example: vLLM Server
When running vLLM with a Llama model, you might want to use sampling parameters specific to vLLM:
OpenAiChatModelModule.forFeature({
baseUrl: 'http://localhost:8000/v1',
options: {
model: 'meta-llama/Llama-3-70B-Instruct',
extraBody: {
top_k: 40,
top_p: 0.95,
repetition_penalty: 1.05,
min_p: 0.05,
},
},
});
Refer to the vLLM documentation for a complete list of supported sampling parameters.
Example: Ollama Server
When using Ollama through the OpenAI-compatible endpoint, you can pass Ollama-specific parameters:
const options = OpenAiChatOptions.builder()
.model('llama3.2')
.extraBody({
num_predict: 100,
top_k: 40,
repeat_penalty: 1.1,
})
.build();
const response = await chatModel.call(new Prompt('Generate text', options));
Consult the Ollama API documentation for available parameters.
|
The |
Reasoning Content from Reasoning Models
Some OpenAI-compatible servers that support reasoning models (such as DeepSeek R1, vLLM with reasoning parsers) expose the model’s internal chain of thought via a reasoning_content field in their API responses.
This field contains the step-by-step reasoning process the model used to arrive at its final answer.
|
Mapping the |
|
Important distinction about
Official OpenAI reasoning models hide the chain-of-thought content when using the Chat Completions API.
They only expose |
Example: DeepSeek R1
DeepSeek R1 is a reasoning model that exposes its internal reasoning process. Configure the chat module to point at the DeepSeek API:
OpenAiChatModelModule.forFeature({
apiKey: process.env.DEEPSEEK_API_KEY,
baseUrl: 'https://api.deepseek.com',
options: {
model: 'deepseek-reasoner',
},
});
When you make requests to DeepSeek R1, responses will include both the reasoning content (the model’s thought process) and the final answer.
Refer to the DeepSeek API documentation for more details on reasoning models.
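On the wire, the reasoning text arrives next to the regular content in the assistant message. The sketch below shows one way to separate the two from a raw Chat Completions response body. The response shape follows the OpenAI-compatible format with the extra `reasoning_content` field described above, while the `splitReasoning` helper and its types are hypothetical and not part of NestJS AI.

```typescript
// Reduced stand-in for an OpenAI-compatible response that includes reasoning_content.
type ReasoningMessage = {
  role: string;
  content: string | null;
  reasoning_content?: string | null;
};

type CompletionLike = { choices: { message: ReasoningMessage }[] };

function splitReasoning(completion: CompletionLike): {
  reasoning: string;
  answer: string;
} {
  const message = completion.choices[0]?.message;
  return {
    // Servers that do not expose reasoning simply omit the field.
    reasoning: message?.reasoning_content ?? '',
    answer: message?.content ?? '',
  };
}

const parsed = splitReasoning({
  choices: [
    {
      message: {
        role: 'assistant',
        content: 'x = -3.75',
        reasoning_content: 'Subtract 7 from both sides, then divide by 8.',
      },
    },
  ],
});
// parsed.answer is 'x = -3.75'; parsed.reasoning holds the step-by-step text
```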
Example: vLLM with Reasoning Parser
vLLM supports reasoning models when configured with a reasoning parser:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
--enable-reasoning \
--reasoning-parser deepseek_r1
OpenAiChatModelModule.forFeature({
baseUrl: 'http://localhost:8000/v1',
options: {
model: 'deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B',
},
});
Consult the vLLM reasoning outputs documentation for supported reasoning models and parsers.
|
The availability of |