OpenAI Chat

NestJS AI supports the AI language models from OpenAI, the company behind ChatGPT. OpenAI's industry-leading text generation and embedding models have been instrumental in sparking interest in AI-driven text generation.

@nestjs-ai/model-openai uses the official openai Node.js SDK under the hood.

Prerequisites

You will need to create an API key with OpenAI to access ChatGPT models.

Create an account on the OpenAI signup page and generate the key on the API Keys page.

The OpenAiChatModelModule accepts an apiKey property that should be set to the value of the API Key obtained from openai.com.

You can pass the API key directly to forFeature():

OpenAiChatModelModule.forFeature({
  apiKey: process.env.OPENAI_API_KEY,
});

For enhanced security when handling sensitive information like API keys, use forFeatureAsync() together with @nestjs/config so that the value is resolved at module initialization time:

import { ConfigModule, ConfigService } from '@nestjs/config';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';

OpenAiChatModelModule.forFeatureAsync({
  imports: [ConfigModule.forRoot()],
  inject: [ConfigService],
  useFactory: (config: ConfigService) => ({
    apiKey: config.getOrThrow<string>('OPENAI_API_KEY'),
  }),
});

You can also retrieve the API key programmatically from any secure source:

const apiKey = process.env.OPENAI_API_KEY;
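
Reading the variable directly yields undefined when it is not set, so a small guard can help fail fast at startup. The sketch below is illustrative only: requireEnv is a hypothetical helper written for this page, not part of NestJS AI.

```typescript
// Hypothetical helper (not part of NestJS AI): read a required secret from
// an environment-like map and fail fast when it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage: const apiKey = requireEnv('OPENAI_API_KEY');
```

Passing the environment map as a parameter keeps the helper easy to test without mutating process.env.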

Module Configuration

Installation

Install the platform module together with the OpenAI model package:

pnpm add @nestjs-ai/platform @nestjs-ai/model-openai @nestjs-ai/model @nestjs-ai/commons openai

@nestjs-ai/model and @nestjs-ai/commons are peer dependencies of @nestjs-ai/model-openai. The openai Node.js SDK is also a required peer.

Basic Setup

NestAiModule.forRoot() must be imported before OpenAiChatModelModule.forFeature(). The chat module reads the connection-level properties (apiKey, baseUrl, organizationId, …) and any chat-specific overrides from the options object:

import { Module } from '@nestjs/common';
import { NestAiModule } from '@nestjs-ai/platform';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';

@Module({
  imports: [
    NestAiModule.forRoot(),
    OpenAiChatModelModule.forFeature({
      apiKey: process.env.OPENAI_API_KEY,
      options: {
        model: 'gpt-4o',
        temperature: 0.7,
      },
    }),
  ],
})
export class AppModule {}

Async Configuration

For dynamic configuration (e.g., loading API keys from the environment or a config service), use forFeatureAsync():

import { Module } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
import { NestAiModule } from '@nestjs-ai/platform';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';

@Module({
  imports: [
    ConfigModule.forRoot(),
    NestAiModule.forRoot(),
    OpenAiChatModelModule.forFeatureAsync({
      inject: [ConfigService],
      useFactory: (config: ConfigService) => ({
        apiKey: config.getOrThrow<string>('OPENAI_API_KEY'),
        options: {
          model: 'gpt-4o',
          temperature: 0.7,
        },
      }),
    }),
  ],
})
export class AppModule {}

Module Options

Connection-level properties (top level of forFeature() input):

Property	Type	Default	Description
`apiKey`	`string`	—	OpenAI API key.
`baseUrl`	`string`	`api.openai.com/v1`	Base URL of the OpenAI-compatible service.
`organizationId`	`string`	—	Organization ID. Useful for users that belong to multiple organizations.
`deploymentName`	`string`	—	Deployment name (Microsoft Foundry).
`microsoftFoundry`	`boolean`	`false`	Set to `true` when targeting Microsoft Foundry / Azure OpenAI.
`gitHubModels`	`boolean`	`false`	Set to `true` when targeting GitHub Models.
`azureADTokenProvider`	`() => Promise<string>`	—	Token provider for Azure AD authentication.
`model`	`string`	—	Default model name resolved when no `options.model` is set.
`timeout`	`Milliseconds`	`60_000`	Request timeout for the OpenAI client.
`maxRetries`	`number`	`3`	Maximum retries for transient failures.
`customHeaders`	`Record<string, string>`	`{}`	Custom HTTP headers added to every request.
`fetchOptions`	`ClientOptions['fetchOptions']`	—	Underlying `fetch` options forwarded to the OpenAI SDK.
`toolCalling`	`ToolCallingObservationProperties`	—	Tool-calling observation settings (see Tool Calling).
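
To make the documented defaults concrete, the sketch below resolves a partial set of connection properties against the defaults listed in the table. ConnectionProps and resolveConnection are local stand-ins written for illustration; they are not the library's own types or functions, and the local server URL is a placeholder.

```typescript
// Local stand-in for a subset of the connection-level properties in the
// table above. Illustrative only; not the library's own type.
interface ConnectionProps {
  apiKey?: string;
  baseUrl?: string;
  timeout?: number; // milliseconds
  maxRetries?: number;
  customHeaders?: Record<string, string>;
}

// Defaults as documented in the table.
const CONNECTION_DEFAULTS = {
  timeout: 60_000,
  maxRetries: 3,
  customHeaders: {} as Record<string, string>,
};

// Merge user-supplied properties over the documented defaults.
function resolveConnection(props: ConnectionProps) {
  return {
    ...CONNECTION_DEFAULTS,
    ...props,
    customHeaders: { ...CONNECTION_DEFAULTS.customHeaders, ...props.customHeaders },
  };
}

const resolved = resolveConnection({
  apiKey: 'sk-placeholder',
  baseUrl: 'http://localhost:8000/v1', // placeholder for an OpenAI-compatible server
  maxRetries: 5,
});
// resolved.timeout falls back to 60_000, while resolved.maxRetries is 5
```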

Chat-specific options (under the options key of forFeature() input):

Property	Type	Default	Description
`model`	`string`	`gpt-5-mini`	Name of the OpenAI chat model. See the models page.
`temperature`	`number`	—	Sampling temperature. Higher values make output more random; lower values make results more focused. Not supported by GPT-5 reasoning models.
`frequencyPenalty`	`number`	—	Number between -2.0 and 2.0. Positive values penalize tokens based on existing frequency.
`logitBias`	`Record<string, number>`	—	Modifies the likelihood of specified tokens appearing in the completion.
`logprobs`	`boolean`	—	Whether to return log probabilities of the output tokens.
`topLogprobs`	`number`	—	Number of most likely tokens to return at each token position. Requires `logprobs: true`.
`maxTokens`	`number`	—	Maximum tokens to generate. Use for non-reasoning models (e.g. gpt-4o). Mutually exclusive with `maxCompletionTokens`.
`maxCompletionTokens`	`number`	—	Upper bound for tokens generated, including reasoning tokens. Required for reasoning models (o1, o3, o4-mini series). Mutually exclusive with `maxTokens`.
`n`	`number`	—	Number of chat completion choices to generate for each input message.
`store`	`boolean`	—	Whether to store the output of this chat completion request.
`metadata`	`Record<string, string>`	—	Developer-defined tags and values used for filtering completions in the chat completion dashboard.
`outputModalities`	`('text' \| 'audio')[]`	—	Output types the model should generate. The `gpt-4o-audio-preview` model can generate audio when set to `['text', 'audio']`. Not supported for streaming.
`outputAudio`	`ChatCompletionAudioParam`	—	Audio parameters for audio output. Required when `outputModalities` includes `'audio'`.
`presencePenalty`	`number`	—	Number between -2.0 and 2.0. Positive values penalize tokens based on whether they appear in the text so far.
`responseFormat`	`ResponseFormatText \| ResponseFormatJSONObject \| ResponseFormatJSONSchema`	—	OpenAI response format. Use `{ type: 'json_object' }` for JSON mode or `{ type: 'json_schema', json_schema: { … } }` for Structured Outputs.
`seed`	`number`	—	If specified, the system makes a best effort to sample deterministically.
`stop`	`string[]`	—	Up to 4 sequences where the API will stop generating further tokens.
`topP`	`number`	—	Nucleus sampling parameter. Generally recommended to alter this or `temperature`, not both.
`toolChoice`	`ChatCompletionToolChoiceOption`	—	Controls which (if any) tool is called by the model.
`user`	`string`	—	Unique identifier representing your end-user. Deprecated; prefer `safetyIdentifier`/`promptCacheKey` on the OpenAI API.
`streamOptions`	`ChatCompletionStreamOptions`	—	Streaming options. Set `{ include_usage: true }` to include token usage in stream chunks.
`parallelToolCalls`	`boolean`	`true`	Whether to enable parallel function calling during tool use.
`reasoningEffort`	`ReasoningEffort`	—	Reasoning effort level for reasoning models. Values: `'minimal'`, `'low'`, `'medium'`, `'high'`.
`verbosity`	`'low' \| 'medium' \| 'high'`	—	Output verbosity for compatible models.
`serviceTier`	`'auto' \| 'default' \| 'flex'`	—	Specifies the processing type used for serving the request.
`extraBody`	`Record<string, unknown>`	—	Additional parameters merged into the top level of the JSON request. Intended for OpenAI-compatible servers (vLLM, Ollama, etc.). See Using Extra Parameters with OpenAI-Compatible Servers.
`toolNames`	`Set<string>`	—	List of tools, identified by their names, to enable for function calling. Tools with those names must exist in the resolver registry.
`toolCallbacks`	`ToolCallback[]`	—	Tool callbacks to register with the chat model.
`internalToolExecutionEnabled`	`boolean`	`true`	If `false`, NestJS AI proxies tool calls to the client instead of executing them internally.
`toolContext`	`Record<string, unknown>`	`{}`	Context map forwarded to tool callbacks.

When using GPT-5 models such as gpt-5, gpt-5-mini, and gpt-5-nano, the temperature parameter is not supported. These models are optimized for reasoning and do not use temperature. Specifying a temperature value will result in an error. In contrast, conversational models like gpt-5-chat do support the temperature parameter.
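
One way to avoid that error when the model name is chosen dynamically is to drop temperature before building the request. The following is a minimal sketch under the assumption that models can be classified by name prefix as in the note above; supportsTemperature, withoutUnsupportedTemperature, and ChatOptionsLike are hypothetical names, not NestJS AI APIs.

```typescript
// Local stand-in for the two options this sketch cares about.
interface ChatOptionsLike {
  model: string;
  temperature?: number;
}

// Assumption: GPT-5 reasoning models are recognised by the "gpt-5" name
// prefix, and "gpt-5-chat" variants are conversational models.
function supportsTemperature(model: string): boolean {
  const isGpt5 = model.startsWith('gpt-5');
  const isGpt5Chat = model.startsWith('gpt-5-chat');
  return !isGpt5 || isGpt5Chat;
}

// Return a copy of the options without `temperature` when the target
// model would reject it; otherwise return the options unchanged.
function withoutUnsupportedTemperature(options: ChatOptionsLike): ChatOptionsLike {
  if (supportsTemperature(options.model)) return options;
  const copy = { ...options };
  delete copy.temperature;
  return copy;
}
```

Name-based detection is brittle, so treat the prefix list as something to maintain alongside the models you actually deploy.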

All options under options can be overridden at runtime by adding request-specific Runtime Options to the Prompt call.

Token Limit Parameters: Model-Specific Usage

OpenAI provides two mutually exclusive parameters for controlling token generation limits:

Parameter	Use Case	Compatible Models
`maxTokens`	Non-reasoning models	gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
`maxCompletionTokens`	Reasoning models	o1, o1-mini, o1-preview, o3, o4-mini series

These parameters are mutually exclusive. Setting both will result in an API error from OpenAI.
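
When the model name is only known at runtime, the choice between the two parameters can be made by a small helper. This is a sketch only: tokenLimitFor is a hypothetical function, and recognising reasoning models by the name prefixes from the table above is an assumption.

```typescript
// Assumption: reasoning models are recognised by the name prefixes listed in
// the table above (o1, o3, o4-mini). Adjust the list for your own models.
const REASONING_PREFIXES = ['o1', 'o3', 'o4-mini'];

type TokenLimit = { maxTokens: number } | { maxCompletionTokens: number };

// Pick exactly one of the two mutually exclusive token limit parameters.
function tokenLimitFor(model: string, limit: number): TokenLimit {
  const isReasoning = REASONING_PREFIXES.some(
    (prefix) => model === prefix || model.startsWith(`${prefix}-`),
  );
  return isReasoning ? { maxCompletionTokens: limit } : { maxTokens: limit };
}
```

Because the return type is a union of two single-key objects, a caller cannot accidentally receive both parameters at once.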

Usage Examples

For non-reasoning models (gpt-4o, gpt-3.5-turbo):

const response = await chatModel.call(
  new Prompt(
    'Explain quantum computing in simple terms.',
    OpenAiChatOptions.builder()
      .model('gpt-4o')
      .maxTokens(150) // Use maxTokens for non-reasoning models
      .build(),
  ),
);

For reasoning models (o1, o3 series):

const response = await chatModel.call(
  new Prompt(
    'Solve this complex math problem step by step: ...',
    OpenAiChatOptions.builder()
      .model('o1-preview')
      .maxCompletionTokens(1000) // Use maxCompletionTokens for reasoning models
      .build(),
  ),
);

Builder Validation: The OpenAiChatOptions builder enforces mutual exclusivity with a "last-set-wins" approach:

// This will automatically clear maxTokens and use maxCompletionTokens
const options = OpenAiChatOptions.builder()
  .maxTokens(100)           // Set first
  .maxCompletionTokens(200) // This clears maxTokens and logs a warning
  .build();

// Result: maxTokens = null, maxCompletionTokens = 200
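
The "last-set-wins" rule itself can be illustrated with a tiny stand-alone builder. TokenLimitBuilder below is a hypothetical stand-in written for this page; it is not the OpenAiChatOptions implementation and only mirrors the behavior described above.

```typescript
// Minimal stand-in builder (not the library's implementation) showing how a
// "last-set-wins" rule can keep two fields mutually exclusive.
class TokenLimitBuilder {
  private maxTokensValue: number | null = null;
  private maxCompletionTokensValue: number | null = null;

  maxTokens(value: number): this {
    if (this.maxCompletionTokensValue !== null) {
      console.warn('maxCompletionTokens cleared because maxTokens was set');
      this.maxCompletionTokensValue = null;
    }
    this.maxTokensValue = value;
    return this;
  }

  maxCompletionTokens(value: number): this {
    if (this.maxTokensValue !== null) {
      console.warn('maxTokens cleared because maxCompletionTokens was set');
      this.maxTokensValue = null;
    }
    this.maxCompletionTokensValue = value;
    return this;
  }

  build() {
    return {
      maxTokens: this.maxTokensValue,
      maxCompletionTokens: this.maxCompletionTokensValue,
    };
  }
}

const limits = new TokenLimitBuilder().maxTokens(100).maxCompletionTokens(200).build();
// limits.maxTokens is null and limits.maxCompletionTokens is 200
```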

Runtime Options

The OpenAiChatOptions class provides model configurations such as the model to use, the temperature, the frequency penalty, etc.

On start-up, the default options can be configured with the OpenAiChatModel constructor or the OpenAiChatModelModule.forFeature({ options: {…} }) properties.

At run-time, you can override the default options by adding new, request-specific options to the Prompt call. For example, to override the default model and temperature for a specific request:

const response = await chatModel.call(
  new Prompt(
    'Generate the names of 5 famous pirates.',
    OpenAiChatOptions.builder()
      .model('gpt-4o')
      .temperature(0.4)
      .build(),
  ),
);

In addition to the model-specific OpenAiChatOptions you can use a portable ChatOptions instance, created with ChatOptions.builder().

Function Calling

You can register custom tools with the OpenAiChatModel and have the OpenAI model intelligently choose to output a JSON object containing arguments to call one or many of the registered tools. This is a powerful technique to connect the LLM capabilities with external tools and APIs. Read more about Tool Calling.

Multimodal

Multimodality refers to a model’s ability to simultaneously understand and process information from various sources, including text, images, audio, and other data formats. OpenAI supports text, vision, and audio input modalities.

Vision

OpenAI models that offer vision multimodal support include gpt-4, gpt-4o, and gpt-4o-mini. Refer to the Vision guide for more information.

The OpenAI User Message API can incorporate a list of base64-encoded images or image URLs with the message. NestJS AI’s Message interface facilitates multimodal AI models by introducing the Media type from @nestjs-ai/commons. This type encompasses data and metadata about media attachments in messages, including a MimeType (the MediaFormat enum is available for common types) and the raw media data (a Buffer/Uint8Array, a string, or a URL).

Below is an example using a local image with the gpt-4o model:

import { readFileSync } from 'node:fs';
import { Media, MediaFormat } from '@nestjs-ai/commons';
import { Prompt, UserMessage } from '@nestjs-ai/model';
import { OpenAiChatOptions } from '@nestjs-ai/model-openai';

const imageBytes = readFileSync(new URL('./multimodal.test.png', import.meta.url));

const userMessage = new UserMessage({
  content: 'Explain what do you see on this picture?',
  media: [new Media({ mimeType: MediaFormat.IMAGE_PNG, data: imageBytes })],
});

const response = await chatModel.call(
  new Prompt(
    [userMessage],
    OpenAiChatOptions.builder().model('gpt-4o').build(),
  ),
);

GPT_4_VISION_PREVIEW is no longer available to new users. If you are not an existing user, please use the gpt-4o or gpt-4-turbo models. More details here.

Or the image URL equivalent using the gpt-4o model:

const userMessage = new UserMessage({
  content: 'Explain what do you see on this picture?',
  media: [
    new Media({
      mimeType: MediaFormat.IMAGE_PNG,
      data: new URL('https://example.com/multimodal.test.png'),
    }),
  ],
});

const response = await chatModel.call(
  new Prompt(
    [userMessage],
    OpenAiChatOptions.builder().model('gpt-4o').build(),
  ),
);

You can pass multiple images as well — just add more Media entries to the media array.

The example takes the multimodal.test.png image as input, along with the text message "Explain what do you see on this picture?", and generates a response like this:

This is an image of a fruit bowl with a simple design. The bowl is made of metal with curved wire edges that
create an open structure, allowing the fruit to be visible from all angles. Inside the bowl, there are two
yellow bananas resting on top of what appears to be a red apple. The bananas are slightly overripe, as
indicated by the brown spots on their peels. The bowl has a metal ring at the top, likely to serve as a handle
for carrying. The bowl is placed on a flat surface with a neutral-colored background that provides a clear
view of the fruit inside.
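
For reference, inline images are sent to the Chat Completions API as base64 data URLs. The sketch below shows that encoding step in isolation. toDataUrl is a hypothetical helper, and how NestJS AI performs the conversion internally is not covered here.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: encode raw image bytes as a base64 data URL, the form
// the Chat Completions API accepts for inline images.
function toDataUrl(mimeType: string, bytes: Uint8Array): string {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// The first four bytes of the PNG signature, used here as sample data.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const dataUrl = toDataUrl('image/png', pngSignature);
// dataUrl is 'data:image/png;base64,iVBORw=='
```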

Audio

OpenAI models that offer input audio multimodal support include gpt-4o-audio-preview. Refer to the Audio guide for more information.

The OpenAI User Message API can incorporate a list of base64-encoded audio files with the message, attached via the media field of UserMessage. Currently, OpenAI supports only the following media types: audio/mp3 and audio/wav (MediaFormat.AUDIO_MP3 and MediaFormat.AUDIO_WAV).

Below is an example, illustrating the fusion of user text with an audio file using the gpt-4o-audio-preview model:

import { readFileSync } from 'node:fs';
import { Media, MediaFormat } from '@nestjs-ai/commons';
import { Prompt, UserMessage } from '@nestjs-ai/model';
import { OpenAiChatOptions } from '@nestjs-ai/model-openai';

const audioBytes = readFileSync(new URL('./speech1.mp3', import.meta.url));

const userMessage = new UserMessage({
  content: 'What is this recording about?',
  media: [new Media({ mimeType: MediaFormat.AUDIO_MP3, data: audioBytes })],
});

const response = await chatModel.call(
  new Prompt(
    [userMessage],
    OpenAiChatOptions.builder().model('gpt-4o-audio-preview').build(),
  ),
);

You can pass multiple audio files as well.
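
Since only two input audio types are supported, it can be useful to reject other files before building the message. This is a minimal sketch assuming the type can be inferred from the file extension; audioMimeTypeFor is a hypothetical helper, not part of NestJS AI.

```typescript
import { extname } from 'node:path';

// The two input audio MIME types named above.
type SupportedAudioMime = 'audio/mp3' | 'audio/wav';

// Hypothetical helper: map a file name to a supported audio MIME type, or
// throw so unsupported files are rejected before a request is made.
function audioMimeTypeFor(fileName: string): SupportedAudioMime {
  switch (extname(fileName).toLowerCase()) {
    case '.mp3':
      return 'audio/mp3';
    case '.wav':
      return 'audio/wav';
    default:
      throw new Error(`Unsupported audio file: ${fileName}`);
  }
}
```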

Output Audio

OpenAI models that offer output audio multimodal support include gpt-4o-audio-preview. Refer to the Audio guide for more information.

The OpenAI Assistant Message API can return a list of base64-encoded audio files with the message. Currently, OpenAI supports only the following audio types for output: audio/mp3 and audio/wav.

Below is an example, illustrating a response that combines a transcript and audio bytes, using the gpt-4o-audio-preview model:

import { Prompt, UserMessage } from '@nestjs-ai/model';
import { OpenAiChatOptions } from '@nestjs-ai/model-openai';

const userMessage = new UserMessage({ content: 'Tell me a short joke about Node.js' });

const response = await chatModel.call(
  new Prompt(
    [userMessage],
    OpenAiChatOptions.builder()
      .model('gpt-4o-audio-preview')
      .outputModalities(['text', 'audio'])
      .outputAudio({ voice: 'alloy', format: 'wav' })
      .build(),
  ),
);

const text = response.result?.output.text; // audio transcript
const waveAudio = response.result?.output.media[0]?.dataAsByteArray; // audio data (Buffer)

You have to specify an 'audio' modality in the OpenAiChatOptions to generate audio output. The outputAudio option takes an OpenAI ChatCompletionAudioParam object with the voice and audio format for the audio output.

Audio output is not supported in streaming mode for the wav format. Use pcm16 if you need streamed audio.

Structured Outputs

OpenAI provides custom Structured Outputs APIs that ensure your model generates responses conforming strictly to your provided JSON Schema. In addition to the existing model-agnostic Structured Output Converter, these APIs offer enhanced control and precision.

Currently, OpenAI supports a subset of the JSON Schema language format.

Configuration

NestJS AI lets you configure your response format programmatically using the OpenAiChatOptions builder, or globally through the module’s options.responseFormat.

Using the Chat Options Builder

You can set the response format programmatically with the OpenAiChatOptions builder as shown below:

const jsonSchema = {
  type: 'object',
  properties: {
    steps: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          explanation: { type: 'string' },
          output: { type: 'string' },
        },
        required: ['explanation', 'output'],
        additionalProperties: false,
      },
    },
    final_answer: { type: 'string' },
  },
  required: ['steps', 'final_answer'],
  additionalProperties: false,
} as const;

const prompt = new Prompt(
  'how can I solve 8x + 7 = -23',
  OpenAiChatOptions.builder()
    .model('gpt-4o-mini')
    .responseFormat({
      type: 'json_schema',
      json_schema: { name: 'math_reasoning', strict: true, schema: jsonSchema },
    })
    .build(),
);

const response = await chatModel.call(prompt);

Adhere to the OpenAI subset of the JSON Schema language format.
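
Two rules of that subset are easy to check before sending a request: in strict mode every object must set additionalProperties to false, and every property must be listed in required. The sketch below checks only these two rules. findStrictSchemaIssues and SchemaNode are hypothetical names, and the check does not cover the rest of the supported subset.

```typescript
// A loose JSON Schema node type, sufficient for this sketch.
interface SchemaNode {
  type?: string;
  properties?: Record<string, SchemaNode>;
  items?: SchemaNode;
  required?: readonly string[];
  additionalProperties?: boolean;
}

// Hypothetical pre-flight check covering two strict-mode rules: every
// object sets additionalProperties: false and lists all of its properties
// in `required`. Nested objects and array items are checked recursively.
function findStrictSchemaIssues(node: SchemaNode, path = '$'): string[] {
  const issues: string[] = [];
  if (node.type === 'object') {
    if (node.additionalProperties !== false) {
      issues.push(`${path}: additionalProperties must be false`);
    }
    const required = new Set(node.required ?? []);
    for (const [key, child] of Object.entries(node.properties ?? {})) {
      if (!required.has(key)) issues.push(`${path}.${key}: missing from required`);
      issues.push(...findStrictSchemaIssues(child, `${path}.${key}`));
    }
  }
  if (node.type === 'array' && node.items) {
    issues.push(...findStrictSchemaIssues(node.items, `${path}[]`));
  }
  return issues;
}
```

An empty result means no issue was found for these two rules; it does not guarantee that OpenAI will accept the schema.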

Integrating with `StandardSchemaOutputConverter`

You can leverage the existing StandardSchemaOutputConverter utility to drive the schema generation from a Standard Schema (Zod, Valibot, Arktype, etc.) and later convert the structured response into typed instances.

The OpenAI builder provides an outputSchema(jsonSchema) shortcut that automatically wraps the schema in { type: 'json_schema', strict: true, … }:

import { z } from 'zod';
import { Prompt, StandardSchemaOutputConverter } from '@nestjs-ai/model';
import { OpenAiChatOptions } from '@nestjs-ai/model-openai';

const MathReasoningSchema = z.object({
  steps: z.array(
    z.object({
      explanation: z.string(),
      output: z.string(),
    }),
  ),
  final_answer: z.string(),
});

const outputConverter = new StandardSchemaOutputConverter({
  schema: MathReasoningSchema,
});

const prompt = new Prompt(
  'how can I solve 8x + 7 = -23',
  OpenAiChatOptions.builder()
    .model('gpt-4o-mini')
    .outputSchema(outputConverter.jsonSchema)
    .build(),
);

const response = await chatModel.call(prompt);
const content = response.result?.output.text ?? '';

const mathReasoning = await outputConverter.convert(content);

Although marking fields as required is optional for JSON Schema, OpenAI mandates required fields for the structured response to function correctly. With Zod, every property is required by default unless you mark it .optional().

Configuring via Module Options

Alternatively, you can configure the desired response format through the module options. The configuration is applied to every request unless overridden:

import { Module } from '@nestjs/common';
import { NestAiModule } from '@nestjs-ai/platform';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';

@Module({
  imports: [
    NestAiModule.forRoot(),
    OpenAiChatModelModule.forFeature({
      apiKey: process.env.OPENAI_API_KEY,
      options: {
        model: 'gpt-4o-mini',
        responseFormat: {
          type: 'json_schema',
          json_schema: {
            name: 'MySchemaName',
            strict: true,
            schema: {
              type: 'object',
              properties: {
                steps: {
                  type: 'array',
                  items: {
                    type: 'object',
                    properties: {
                      explanation: { type: 'string' },
                      output: { type: 'string' },
                    },
                    required: ['explanation', 'output'],
                    additionalProperties: false,
                  },
                },
                final_answer: { type: 'string' },
              },
              required: ['steps', 'final_answer'],
              additionalProperties: false,
            },
          },
        },
      },
    }),
  ],
})
export class AppModule {}

Sample Controller

Create a new NestJS application and add the @nestjs-ai/platform and @nestjs-ai/model-openai packages to your dependencies.

Configure the chat module in your AppModule to enable the OpenAI chat model:

import { Module } from '@nestjs/common';
import { NestAiModule } from '@nestjs-ai/platform';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';
import { ChatController } from './chat.controller.js';

@Module({
  imports: [
    NestAiModule.forRoot(),
    OpenAiChatModelModule.forFeature({
      apiKey: process.env.OPENAI_API_KEY,
      options: {
        model: 'gpt-4o',
        temperature: 0.7,
      },
    }),
  ],
  controllers: [ChatController],
})
export class AppModule {}

Set the OPENAI_API_KEY environment variable to your OpenAI API key, or use forFeatureAsync() with ConfigService for safer secret management.

This makes a ChatModel available for injection. Here is an example of a simple @Controller class that uses the chat model for text generation and streaming:

import { Controller, Get, Query, Sse } from '@nestjs/common';
import { InjectChatModel } from '@nestjs-ai/platform';
import { Prompt, UserMessage } from '@nestjs-ai/model';
import type { ChatModel, ChatResponse } from '@nestjs-ai/model';
import { map, type Observable } from 'rxjs';

@Controller()
export class ChatController {
  constructor(@InjectChatModel() private readonly chatModel: ChatModel) {}

  @Get('/ai/generate')
  async generate(
    @Query('message') message = 'Tell me a joke',
  ): Promise<{ generation: string | null }> {
    const text = await this.chatModel.call(message);
    return { generation: text };
  }

  @Sse('/ai/generateStream')
  generateStream(
    @Query('message') message = 'Tell me a joke',
  ): Observable<{ data: ChatResponse }> {
    const prompt = new Prompt(new UserMessage({ content: message }));
    return this.chatModel.stream(prompt).pipe(map((data) => ({ data })));
  }
}

Manual Configuration

The OpenAiChatModel implements the ChatModel and StreamingChatModel abstractions and uses the underlying openai Node.js client to connect to the OpenAI service.

Add the @nestjs-ai/model-openai dependency to your project (along with its peers):

pnpm add @nestjs-ai/model-openai @nestjs-ai/model @nestjs-ai/commons openai

Next, instantiate OpenAiChatModel directly and use it for text generation:

import { lastValueFrom, toArray } from 'rxjs';
import { Prompt } from '@nestjs-ai/model';
import { OpenAiChatModel, OpenAiChatOptions } from '@nestjs-ai/model-openai';

const openAiChatOptions = OpenAiChatOptions.builder()
  .apiKey(process.env.OPENAI_API_KEY)
  .model('gpt-4o-mini')
  .temperature(0.4)
  .maxTokens(200)
  .build();

const chatModel = new OpenAiChatModel({ options: openAiChatOptions });

const response = await chatModel.call(
  new Prompt('Generate the names of 5 famous pirates.'),
);

// Or with streaming responses (RxJS Observable)
const responses = await lastValueFrom(
  chatModel
    .stream(new Prompt('Generate the names of 5 famous pirates.'))
    .pipe(toArray()),
);

The OpenAiChatOptions provides the configuration information for the chat requests. OpenAiChatOptions.builder() is the fluent options-builder for chat config; the underlying OpenAI client is configured from the same options object (apiKey, baseUrl, timeout, maxRetries, customHeaders, fetchOptions).

Low-level OpenAI Client

@nestjs-ai/model-openai re-exports the OpenAI Node.js SDK client type as OpenAiClient. To talk to the OpenAI Chat Completions API directly, use the official openai SDK:

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Sync request
const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello world' }],
  temperature: 0.8,
});

// Streaming request (async iterable)
const stream = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello world' }],
  temperature: 0.8,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

Refer to the openai Node.js SDK documentation for further information.
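
When the complete text is needed rather than incremental output, the streamed deltas can be accumulated into a single string. The sketch below uses a local ChunkLike shape that mirrors only the fields read in the loop above, plus a fake stream so it runs without calling the API; both are illustrative stand-ins rather than SDK types.

```typescript
// Minimal shape of a streamed chunk, mirroring the fields read in the
// streaming loop. A local stand-in, not the SDK's own type.
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Concatenate the text deltas of a stream into the full message.
async function collectContent(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? '';
  }
  return text;
}

// Fake stream used only to exercise the helper without calling the API.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { choices: [{ delta: { content: 'Hello' } }] };
  yield { choices: [{ delta: {} }] }; // chunks without content are skipped
  yield { choices: [{ delta: { content: ' world' } }] };
}
```

With a real client, the object returned by a streaming create call can be passed in place of fakeStream(), provided its chunks have this shape.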

Files API

The Files API client (a port of OpenAiFileApi.java) is not yet available in NestJS AI. Use the official openai Node.js SDK (client.files.*) for file management operations such as uploading, listing, retrieving, deleting files, and accessing file contents.

API Key Management

NestJS AI exposes API key configuration through the apiKey property on OpenAiChatProperties and OpenAiChatOptions. For most use cases, supplying the key directly via forFeature() (or forFeatureAsync() for dynamic resolution) is sufficient.

Default Configuration

By default, configure the API key through the chat module:

OpenAiChatModelModule.forFeature({
  apiKey: process.env.OPENAI_API_KEY,
});

Custom API Key Configuration

Use forFeatureAsync() to retrieve the API key from a secure key store, rotate it dynamically, or implement custom selection logic:

import { Module } from '@nestjs/common';
import { NestAiModule } from '@nestjs-ai/platform';
import { OpenAiChatModelModule } from '@nestjs-ai/model-openai';
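import { SecretsModule } from './secrets.module.js'; // hypothetical module that provides SecretsService
import { SecretsService } from './secrets.service.js';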

@Module({
  imports: [
    NestAiModule.forRoot(),
    OpenAiChatModelModule.forFeatureAsync({
      imports: [SecretsModule],
      inject: [SecretsService],
      useFactory: async (secrets: SecretsService) => ({
        apiKey: await secrets.getOpenAiKey(),
        options: { model: 'gpt-4o' },
      }),
    }),
  ],
})
export class AppModule {}

This is useful when you need to:

Retrieve the API key from a secure key store
Rotate API keys dynamically at module initialization
Implement custom API key selection logic (e.g., per-tenant keys)
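
The per-tenant selection logic mentioned in the list above could be sketched as follows. TenantKeyResolver is a hypothetical helper and not part of NestJS AI; the in-memory map stands in for whatever secure store is actually used.

```typescript
// Hypothetical helper: picks the API key for a tenant, with an optional shared fallback.
class TenantKeyResolver {
  private readonly tenantKeys: ReadonlyMap<string, string>;
  private readonly defaultKey: string | undefined;

  constructor(tenantKeys: ReadonlyMap<string, string>, defaultKey?: string) {
    this.tenantKeys = tenantKeys;
    this.defaultKey = defaultKey;
  }

  // Returns the tenant's own key, else the fallback; fails loudly if neither exists.
  resolve(tenantId: string): string {
    const key = this.tenantKeys.get(tenantId) ?? this.defaultKey;
    if (!key) {
      throw new Error(`No OpenAI API key configured for tenant "${tenantId}"`);
    }
    return key;
  }
}
```

The resolved key can then be supplied as apiKey, either from a forFeatureAsync() factory or on OpenAiChatOptions for a single request.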

For request-time key overrides, set apiKey on OpenAiChatOptions and pass it as a runtime option:

const requestOptions = OpenAiChatOptions.builder()
  .apiKey(tenantApiKey)
  .model('gpt-4o')
  .build();

const response = await chatModel.call(new Prompt('Hello', requestOptions));

Using Extra Parameters with OpenAI-Compatible Servers

OpenAI-compatible inference servers like vLLM, Ollama, and others often support additional parameters beyond those defined in OpenAI’s standard API. For example, these servers may accept parameters such as top_k, repetition_penalty, or other sampling controls that the official OpenAI API does not recognize.

The extraBody option allows you to pass arbitrary parameters to these servers. Any key-value pairs provided in extraBody are included at the top level of the JSON request, enabling you to leverage server-specific features while using NestJS AI’s OpenAI client.

The extraBody parameter is intended for use with OpenAI-compatible servers, not the official OpenAI API. The official OpenAI API applies strict validation and will return an HTTP 400 error ("Unknown parameter: 'extra_body'") if unrecognized fields are encountered.

If you are communicating with the official OpenAI API, you should never populate the extraBody parameter.

Note that the extraBody object is intentionally flattened into the top level of the JSON request during serialization. For example, setting extraBody({ custom_flag: true }) results in { "custom_flag": true } at the root of the JSON payload, matching the behavior of the official SDKs.
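
The flattening can be illustrated with a small sketch. This is not the library's serializer: ChatRequestOptions and toRequestBody are hypothetical names, and letting explicitly set standard fields win over same-named extraBody keys is an assumption made only for this example.

```typescript
// Hypothetical, trimmed-down options shape used only for this illustration.
interface ChatRequestOptions {
  model: string;
  temperature?: number;
  extraBody?: Record<string, unknown>;
}

// Spreads extraBody into the root of the request body instead of nesting it.
function toRequestBody(
  options: ChatRequestOptions,
  messages: unknown[],
): Record<string, unknown> {
  const { extraBody, ...standard } = options;
  // Assumption: standard fields take precedence over extraBody keys with the same name.
  return { ...extraBody, ...standard, messages };
}
```

Calling toRequestBody({ model: 'm', extraBody: { custom_flag: true } }, []) yields an object with model, custom_flag, and messages at the top level and no extraBody key.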

Configuration with Module Options

You can configure extra parameters using the module’s options.extraBody. Each entry becomes a top-level parameter in the request:

OpenAiChatModelModule.forFeature({
  baseUrl: 'http://localhost:8000/v1',
  options: {
    model: 'meta-llama/Llama-3-8B-Instruct',
    temperature: 0.7,
    extraBody: {
      top_k: 50,
      repetition_penalty: 1.1,
    },
  },
});

This configuration would produce a JSON request like:

{
  "model": "meta-llama/Llama-3-8B-Instruct",
  "temperature": 0.7,
  "top_k": 50,
  "repetition_penalty": 1.1,
  "messages": [...]
}

Runtime Configuration with Builder

You can also specify extra parameters at runtime using the options builder:

const response = await chatModel.call(
  new Prompt(
    'Tell me a creative story',
    OpenAiChatOptions.builder()
      .model('meta-llama/Llama-3-8B-Instruct')
      .temperature(0.7)
      .extraBody({
        top_k: 50,
        repetition_penalty: 1.1,
        frequency_penalty: 0.5,
      })
      .build(),
  ),
);

Example: vLLM Server

When running vLLM with a Llama model, you might want to use sampling parameters specific to vLLM:

OpenAiChatModelModule.forFeature({
  baseUrl: 'http://localhost:8000/v1',
  options: {
    model: 'meta-llama/Llama-3-70B-Instruct',
    extraBody: {
      top_k: 40,
      top_p: 0.95,
      repetition_penalty: 1.05,
      min_p: 0.05,
    },
  },
});

Refer to the vLLM documentation for a complete list of supported sampling parameters.

Example: Ollama Server

When using Ollama through the OpenAI-compatible endpoint, you can pass Ollama-specific parameters:

const options = OpenAiChatOptions.builder()
  .model('llama3.2')
  .extraBody({
    num_predict: 100,
    top_k: 40,
    repeat_penalty: 1.1,
  })
  .build();

const response = await chatModel.call(new Prompt('Generate text', options));

Consult the Ollama API documentation for available parameters.

The extraBody parameter accepts any Record<string, unknown>, allowing you to pass whatever parameters your target server supports. NestJS AI does not validate these parameters — they are passed directly to the server. This design provides maximum flexibility for working with diverse OpenAI-compatible implementations.
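
Because these parameters are not validated, an application that can target both the official API and a compatible server may want a guard of its own. The sketch below shows one hypothetical way to do that. assertExtraBodyAllowed is not part of NestJS AI, and treating a missing baseUrl as the official endpoint is an assumption.

```typescript
// Hypothetical guard: rejects a populated extraBody when the target is the official OpenAI API.
function assertExtraBodyAllowed(
  baseUrl: string | undefined,
  extraBody: Record<string, unknown> | undefined,
): void {
  const hasExtras = extraBody !== undefined && Object.keys(extraBody).length > 0;
  if (!hasExtras) {
    return; // Nothing extra is sent, so any target is fine.
  }

  // Assumption: without a baseUrl the client talks to the official endpoint.
  const host = baseUrl ? new URL(baseUrl).hostname : 'api.openai.com';
  if (host === 'api.openai.com') {
    throw new Error('extraBody must not be used with the official OpenAI API');
  }
}
```

Such a check could run inside a forFeatureAsync() factory before the options object is returned.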

Reasoning Content from Reasoning Models

Some OpenAI-compatible servers that support reasoning models (such as DeepSeek R1 or vLLM with a reasoning parser) expose the model’s internal chain of thought via a reasoning_content field in their API responses. This field contains the step-by-step reasoning the model used to arrive at its final answer.

Mapping the reasoning_content field to assistant message metadata is not yet implemented in @nestjs-ai/model-openai. To access reasoning content from compatible servers today, use the official openai Node.js SDK directly and read the reasoning_content field from the raw chat completion response.
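
Reading the field from a raw response could look like the sketch below. The SDK's response types may not declare reasoning_content, so the helper works on a loose structural type. RawCompletionLike and extractReasoning are hypothetical names introduced for this example.

```typescript
// Loose view of a chat completion; reasoning_content is a server-specific extension.
interface RawCompletionLike {
  choices: Array<{
    message?: { content?: string | null; reasoning_content?: unknown };
  }>;
}

// Hypothetical helper: separates the reasoning text (if any) from the final answer.
function extractReasoning(
  completion: RawCompletionLike,
): { reasoning: string | null; answer: string | null } {
  const message = completion.choices[0]?.message;
  // Only accept a string; servers that omit the field yield null here.
  const reasoning =
    typeof message?.reasoning_content === 'string' ? message.reasoning_content : null;
  return { reasoning, answer: message?.content ?? null };
}
```

The completion returned by client.chat.completions.create(...) would be passed to extractReasoning; for servers that do not send the field, reasoning is null.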

Important distinction about reasoning_content availability:

OpenAI-compatible servers (DeepSeek, vLLM): Expose reasoning_content in Chat Completions API responses ✅
Official OpenAI models (GPT-5, o1, o3): Do NOT expose reasoning text in Chat Completions API responses ❌

Official OpenAI reasoning models hide the chain-of-thought content when using the Chat Completions API. They expose only the reasoning_tokens count in usage statistics. To access actual reasoning text from official OpenAI models, you must use OpenAI’s Responses API (a separate endpoint not currently supported by this client).

Example: DeepSeek R1

DeepSeek R1 is a reasoning model that exposes its internal reasoning process. Configure the chat module to point at the DeepSeek API:

OpenAiChatModelModule.forFeature({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseUrl: 'https://api.deepseek.com',
  options: {
    model: 'deepseek-reasoner',
  },
});

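When you make requests to DeepSeek R1, the raw API responses include both the reasoning content (the model’s thought process) and the final answer. As noted above, @nestjs-ai/model-openai does not yet map reasoning_content onto the assistant message, so only the final answer is available through the chat model.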

Refer to the DeepSeek API documentation for more details on reasoning models.

Example: vLLM with Reasoning Parser

vLLM supports reasoning models when configured with a reasoning parser. Start the server with the parser enabled, then point the chat module at it:

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --enable-reasoning \
    --reasoning-parser deepseek_r1

OpenAiChatModelModule.forFeature({
  baseUrl: 'http://localhost:8000/v1',
  options: {
    model: 'deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B',
  },
});

Consult the vLLM reasoning outputs documentation for supported reasoning models and parsers.

The availability of reasoning_content depends entirely on the inference server you’re using. Not all OpenAI-compatible servers expose reasoning content, even when using reasoning-capable models. Always refer to your server’s API documentation to understand what fields are available in responses.
