4 OpenAI models

4.1 Models

The table below lists the different generations of OpenAI GPT models.

Tab 4.1: Comparison of various OpenAI models.

Standard context windows: gpt-3.5 (4K), gpt-4 (8K), gpt-4o (8K). For more info, see GPT models documentation.

Modality refers to the types of inputs that the model can process such as text and images. This highlights the versatility of the model in handling different kinds of data. Cost_input and Cost_output is the cost of input and output in USD per million tokens. Context_max indicates the maximum length of the text (in tokens) that the model can consider. Larger context windows allow the model to take into account more information when generating responses. Intelligence is a relative score of the model’s capabilities. Speed is a relative score of the rate at which output tokens are generated. Mini models are generally faster but lower intelligence. Date_training shows the cutoff date for the data used to train the model. It indicates how up-to-date the model is in terms of general knowledge.

Fig 4.1: Scatterplot of OpenAI models showing intelligence on x-axis and speed on the y-axis. Size denotes the output cost and color denotes context length.

The table below lists the different variants usually available for a model.

Tab 4.2: List of OpenAI model variants

Model	Description
gpt-3.5	Standard model
gpt-3.5-turbo	Optimized for speed & cost, for rapid interactions such as chat
gpt-3.5-16k	Extended context window of 16K tokens
gpt-3.5-instruct	Fine tuned to follow instructions more accurately
gpt-3.5-preview	Experimental early release

The instruct or turbo versions are recommended for chat use.

4.2 Parameters

Let’s explore some of parameters that affect the responses from an OpenAI model.

4.2.1 Temperature

Controls randomness/creativity, Range 1-2
Close to 0: Deterministic and predictable, For factual responses
Default 1: Balanced
Close to 2: More random, diverse, for brain storming, story telling etc.

4.2.2 Top P (Nucleus sampling)

Limits output by selecting top tokens up to cumulative probability P
Range 0-1
Default 1: Broader range of tokens considered, more variance, creative writing
Lower values: Fewer tokens considered, more focused, consistent text

4.2.3 Frequency penalty

Penalizes frequent tokens to reduce repetition in generated text
Range -2 to 2, Default 0
>0: Avoid repeating words or phrases. Output more varied but potentially less cohesive, more creative writing
<0: Repeat words/phrases more often, more coherent but risk of redundancy. Technical writing

4.2.4 Presence penalty

Encourages new concepts by penalizing previously used tokens
Range -2 to 2, Default 0
>0: Likely to introduce new topics making the output diverse and wide-ranging
<0: Less compelled to introduce new topics, maintain focus on a specific subject. Good for detailed explanations, essays, instructions