The OpenAI API¶
In this section we will cover the basics of using the OpenAI API, including:
- Chat Completions
- Streaming
- Vision input
The beauty of the OpenAI API is that it is very simple to use.
In your environment you should have a file called .env
with the following:
OPENAI_API_KEY="sk-proj-1234567890"
We will give you this key in the workshop. The key will be deactivated after the workshop!
You can then load the key using Python:
from openai import OpenAI
import dotenv
import os
from rich import print as rprint # for making fancy outputs

dotenv.load_dotenv()  # load the variables defined in .env into the environment

client = OpenAI()  # the client picks up OPENAI_API_KEY from the environment automatically

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # keep a copy of the key for calling the API directly later
Chat Completions¶
Calling a model is simple:
system_prompt = "You are Matsuo Basho, the great Japanese haiku poet."
user_query = "Can you give me a haiku about a Samurai cat."
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
],
max_tokens=128
)
print(response.choices[0].message.content)
Silent paws tread soft, Moonlight gleams on sharpened claws— Honor in each pounce.
Purrfect.
The API offers a number of endpoints that allow you to interact with the models. The main one that we will cover here is the /chat/completions
endpoint. This endpoint allows you to interact with the model in a conversational manner.
Only 2 arguments are actually required for this endpoint:
model: str
The model to use. For OpenAI, this includes:
- 'gpt-3.5-turbo'
- 'gpt-4'
- 'gpt-4o'
- 'gpt-4o-mini'
- Any fine-tuned versions of these models.
- Many specific versions of the above models.
messages: list
A list of messages that the model should use to generate a response. Each entry in the list of messages comes in the form:
{"role": "<role>", "content": "<content>", "name": "<name>"}
Where <role>
can take one of the following forms:
'system'
This is a system level prompt, designed to guide the conversation. For example:
"You are a customer service bot."
'user'
This is direct input from the user. For example:
"How do I reset my password?"
'assistant'
This is the response from the model. For example:
"To reset your password, please visit our website and click on the 'Forgot Password' link."
So all of this fed into one message list would look like this:
messages = [
{"role": "system", "content": "You are a customer service bot."},
{"role": "user", "content": "How do I reset my password?"},
{"role": "assistant", "content": "To reset your password, please visit our website and click on the 'Forgot Password' link."}
]
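The endpoint itself is stateless: the model only sees the messages you send it. So to hold a conversation you append each new message to the list and send the whole history back each time. A minimal sketch, reusing the client from above (the follow-up question is just for illustration):

# continue the conversation with a follow-up question
messages.append({"role": "user", "content": "I never received the reset email."})

followup = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    max_tokens=128,
)

# store the assistant's reply so the history is complete for the next turn
messages.append({"role": "assistant", "content": followup.choices[0].message.content})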
Additional arguments¶
The /chat/completions
endpoint also accepts a number of additional arguments that can be used to alter the response. These include (arguments are listed with their default values if applicable):
max_tokens: int
The maximum number of tokens to generate in the response. Important to stop the model from generating too much text and racking up a huge bill.

n: int = 1
The number of completions to generate. This is useful when you want to generate multiple completions and select the best one. You'll be charged for the total number of tokens generated across all completions, so be careful with setting this too high.

temperature: float = 1.0
The temperature of the model, ranging from 0.0 to 2.0. Use low values for deterministic responses, and high values for more creative responses.

top_p: float = 1.0
Nucleus sampling: the model only samples from the tokens that make up the top p probability mass. This is useful for controlling the diversity of the responses. Setting this to a higher value means the model is more likely to sample from a wider range of tokens.

logprobs: bool = False
Whether to return the log probabilities of the tokens generated. This is useful when you want to understand how the model is making decisions.

logit_bias: dict
A dictionary of logit biases to apply to the tokens. This is useful when you want to guide the model towards generating certain types of responses.

response_format: str
The format of the response. We will cover this later...

stream: bool = False
Whether to stream the response back to the client. This is useful when you want to get the response in real-time. Nobody likes to sit and wait for a response. Seeing the text generated as and when it is ready is a much better user experience.
For a full list of arguments, check out the OpenAI API documentation.
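As a rough illustration of how these arguments fit together (the values below are purely for demonstration), you could ask for three short, fairly deterministic completions in a single call and compare them:

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ],
    max_tokens=128,
    n=3,              # three completions in one call (you pay for the tokens in all of them)
    temperature=0.2,  # low temperature -> more deterministic output
)

for i, choice in enumerate(response.choices):
    print(f"--- Completion {i} ---")
    print(choice.message.content)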
Available models¶
Here we have used the model gpt-4o-mini, but there is a range of models available.
for model in client.models.list():
    print(model)
Model(id='gpt-3.5-turbo', created=1677610602, object='model', owned_by='openai')
Model(id='gpt-3.5-turbo-0125', created=1706048358, object='model', owned_by='system')
Model(id='dall-e-2', created=1698798177, object='model', owned_by='system')
Model(id='gpt-4-1106-preview', created=1698957206, object='model', owned_by='system')
Model(id='tts-1-hd-1106', created=1699053533, object='model', owned_by='system')
Model(id='tts-1-hd', created=1699046015, object='model', owned_by='system')
Model(id='dall-e-3', created=1698785189, object='model', owned_by='system')
Model(id='whisper-1', created=1677532384, object='model', owned_by='openai-internal')
Model(id='text-embedding-3-large', created=1705953180, object='model', owned_by='system')
Model(id='text-embedding-3-small', created=1705948997, object='model', owned_by='system')
Model(id='text-embedding-ada-002', created=1671217299, object='model', owned_by='openai-internal')
Model(id='gpt-4-turbo', created=1712361441, object='model', owned_by='system')
Model(id='gpt-4o-2024-05-13', created=1715368132, object='model', owned_by='system')
Model(id='gpt-4-0125-preview', created=1706037612, object='model', owned_by='system')
Model(id='gpt-4-turbo-2024-04-09', created=1712601677, object='model', owned_by='system')
Model(id='gpt-4-turbo-preview', created=1706037777, object='model', owned_by='system')
Model(id='gpt-3.5-turbo-16k', created=1683758102, object='model', owned_by='openai-internal')
Model(id='gpt-4o', created=1715367049, object='model', owned_by='system')
Model(id='tts-1', created=1681940951, object='model', owned_by='openai-internal')
Model(id='gpt-3.5-turbo-1106', created=1698959748, object='model', owned_by='system')
Model(id='tts-1-1106', created=1699053241, object='model', owned_by='system')
Model(id='gpt-3.5-turbo-instruct-0914', created=1694122472, object='model', owned_by='system')
Model(id='gpt-4', created=1687882411, object='model', owned_by='openai')
Model(id='gpt-4-0613', created=1686588896, object='model', owned_by='openai')
Model(id='gpt-3.5-turbo-instruct', created=1692901427, object='model', owned_by='system')
Model(id='chatgpt-4o-latest', created=1723515131, object='model', owned_by='system')
Model(id='babbage-002', created=1692634615, object='model', owned_by='system')
Model(id='davinci-002', created=1692634301, object='model', owned_by='system')
Model(id='gpt-4o-mini-2024-07-18', created=1721172717, object='model', owned_by='system')
Model(id='gpt-4o-mini', created=1721172741, object='model', owned_by='system')
Model(id='gpt-4o-2024-08-06', created=1722814719, object='model', owned_by='system')
As of writing, gpt-4o-2024-08-06 is the current best offering, but we'll stick with gpt-4o-mini because it is cheaper and still highly capable.
The response object¶
What is the response
object?
rprint(response)
ChatCompletion(
    id='chatcmpl-A5yRAL35y156KgXeY6uUBbLNHD4qh',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='Silent pawsteps glide, \nMoonlight dances on the blade— \nFeline honor blinds. ',
                refusal=None,
                role='assistant',
                function_call=None,
                tool_calls=None
            )
        )
    ],
    created=1725987324,
    model='gpt-4o-mini-2024-07-18',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='fp_483d39d857',
    usage=CompletionUsage(completion_tokens=20, prompt_tokens=37, total_tokens=57)
)
There is some useful stuff in here, apart from the content property, such as the token usage. You might notice some other things too, like function_call and tool_calls. These relate to function calling, which not every model supports; we'll come back to tool_calls in the Tools section below.
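For example, the token counts can be read straight off the usage field of the response above, which is handy for keeping an eye on costs:

usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")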
Streaming a response¶
Streaming a response is mainly for user experience. It allows the user to see the response as it comes in, rather than waiting for the whole response to come in. For many applications, this might not be necessary.
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
],
max_tokens=128,
stream=True
)
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Silent paws in dusk, Moonlit blade gleams in the night— Fierce heart, whiskers twitch.
All this really does is create a streaming object, which acts like a generator. We can then print the chunk as it comes in.
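Because the stream is consumed as you iterate over it, you can also accumulate the chunks into a full string as they arrive if you need the complete text afterwards. A minimal sketch, creating a fresh stream:

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ],
    max_tokens=128,
    stream=True,
)

full_text = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content  # None for chunks that carry no text
    if delta is not None:
        full_text += delta
        print(delta, end="")

# full_text now holds the complete response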
Vision input¶
A huge draw of OpenAI models is the ability to input vision data. This is useful for a wide range of applications, including:
- Image captioning
- Object detection
- Face recognition
- Image generation
Let's try an example of inputting an image. First we need to look at the image:
Here is the caption from this figure:
Fig. 2 Spatial and temporal self-similarity and correlation in switching activity. (A) Percolating devices produce complex patterns of switching events that are self-similar in nature. The top panel contains 2400 s of data, with the bottom panels showing segments of the data with 10, 100, and 1000 times greater temporal magnification and with 3, 9, and 27 times greater magnification on the vertical scale (units of G₀ = 2e²/h, the quantum of conductance, are used for convenience). The activity patterns appear qualitatively similar on multiple different time scales. (B and E) The probability density function (PDF) for changes in total network conductance, P(ΔG), resulting from switching activity exhibits heavy-tailed probability distributions. (C and F) IEIs follow power law distributions, suggestive of correlations between events. (D and G) Further evidence of temporal correlation between events is given by the autocorrelation function (ACF) of the switching activity (red), which decays as a power law over several decades. When the IEI sequence is shuffled (gray), the correlations between events are destroyed, resulting in a significant increase in slope in the ACF. The data shown in (B) to (D) (sample I) were obtained with our standard (slow) sampling rate, and the data shown in (E) to (G) (sample II) were measured 1000 times faster (see Materials and Methods), providing further evidence for self-similarity.
This figure is taken from Avalanches and criticality in self-organized nanoscale networks, Mallinson et al., 2019.
Now let's use the OpenAI vision model to generate a caption for this figure.
prompt = (
"This figure is a caption from a paper entitled Avalanches and criticality in self-organized nanoscale networks. "
"Please provide a caption for this figure. "
"You should describe the figure, grouping the panels where appropriate. "
"Feel free to make any inferences you need to."
)
The process of calling a vision model is a little more involved, but OpenAI have a convenient tutorial on how to do this.
Essentially we need to first convert the image to a base64 string. We can then pass this to the OpenAI API.
import base64
import requests
# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
# Path to your image
image_path = "imgs/figure.jpeg"
def get_image_caption(image_path, prompt):
    # Getting the base64 string
    base64_image = encode_image(image_path)

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {OPENAI_API_KEY}"
    }

    payload = {
        "model": "gpt-4o-mini",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    }
                ]
            }
        ],
        "max_tokens": 512
    }

    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    return response.json()['choices'][0]['message']['content']
caption = get_image_caption(image_path, prompt)
print(caption)
**Figure Caption:**

**Figure X:** Analysis of avalanche dynamics and criticality in self-organized nanoscale networks.

**(A)** Time series data showing the fluctuations in conductance, \(\Delta G(G_0)\), over varying observation periods (100 s, 10 s, 1 s, 0.1 s). The four panels illustrate distinct behavior and amplitude of fluctuations as time scales decrease.

**(B) and (E)** Power law distributions, \(P(\Delta G)\), of the amplitude of conductance fluctuations are presented on logarithmic scales, revealing power-law exponents \(\Delta G \approx -2.59\) (B) and \(\Delta G \approx -2.36\) (E).

**(C) and (F)** Temporal distributions, \(P(t)\), showing the frequency of events over time. The corresponding power law exponents are \(t \approx -1.39\) (C) and \(t \approx -1.30\) (F), indicating scale-invariant behavior.

**(D) and (G)** Characteristic avalanche sizes \(A(t)\) as a function of time, indicating distinct scaling regimes with exponents \(t \approx -0.19\) and \(t \approx -0.23\) for the upper and lower panels, respectively. The gray data in (D) suggests a crossover behavior in larger time scales.

These results highlight the critical dynamics and self-organized behavior of the nanoscale networks under study.
I mean, I don't know about you, but I think that's incredible. Let's consider what it has done:
- Correctly grouped the panels in the same way the real caption did.
- Provided information on the observation periods.
- Drawn out the important information, such as critical exponents.
- Made the link between power law distributions and scale-free behaviour.
However, it has failed to provide information on temporal correlations, and it has not noticed the self-similarity highlighted in the original caption.
But this is still quite impressive, and with more information we could potentially get some better captions.
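As an aside, you don't have to drop down to raw requests for this: the same base64 data URL can be passed through the client we created earlier. Here is a sketch of the equivalent call (the helper name get_image_caption_sdk is just for illustration), following the same message structure as the payload above:

def get_image_caption_sdk(image_path, prompt):
    base64_image = encode_image(image_path)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                    },
                ],
            }
        ],
        max_tokens=512,
    )
    return response.choices[0].message.content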
Tools¶
We can also give the model some tools to use - these are essentially just functions that we can call, and the role of the LLM is to generate the arguments to that function.
Here is a simple example to do some maths.
system_prompt = "You are a helpful mathematician. You will only solve the problems given to you. Do not provide any additional information. Provide only the answer."
user_query = "What is 1056 * 1316?"
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
],
max_tokens=256,
stream=True
)
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
1382976

Let's check the real answer with Python:

1056 * 1316

1389696
So the LLM is not correct :(.
To endow the model with "tool use", we add a function:
def multiply(a: float, b: float) -> float:
    return a * b
And then provide the model with a schema, which is just a description of the function in dictionary form (or JSON):
tool_schema = {
"type": "function",
"function": {
"name": "multiply",
"description": "Given two floats, a and b, return the product of a and b.",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "number",
"description": "The first number to multiply."
},
"b": {
"type": "number",
"description": "The second number to multiply."
}
},
"required": ["a", "b"],
"additionalProperties": False,
}
}
}
tools = [
tool_schema
]
When we make function calls with an LLM, we have to let it know that it has access to one or more tools. We do this by passing in the tools
argument.
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
],
max_tokens=256,
tools=tools,
)
print(response.choices[0].message.tool_calls[0])
ChatCompletionMessageToolCall(id='call_1vXSH7jPzjHCLYBlccyzewuB', function=Function(arguments='{"a":1056,"b":1316}', name='multiply'), type='function')
So now in our response
, we have this extra part called tool_calls
that we can extract information from - in this case the arguments to the multiply
function.
Note that you could achieve a similar result with appropriate prompting - e.g. "Extract only the arguments to a function that multiplies two numbers."
We unpack the actual arguments as a dictionary:
import json
tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)
print(arguments)
{'a': 1056, 'b': 1316}
And we now feed the arguments into our multiply
function:
result = multiply(**arguments)
print(result)
1389696
So now we have the answer. We can either just return this, or we can feed it back into the LLM. We need to provide the model with the tool_calls[0].id
, so that it can associate response messages of the tool type with the correct tool call.
tool_call_result = {
"role": "tool",
"content": json.dumps({
"a" : arguments["a"],
"b" : arguments["b"],
"result": result
}),
"tool_call_id": response.choices[0].message.tool_calls[0].id
}
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
response.choices[0].message,
tool_call_result
],
max_tokens=56
)
response.choices[0].message.content
'1389696'
This is quite a lot of work to multiply two numbers, but of course the power comes when doing more complex tasks.
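In a real application you would typically register several tools and route each tool call to the matching Python function. A rough sketch of that dispatch pattern (the names available_functions and run_tool_calls are just illustrative):

available_functions = {
    "multiply": multiply,
    # register additional tools here as you define them
}

def run_tool_calls(message):
    # Execute each tool call in an assistant message and build the "tool" messages to send back
    tool_messages = []
    for tool_call in message.tool_calls or []:
        fn = available_functions[tool_call.function.name]
        args = json.loads(tool_call.function.arguments)
        tool_messages.append({
            "role": "tool",
            "content": json.dumps({"result": fn(**args)}),
            "tool_call_id": tool_call.id,
        })
    return tool_messages

You would then append the assistant message and these tool messages to the conversation and call the model again, exactly as we did above.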
And this brings to light an interesting contrast. People talk a lot about "agents" and "tools" and "systems", and when we interact with ChatGPT, we get a single coherent experience. Sometimes it is difficult to distinguish between what the LLM is doing and what the software engineers have built around it in order to create this seamless experience.