The OpenAI API¶
In this section we will cover the basics of using the OpenAI API, including:
- Chat Completions
- Streaming
- Vision input
The beauty of the OpenAI API is that it is very simple to use.
In your environment you should have a file called .env
with the following:
OPENAI_API_KEY="sk-proj-1234567890"
We will give you this key in the workshop. The key will be deactivated after the workshop!
You can then load the key using Python:
from openai import OpenAI
import dotenv
import os
from rich import print as rprint # for making fancy outputs

dotenv.load_dotenv()  # load the variables defined in .env into the environment

client = OpenAI()  # the client picks up OPENAI_API_KEY from the environment automatically

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # keep a copy of the key for calling the API directly later
Chat Completions¶
Calling a model is simple:
system_prompt = "You are Matsuo Basho, the great Japanese haiku poet."
user_query = "Can you give me a haiku about a Samurai cat."
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
],
max_tokens=128
)
print(response.choices[0].message.content)
Silent paws tread soft, Moonlight gleams on sharpened claws— Honor in each pounce.
Purrfect.
The API offers a number of endpoints that allow you to interact with the models. The main one that we will cover here is the /chat/completions
endpoint. This endpoint allows you to interact with the model in a conversational manner.
Only 2 arguments are actually required for this endpoint:
model: str
The model to use. For OpenAI, this includes:
- 'gpt-3.5-turbo'
- 'gpt-4'
- 'gpt-4o'
- 'gpt-4o-mini'
- Any fine-tuned versions of these models.
- Many specific versions of the above models.
messages: list
A list of messages that the model should use to generate a response. Each entry in the list of messages comes in the form:
{"role": "<role>", "content": "<content>", "name": "<name>"}
Where <role>
can take one of the following forms:
'system'
This is a system level prompt, designed to guide the conversation. For example:
"You are a customer service bot."
'user'
This is direct input from the user. For example:
"How do I reset my password?"
'assistant'
This is the response from the model. For example:
"To reset your password, please visit our website and click on the 'Forgot Password' link."
So all of this fed into one message list would look like this:
messages = [
{"role": "system", "content": "You are a customer service bot."},
{"role": "user", "content": "How do I reset my password?"},
{"role": "assistant", "content": "To reset your password, please visit our website and click on the 'Forgot Password' link."}
]
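The endpoint itself is stateless: the model only sees the messages you send it. So to hold a conversation you append each new message to the list and send the whole history back each time. A minimal sketch, reusing the client from above (the follow-up question is just for illustration):

# continue the conversation with a follow-up question
messages.append({"role": "user", "content": "I never received the reset email."})

followup = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    max_tokens=128,
)

# store the assistant's reply so the history is complete for the next turn
messages.append({"role": "assistant", "content": followup.choices[0].message.content})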
Additional arguments¶
The /chat/completions
endpoint also accepts a number of additional arguments that can be used to alter the response. These include (arguments are listed with their default values if applicable):
max_tokens: int
The maximum number of tokens to generate in the response. Important to stop the model from generating too much text and racking up a huge bill.

n: int = 1
The number of completions to generate. This is useful when you want to generate multiple completions and select the best one. You'll be charged for the total number of tokens generated across all completions, so be careful with setting this too high.

temperature: float = 1.0
The temperature of the model, ranging from 0.0 to 2.0. Use low values for deterministic responses, and high values for more creative responses.

top_p: float = 1.0
Nucleus sampling: the model only samples from the tokens that make up the top p probability mass. This is useful for controlling the diversity of the responses. Setting this to a higher value means the model is more likely to sample from a wider range of tokens.

logprobs: bool = False
Whether to return the log probabilities of the tokens generated. This is useful when you want to understand how the model is making decisions.

logit_bias: dict
A dictionary of logit biases to apply to the tokens. This is useful when you want to guide the model towards generating certain types of responses.

response_format: str
The format of the response. We will cover this later...

stream: bool = False
Whether to stream the response back to the client. This is useful when you want to get the response in real-time. Nobody likes to sit and wait for a response. Seeing the text generated as and when it is ready is a much better user experience.
For a full list of arguments, check out the OpenAI API documentation.
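As a rough illustration of how these arguments fit together (the values below are purely for demonstration), you could ask for three short, fairly deterministic completions in a single call and compare them:

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ],
    max_tokens=128,
    n=3,              # three completions in one call (you pay for the tokens in all of them)
    temperature=0.2,  # low temperature -> more deterministic output
)

for i, choice in enumerate(response.choices):
    print(f"--- Completion {i} ---")
    print(choice.message.content)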
Available models¶
Here we have used the model gpt-4o-mini, but there is a range of models available.
for model in client.models.list():
    print(model)
Model(id='gpt-3.5-turbo', created=1677610602, object='model', owned_by='openai')
Model(id='gpt-3.5-turbo-0125', created=1706048358, object='model', owned_by='system')
Model(id='dall-e-2', created=1698798177, object='model', owned_by='system')
Model(id='gpt-4-1106-preview', created=1698957206, object='model', owned_by='system')
Model(id='tts-1-hd-1106', created=1699053533, object='model', owned_by='system')
Model(id='tts-1-hd', created=1699046015, object='model', owned_by='system')
Model(id='dall-e-3', created=1698785189, object='model', owned_by='system')
Model(id='whisper-1', created=1677532384, object='model', owned_by='openai-internal')
Model(id='text-embedding-3-large', created=1705953180, object='model', owned_by='system')
Model(id='text-embedding-3-small', created=1705948997, object='model', owned_by='system')
Model(id='text-embedding-ada-002', created=1671217299, object='model', owned_by='openai-internal')
Model(id='gpt-4-turbo', created=1712361441, object='model', owned_by='system')
Model(id='gpt-4o-2024-05-13', created=1715368132, object='model', owned_by='system')
Model(id='gpt-4-0125-preview', created=1706037612, object='model', owned_by='system')
Model(id='gpt-4-turbo-2024-04-09', created=1712601677, object='model', owned_by='system')
Model(id='gpt-4-turbo-preview', created=1706037777, object='model', owned_by='system')
Model(id='gpt-3.5-turbo-16k', created=1683758102, object='model', owned_by='openai-internal')
Model(id='gpt-4o', created=1715367049, object='model', owned_by='system')
Model(id='tts-1', created=1681940951, object='model', owned_by='openai-internal')
Model(id='gpt-3.5-turbo-1106', created=1698959748, object='model', owned_by='system')
Model(id='tts-1-1106', created=1699053241, object='model', owned_by='system')
Model(id='gpt-3.5-turbo-instruct-0914', created=1694122472, object='model', owned_by='system')
Model(id='gpt-4', created=1687882411, object='model', owned_by='openai')
Model(id='gpt-4-0613', created=1686588896, object='model', owned_by='openai')
Model(id='gpt-3.5-turbo-instruct', created=1692901427, object='model', owned_by='system')
Model(id='chatgpt-4o-latest', created=1723515131, object='model', owned_by='system')
Model(id='babbage-002', created=1692634615, object='model', owned_by='system')
Model(id='davinci-002', created=1692634301, object='model', owned_by='system')
Model(id='gpt-4o-mini-2024-07-18', created=1721172717, object='model', owned_by='system')
Model(id='gpt-4o-mini', created=1721172741, object='model', owned_by='system')
Model(id='gpt-4o-2024-08-06', created=1722814719, object='model', owned_by='system')
As of writing, gpt-4o-2024-08-06 is the current best offering, but we'll stick with gpt-4o-mini because it is cheaper and still highly capable.
The response object¶
What is the response
object?
rprint(response)
ChatCompletion(
    id='chatcmpl-A5yRAL35y156KgXeY6uUBbLNHD4qh',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='Silent pawsteps glide, \nMoonlight dances on the blade— \nFeline honor blinds. ',
                refusal=None,
                role='assistant',
                function_call=None,
                tool_calls=None
            )
        )
    ],
    created=1725987324,
    model='gpt-4o-mini-2024-07-18',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='fp_483d39d857',
    usage=CompletionUsage(completion_tokens=20, prompt_tokens=37, total_tokens=57)
)
There is some useful stuff in here, apart from the content property, such as the token usage. You might notice some other things too, like function_call and tool_calls. These relate to function calling, which not every model supports; we'll come back to tool_calls in the Tools section below.
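For example, the token counts can be read straight off the usage field of the response above, which is handy for keeping an eye on costs:

usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")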
Streaming a response¶
Streaming a response is mainly for user experience. It allows the user to see the response as it comes in, rather than waiting for the whole response to come in. For many applications, this might not be necessary.
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
],
max_tokens=128,
stream=True
)
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Silent paws in dusk, Moonlit blade gleams in the night— Fierce heart, whiskers twitch.
All this really does is create a streaming object, which acts like a generator. We can then print the chunk as it comes in.
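Because the stream is consumed as you iterate over it, you can also accumulate the chunks into a full string as they arrive if you need the complete text afterwards. A minimal sketch, creating a fresh stream:

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ],
    max_tokens=128,
    stream=True,
)

full_text = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content  # None for chunks that carry no text
    if delta is not None:
        full_text += delta
        print(delta, end="")

# full_text now holds the complete response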
Vision input¶
A huge draw of OpenAI models is the ability to input vision data. This is useful for a wide range of applications, including:
- Image captioning
- Object detection
- Face recognition
- Image generation
Let's try an example of inputting an image. First we need to look at the image:
Here is the caption from this figure:
Fig. 2 Spatial and temporal self-similarity and correlation in switching activity. (A) Percolating devices produce complex patterns of switching events that are self-similar in nature. The top panel contains 2400 s of data, with the bottom panels showing segments of the data with 10, 100, and 1000 times greater temporal magnification and with 3, 9, and 27 times greater magnification on the vertical scale (units of G₀ = 2e²/h, the quantum of conductance, are used for convenience). The activity patterns appear qualitatively similar on multiple different time scales. (B and E) The probability density function (PDF) for changes in total network conductance, P(ΔG), resulting from switching activity exhibits heavy-tailed probability distributions. (C and F) IEIs follow power law distributions, suggestive of correlations between events. (D and G) Further evidence of temporal correlation between events is given by the autocorrelation function (ACF) of the switching activity (red), which decays as a power law over several decades. When the IEI sequence is shuffled (gray), the correlations between events are destroyed, resulting in a significant increase in slope in the ACF. The data shown in (B) to (D) (sample I) were obtained with our standard (slow) sampling rate, and the data shown in (E) to (G) (sample II) were measured 1000 times faster (see Materials and Methods), providing further evidence for self-similarity.
This figure is taken from Avalanches and criticality in self-organized nanoscale networks, Mallinson et al., 2019.
Now let's use the OpenAI vision model to generate a caption for this figure.
prompt = (
"This figure is a caption from a paper entitled Avalanches and criticality in self-organized nanoscale networks. "
"Please provide a caption for this figure. "
"You should describe the figure, grouping the panels where appropriate. "
"Feel free to make any inferences you need to."
)
The process of calling a vision model is a little more involved, but OpenAI have a convenient tutorial on how to do this.
Essentially we need to first convert the image to a base64 string. We can then pass this to the OpenAI API.
import base64
import requests
# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
# Path to your image
image_path = "imgs/figure.jpeg"
def get_image_caption(image_path, prompt):
    # Getting the base64 string
    base64_image = encode_image(image_path)

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {OPENAI_API_KEY}"
    }

    payload = {
        "model": "gpt-4o-mini",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    }
                ]
            }
        ],
        "max_tokens": 512
    }

    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    return response.json()['choices'][0]['message']['content']
caption = get_image_caption(image_path, prompt)
print(caption)
**Figure Caption:**

**Figure X:** Analysis of avalanche dynamics and criticality in self-organized nanoscale networks.

**(A)** Time series data showing the fluctuations in conductance, \(\Delta G(G_0)\), over varying observation periods (100 s, 10 s, 1 s, 0.1 s). The four panels illustrate distinct behavior and amplitude of fluctuations as time scales decrease.

**(B) and (E)** Power law distributions, \(P(\Delta G)\), of the amplitude of conductance fluctuations are presented on logarithmic scales, revealing power-law exponents \(\Delta G \approx -2.59\) (B) and \(\Delta G \approx -2.36\) (E).

**(C) and (F)** Temporal distributions, \(P(t)\), showing the frequency of events over time. The corresponding power law exponents are \(t \approx -1.39\) (C) and \(t \approx -1.30\) (F), indicating scale-invariant behavior.

**(D) and (G)** Characteristic avalanche sizes \(A(t)\) as a function of time, indicating distinct scaling regimes with exponents \(t \approx -0.19\) and \(t \approx -0.23\) for the upper and lower panels, respectively. The gray data in (D) suggests a crossover behavior in larger time scales.

These results highlight the critical dynamics and self-organized behavior of the nanoscale networks under study.
I mean, I don't know about you, but I think that's incredible. Let's consider what it has done:
- Correctly grouped the panels in the same way the real caption did.
- Provided information on the observation periods.
- Drawn out the important information, such as critical exponents.
- Made the link between power law distributions and scale-free behaviour.
However, it has failed to provide information on temporal correlations, and it has not noticed the self-similarity highlighted in the original caption.
But this is still quite impressive, and with more information we could potentially get some better captions.
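As an aside, you don't have to drop down to raw requests for this: the same base64 data URL can be passed through the client we created earlier. Here is a sketch of the equivalent call (the helper name get_image_caption_sdk is just for illustration), following the same message structure as the payload above:

def get_image_caption_sdk(image_path, prompt):
    base64_image = encode_image(image_path)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                    },
                ],
            }
        ],
        max_tokens=512,
    )
    return response.choices[0].message.content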
Tools¶
We can also give the model some tools to use - these are essentially just functions that we can call, and the role of the LLM is to generate the arguments to that function.
Here is a simple example to do some maths.
system_prompt = "You are a helpful mathematician. You will only solve the problems given to you. Do not provide any additional information. Provide only the answer."
user_query = "What is 1056 * 1316?"
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
],
max_tokens=256,
stream=True
)
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
1382976

Let's check the real answer with Python:

1056 * 1316

1389696
So the LLM is not correct :(.
To endow the model with "tool use", we add a function:
def multiply(a: float, b: float) -> float:
    return a * b
And then provide the model with a schema, which is just a description of the function in dictionary form (or JSON):
tool_schema = {
"type": "function",
"function": {
"name": "multiply",
"description": "Given two floats, a and b, return the product of a and b.",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "number",
"description": "The first number to multiply."
},
"b": {
"type": "number",
"description": "The second number to multiply."
}
},
"required": ["a", "b"],
"additionalProperties": False,
}
}
}
tools = [
tool_schema
]
When we make function calls with an LLM, we have to let it know that it has access to one or more tools. We do this by passing in the tools
argument.
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
],
max_tokens=256,
tools=tools,
)
print(response.choices[0].message.tool_calls[0])
ChatCompletionMessageToolCall(id='call_1vXSH7jPzjHCLYBlccyzewuB', function=Function(arguments='{"a":1056,"b":1316}', name='multiply'), type='function')
So now in our response
, we have this extra part called tool_calls
that we can extract information from - in this case the arguments to the multiply
function.
Note that you could achieve a similar result with appropriate prompting - e.g. "Extract only the arguments to a function that multiplies two numbers."
We unpack the actual arguments as a dictionary:
import json
tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)
print(arguments)
{'a': 1056, 'b': 1316}
And we now feed the arguments into our multiply
function:
result = multiply(**arguments)
print(result)
1389696
So now we have the answer. We can either just return this, or we can feed it back into the LLM. We need to provide the model with the tool_calls[0].id
, so that it can associate response messages of the tool type with the correct tool call.
tool_call_result = {
"role": "tool",
"content": json.dumps({
"a" : arguments["a"],
"b" : arguments["b"],
"result": result
}),
"tool_call_id": response.choices[0].message.tool_calls[0].id
}
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
response.choices[0].message,
tool_call_result
],
max_tokens=56
)
response.choices[0].message.content
'1389696'
This is quite a lot of work to multiply two numbers, but of course the power comes when doing more complex tasks.
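In a real application you would typically register several tools and route each tool call to the matching Python function. A rough sketch of that dispatch pattern (the names available_functions and run_tool_calls are just illustrative):

available_functions = {
    "multiply": multiply,
    # register additional tools here as you define them
}

def run_tool_calls(message):
    # Execute each tool call in an assistant message and build the "tool" messages to send back
    tool_messages = []
    for tool_call in message.tool_calls or []:
        fn = available_functions[tool_call.function.name]
        args = json.loads(tool_call.function.arguments)
        tool_messages.append({
            "role": "tool",
            "content": json.dumps({"result": fn(**args)}),
            "tool_call_id": tool_call.id,
        })
    return tool_messages

You would then append the assistant message and these tool messages to the conversation and call the model again, exactly as we did above.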
And this brings to light an interesting contrast. People talk a lot about "agents" and "tools" and "systems", and when we interact with ChatGPT, we get a single coherent experience. Sometimes it is difficult to distinguish between what the LLM is doing and what the software engineers have built around it in order to create this seamless experience.