Create a Prompt for OpenAI’s Text Completion API

2023.05.08

Introduction

This blog aims at building a natural language prompt for OpenAI's Text Completion API to generate a SQL query. It's important to design the prompt with clear guidance to the model on what type of SQL query to be generated. Better the prompt, the better the results.

For the OpenAI's Text Completion API to generate a SQL query, we need to provide the below factors clearly:

  • Table name, column names, and natural language query to perform some action.

In fact, there is no formula for the query prompt, but there are just criteria to include all important data. Now let's slit our task for easy reference:

  1. Create a table definition that includes table and column names.
  2. Create an input field to take user input(Natural language query).
  3. Combine the 1,2 to make a single prompt to send to API as its parameter.

Create Table definition

Here we are creating a function named create_table_definition with data frame (df) as its parameter. We start with triple quotes (""") which are used to define multi-line string literals, also known as docstrings. Docstrings are used to document Python code, including functions, classes, and modules. Inside docstrings, we specify some comments to the API that we are sending a sqlite SQL table and the name of the table which is SALES, and we are including the names of the columns as the table schema.

Note: This blog being a continuation blog, parameters, and tables mentioned here are described in detail in the previous blogs, please refer. For reference, I will just add relevant heads-up here:

  • df - dataframe of CSV file. It's some SALES-related data with few columns.
  • SALES - df is modified into a SQL table named SALES.
def create_table_definition(df):
   prompt = """### sqlite SQL table, with its properties:
   #
   # SALES({})
   #
   """.format(",".join(str(col) for col in df.columns))
   return prompt

Here's the output for the above code:

Create a function to take user input

Here we have a prompt_input function to take user input.

def prompt_input():
   nlp_text= input("Enter the query: ")
   return nlp_text

Here's the output for the above code:

Combine the above two functions.

Here combine_prompts function takes two parameters: df and query_prompts. - definition has a table name and its attributes, - query_string_with_select has "### A Query to answer:", query typed by the user and a \nSELECT.

Note: \nSELECT this SELECT in a new line is optional, but you see OpenAI's Text Completion API is basically a text/phrase completion. It returns text/phrase with high probability distribution across the vocabulary vector. Let's just assume we are suggesting the API to respond to something starting off with SELECT. As the model gets more powerful, fewer are the efforts required by us as users. but for now, let's write our prompt the best way possible.

At the end, we are returning a concatenated string of definition and query_string_with_select.

def combine_prompts(df, query_prompts):
   definition = create_table_definition(df)
   query_string_with_select = f"### A Query to answer: {query_prompts}\nSELECT"  # select... is simple a text completion
   return definition + query_string_with_select

Lets test

To test, send all required parameters. Execute the below code:

nlp_text = prompt_input() #take input
combine_prompts(df, nlp_text)

I entered a user input as return the sum of SALES per POSTALCODE, For which the output is:

Thank you, happy learning.