Create a Prompt for OpenAI’s Text Completion API
Introduction
This blog aims at building a natural language prompt for OpenAI's Text Completion API to generate a SQL query. It's important to design the prompt with clear guidance to the model on what type of SQL query to be generated. Better the prompt, the better the results.
For the OpenAI's Text Completion API to generate a SQL query, we need to provide the below factors clearly:
- Table name, column names, and natural language query to perform some action.
In fact, there is no formula for the query prompt, but there are just criteria to include all important data. Now let's slit our task for easy reference:
- Create a table definition that includes table and column names.
- Create an input field to take user input(Natural language query).
- Combine the 1,2 to make a single prompt to send to API as its parameter.
Create Table definition
Here we are creating a function named create_table_definition
with data frame (df) as its parameter. We start with triple quotes (""") which are used to define multi-line string literals, also known as docstrings. Docstrings are used to document Python code, including functions, classes, and modules.
Inside docstrings, we specify some comments to the API that we are sending a sqlite SQL table and the name of the table which is SALES
, and we are including the names of the columns as the table schema.
Note: This blog being a continuation blog, parameters, and tables mentioned here are described in detail in the previous blogs, please refer. For reference, I will just add relevant heads-up here:
df
- dataframe of CSV file. It's some SALES-related data with few columns.SALES
- df is modified into a SQL table named SALES.
def create_table_definition(df): prompt = """### sqlite SQL table, with its properties: # # SALES({}) # """.format(",".join(str(col) for col in df.columns)) return prompt
Here's the output for the above code:
Create a function to take user input
Here we have a prompt_input
function to take user input.
def prompt_input(): nlp_text= input("Enter the query: ") return nlp_text
Here's the output for the above code:
Combine the above two functions.
Here combine_prompts
function takes two parameters: df and query_prompts.
- definition
has a table name and its attributes,
- query_string_with_select
has "### A Query to answer:", query typed by the user and a \nSELECT
.
Note: \nSELECT
this SELECT in a new line is optional, but you see OpenAI's Text Completion API is basically a text/phrase completion. It returns text/phrase with high probability distribution across the vocabulary vector. Let's just assume we are suggesting the API to respond to something starting off with SELECT. As the model gets more powerful, fewer are the efforts required by us as users. but for now, let's write our prompt the best way possible.
At the end, we are returning a concatenated string of definition
and query_string_with_select
.
def combine_prompts(df, query_prompts): definition = create_table_definition(df) query_string_with_select = f"### A Query to answer: {query_prompts}\nSELECT" # select... is simple a text completion return definition + query_string_with_select
Lets test
To test, send all required parameters. Execute the below code:
nlp_text = prompt_input() #take input combine_prompts(df, nlp_text)
I entered a user input as return the sum of SALES per POSTALCODE
,
For which the output is:
Thank you, happy learning.