Convert a Natural Language Query into an SQL Query using OpenAI API

2023.05.08

Introduction

This blog aims to convert a natural language query into an SQL query using OpenAI API. This enables even a non-technical person to query a database, Isn't that interesting?

OpenAI API is a framework that allows developers to use and try the capabilities of GPT-3 and other OpenAI language models. With the OpenAI API, developers can integrate natural language processing and generation into their own use cases of software products, websites, chatbots, and more.

GPT-3 is a cutting-edge language model that underlies OpenAI API. It has 175 billion parameters, using which GPT-3 generates human-like text by predicting the most likely words or phrases to follow a given prompt. Integrated Natural language processing( NLP ) uses it for a wide variety of tasks, including text completion, translation, summarisation, answering questions, and more.

Developers can access a variety of AI models and tools via the OpenAI API, which is accessible through a free trial and as a premium subscription service.

Now let's jump into action to find a way to convert a natural language to a SQL QUERY..!!

1. Create an Account and API Key in OpenAI

Jump here, verify yourself and create an account.

Now on the top-right corner click on Personal -> View API Keys. Next make sure you are on page API Keys under the Users in the left panel.

Now proceed to click on Create new secret key. Give a suitable name, and click on Create a secret key.

Please Copy and Save the key, as it won't display it again.

2. VS Code setup

To execute Python on a Jupyter Notebook you need certain extensions to be installed.

  1. Jupyter

  2. Python

3. Store API key onto the OS environment

By storing the API key on the OS as an environment variable, we keep it safe and secure. It's also easy to access the key on the application by masking the actual value.

Firstly create a file/note with an extension of .ipynb. It refers to Jupyter Notebook files.

Then open a code snippet and run the code in the python environment.

import os

os.environ['openai_api_key'] = '*****secret-key******'

Now the key is stored as an environment variable under the name openai_api_key.

Note: Once you have set the key, you can remove the above code snippet from the file. To fetch the key you can use os.getenv('openai_api_key')

4. Install the OpenAI package and set the key to OpenAI

Now let's proceed to OpenAI installation.

pip install openai

Execute the below code to fetch and set your key to OpenAI.

import os
import openai

openai.api_key =  os.getenv('openai_api_key')

5. Upload a data file, using Pandas.

To get the data onto the framework, we are using pandas. Pandas is an open-source library in Python used for data manipulation and analysis. Below command helps us read the CSV file into data frame as df.

import pandas as pd

df = pd.read_csv("../resources/path/file_name.csv")

As sharing the dataset is not relevant, I shall display the dataset, so you have a picture of what type of data we are dealing with.

6. Setup temp database in RAM

For this we need the sqlalchemy library, please install it.

pip install sqlalchemy
import sqlalchemy
from sqlalchemy import create_engine

temp_db = create_engine('sqlite:///:memory:')
df.to_sql(name="Sales", con=temp_db, if_exists='replace')

Now we create a connection to an SQLite database using the create_engine() function from the SQLAlchemy library.

In Pandas, the DataFrame.to_sql() method is used to write the contents of a DataFrame to a SQL database. Among the parameters name is the given table name, it can be anything that's meaningful. The con parameter specifies the database connection. The if_exists parameter specifies how to handle the case where the table already exists. In this case, we set it to 'replace' to overwrite the table if it already exists.

To close the database connection, we can use the temp_db.dispose() method of the temp_db object.

Now let's just test the SQL table by querying. Let's query the sum of sales for each quarter.

from sqlalchemy import text

with temp_db.connect() as conn:
   result = conn.execute(text("select QTR_ID, SUM(SALES) from Sales group by QTR_ID "))
result.all()

Now our SQL database is set up successfully.

7. Create a prompt as required by OpenAI

I choose to write a different blog for this step as it's an exclusive topic. Check the link for further steps.

8. Call OpenAI API

I choose to write a different blog for this step as it's an exclusive topic. Check the link for further steps.

Thank you, Happy Learning..