I Predicted Customer Churn Rate Using Amazon SageMaker Canvas

2022.12.02

Overview

Understanding client behavior is a must for all businesses today. Gaining insights into why and how customers buy can aid with revenue growth. However, losing clients (also known as customer churn) is always a risk, and understanding why customers leave may be just as crucial for sustaining revenues and profitability. Machine learning (ML) may help with insights, but until now, you needed ML professionals to construct models to anticipate churn, which might delay firms' insight-driven actions to retain consumers.

In this post, I will show how an ML engineer can build a customer churn ML model with Amazon SageMaker Canvas, no code required. Canvas provides a visual point-and-click interface that allows you to build models and generate accurate ML predictions on your own—without requiring any ML experience or having to write a single line of code.

To do so, we take data from a CSV file containing information on customer usage and churn. We use Canvas to complete the following tasks:

  1. Import the churn dataset from Amazon Simple Storage Service (Amazon S3).
  2. Train and build the churn model.
  3. Analyze the model results.
  4. Test predictions against the model.

We use a sample dataset from a telecommunications mobile phone carrier. It contains 5,000 records, where each record uses 21 attributes to describe the customer profile. The attributes are as follows:

    • State – The US state in which the customer resides, indicated by a two-letter abbreviation; for example, OH or NJ
    • Account Length – The number of days that this account has been active
    • Area Code – The three-digit area code of the customer’s phone number
    • Phone – The remaining seven-digit phone number
    • Int’l Plan – Whether the customer has an international calling plan (yes/no)
    • VMail Plan – Whether the customer has a voice mail feature (yes/no)
    • VMail Message – The average number of voicemail messages per month
    • Day Mins – The total number of calling minutes used during the day
    • Day Calls – The total number of calls placed during the day
    • Day Charge – The billed cost of daytime calls
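    • Eve Mins, Eve Calls, Eve Charge – The minutes used, calls placed, and billed cost for evening calls
    • Night Mins, Night Calls, Night Charge – The minutes used, calls placed, and billed cost for nighttime calls
    • Intl Mins, Intl Calls, Intl Charge – The minutes used, calls placed, and billed cost for international calls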
    • CustServ Calls – The number of calls placed to customer service
    • Churn? – Whether the customer left the service (true/false)

The last attribute, Churn?, is the one we want the ML model to predict. Because the target attribute is binary, the model assigns each record to one of two groups (True or False).
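Before importing the file, it can help to confirm that the header and the target values look as expected. The following is a minimal sketch of such a check, not part of Canvas. It assumes the header uses the column names listed above (your copy of churn.csv may spell them differently) and that the target is written as True/False, possibly with a trailing period; `check_churn_csv` is a hypothetical helper name.

```python
import csv
import io

# Column names as listed in this post. The exact header spelling in your
# copy of churn.csv may differ (assumption).
EXPECTED_COLUMNS = [
    "State", "Account Length", "Area Code", "Phone", "Int'l Plan",
    "VMail Plan", "VMail Message", "Day Mins", "Day Calls", "Day Charge",
    "Eve Mins", "Eve Calls", "Eve Charge", "Night Mins", "Night Calls",
    "Night Charge", "Intl Mins", "Intl Calls", "Intl Charge",
    "CustServ Calls", "Churn?",
]
TARGET = "Churn?"


def check_churn_csv(text):
    """Return (row_count, problems) for CSV text that starts with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in EXPECTED_COLUMNS
                if name not in header]
    row_count = 0
    for row_count, row in enumerate(reader, start=1):
        if TARGET not in row:
            continue  # already reported as a missing column
        # Assumption: labels may carry a trailing period, e.g. "True."
        value = (row[TARGET] or "").strip().rstrip(".").lower()
        if value not in {"true", "false"}:
            problems.append(
                f"row {row_count}: unexpected target value {row[TARGET]!r}")
    return row_count, problems
```

To check a real file, read it with `open("churn.csv", newline="").read()` and pass the text to `check_churn_csv`; an empty problem list means the header and target column match the assumptions above.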

Prerequisites

To use SageMaker Canvas, complete the setup described here. It can take a while before the application opens.

Create a customer churn model

First, let’s download the churn dataset and review the file to make sure all the data is there. Then complete the following steps:

  1. Sign in to the AWS Management Console, using an account with the appropriate permissions to access Canvas.
  2. Log in to the Canvas console.

This is where we can manage our datasets and create models.

  3. Choose Import.
  4. Choose Upload and select the churn.csv file.
  5. Choose Import data to upload it to Canvas.

The import process takes approximately 10 seconds (this can vary depending on dataset size). When it’s complete, we can see the dataset is in Ready status.

  6. A preview of the dataset appears. Here we can verify that our data is correct.

After we confirm that the imported dataset is ready, we create our model.

  7. Choose New model.
  8. Select the churn.csv dataset and choose Select dataset.

Now we configure the build model process.

  9. For Target column, choose the Churn? column.

For Model type, Canvas automatically recommends a model type, in this case 2-category prediction (also known as binary classification). This suits our scenario because there are only two possible prediction values, True or False, so we keep the recommendation Canvas made.

  10. Select all 21 columns and choose Preview model.

This feature makes use of a portion of our dataset and only one modeling pass. The preview model takes about 2 minutes to construct for our use case.

The estimated accuracy of the preview model is 95.6%. The right side of the pane shows the impact of each column (feature) on the prediction.

The Phone and State columns have much less impact on our prediction. In this case, the phone number is simply an account identifier and is not useful for forecasting churn in other accounts, and the customer's state has little influence on our model. Removing low-impact columns like these (feature reduction) can change the accuracy of the model.

Let's update the preview again after removing the Phone and State columns.

The model accuracy improved by 0.4 percentage points: the preview model now has an estimated accuracy of 96%, compared to the initial 95.6%. The most relevant columns are Night Calls, Eve Mins, Night Charge, and Day Mins, which tells us which columns have the most impact on the model's predictions.
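In Canvas, removing a column is a matter of clearing its checkbox. For readers who want to picture the same feature reduction outside Canvas, here is a small sketch in plain Python. It is only an illustration: `drop_columns` and `accuracy_gain_points` are hypothetical helpers, and the two accuracy figures are the ones reported in this post.

```python
def drop_columns(rows, columns_to_drop):
    """Return copies of the rows (dicts) without the named columns."""
    dropped = set(columns_to_drop)
    return [{key: value for key, value in row.items() if key not in dropped}
            for row in rows]


def accuracy_gain_points(before_pct, after_pct):
    """Difference between two accuracies, in percentage points."""
    return round(after_pct - before_pct, 2)


# Illustrative record; the values are made up.
rows = [{"State": "OH", "Phone": "555-0100", "Day Mins": 120.5, "Churn?": "False"}]
reduced = drop_columns(rows, ["Phone", "State"])

# Preview accuracies reported above: 95.6% before, 96% after.
gain = accuracy_gain_points(95.6, 96.0)
```

Note that the gain is expressed in percentage points (96 minus 95.6), not as a relative percentage of the original accuracy.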

Canvas offers two build options:

  • Standard build – Builds the best model from an optimized process powered by AutoML; speed is exchanged for the greatest accuracy
  • Quick build – Builds a model in a fraction of the time compared to a standard build; potential accuracy is exchanged for speed.

I chose the Standard build option because I want the very best model and am willing to spend additional time waiting for the result.

The build process can take 2–4 hours. During this time, Canvas tests hundreds of candidate pipelines, selecting the best model to present to us.

Evaluate model performance

When the model-building process completed, Canvas reported that the model predicts the correct churn outcome 97.9% of the time. The Scoring tab shows a visual representation of how the predictions matched the actual outcomes, which gives us a better understanding of our model.

Canvas divides the dataset into two parts: training and testing. Canvas uses the training set to create the model. The test set is used to determine how well the model performs on new data.
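The idea behind a training/test split can be sketched in a few lines of Python. This is not how Canvas is implemented; the 80/20 ratio, the fixed seed, and the `train_test_split` helper below are assumptions chosen for illustration.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle the records and split them into (train, test) lists.

    test_fraction=0.2 is an assumed ratio, not one confirmed in this post.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


# With 5,000 records, an 80/20 split leaves 4,000 for training and 1,000 for testing.
train, test = train_test_split(range(5000))
```

Shuffling before splitting matters: if the file happens to be sorted (for example by state), taking the last rows as the test set would give a test set that does not resemble the training data.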

Each prediction falls into one of four categories:

  • True Positive (TP) – The number of True results that were correctly predicted as True
  • True Negative (TN) – The number of False results that were correctly predicted as False
  • False Positive (FP) – The number of False results that were wrongly predicted as True
  • False Negative (FN) – The number of True results that were wrongly predicted as False
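To make these four counts concrete, the sketch below tallies them from a list of actual outcomes and a list of predictions, then derives accuracy, precision, and recall using their standard definitions. The function names and the sample values are illustrative and do not come from Canvas.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, and FN for two equal-length sequences of booleans."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if truth and guess:
            counts["TP"] += 1   # churned, predicted churn
        elif not truth and not guess:
            counts["TN"] += 1   # stayed, predicted stay
        elif guess:
            counts["FP"] += 1   # stayed, but predicted churn
        else:
            counts["FN"] += 1   # churned, but predicted stay
    return counts


def summarize(counts):
    """Derive accuracy, precision, and recall from the four counts."""
    total = sum(counts.values())
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    return {
        "accuracy": (tp + counts["TN"]) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up example: six customers, True means the customer churned.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
counts = confusion_counts(actual, predicted)
metrics = summarize(counts)
```

For churn, false negatives are usually the costly case, because they are customers who leave without ever being flagged for a retention offer.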

On the Predict tab, we can run interactive predictions in batch or single (real-time) mode. In this example, we changed a few column values and ran a real-time prediction. Canvas displays the predicted outcome as well as the confidence level.

Assume there is a current client with the following usage: Eve Mins is 20 and Eve Charge is 30. For this client, the model predicts churn (True) with a confidence score of 52.86%. Based on this, we can decide whether to offer special discounts in order to keep this client.
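A confidence of 52.86% is only slightly better than even odds, so it may be worth treating such a prediction differently from a high-confidence one. The sketch below shows one hypothetical way to turn a prediction and its confidence into a retention action. The 0.80 threshold and the action labels are assumptions for illustration, not something Canvas provides.

```python
def retention_action(predicted_churn, confidence, high_confidence=0.80):
    """Map a churn prediction and its confidence (0.0 to 1.0) to an action.

    high_confidence is an arbitrary cut-off chosen for this example.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if not predicted_churn:
        return "no action"
    if confidence >= high_confidence:
        return "priority outreach"
    return "monitor or low-cost offer"


# The client from the example above: predicted True at 52.86% confidence.
action = retention_action(True, 0.5286)
```

In practice the threshold would be tuned against the cost of a discount versus the cost of losing the customer.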

Conclusion

Running a single prediction is great for individual what-if analysis, but we also need to run predictions on a large number of records at the same time. Canvas can run batch predictions, allowing you to generate predictions at scale.