How Classmethod created a cashierless store like #AmazonGo 〜 Yokota de Go
One day, our super-genius CEO, Satoshi Yokota, said, "Let's create an Amazon Go."
For those who don't know, "Amazon Go" is a new type of store launched by Amazon and built on remarkable shopping technology. Customers simply pick up products and walk out without stopping to pay, yet they are still charged for the items they took.
Many of our staff assumed Mr. Yokota had simply come up with another random idea. Some members, however, became interested, got together, and launched the project. In the end they made it happen, and in just three weeks. Their Amazon Go simulation, "Yokota de Go," is now a reality.
They say seeing is believing. So first, please watch the following demo video!
This achievement has even been covered by some of the most prominent Japanese digital media outlets. Here is one example.
Nikkei Digital article (Japanese only)
At this study session for retailers, findings from inspections of cashier-less stores in the US and China were reported. Attendees could also try out an experimental system that reproduced Amazon Go. That simulated system was built by Classmethod in only three weeks.
(excerpted and translated from the original article)
Next, I'll describe the technical side of our Amazon Go simulation, "Yokota de Go."
How the Project Started
When the first physical Amazon Go store opened in Seattle, WA, in January 2018, some of our team members from our Vancouver office visited it and wrote the following reports on our blog.
- Amazon Go 1号店に行ってきました。 (Japanese; "We visited the first Amazon Go store")
- Amazon Go @ Seattle, WA (English)
Then in May, Satoshi Yokota organized a trip from Japan to visit Amazon Go and took a few employees along. Together with the other Classmethod members, he inspected the store's whole environment and technology.
- Amazon Go体験ツアーを組んでシアトルに行ってきました。 〜レジ無し店舗体験編〜 (Japanese; "We organized an Amazon Go tour to Seattle: the cashier-less store experience")
The project was then launched based on the following rough diagram.
The members were curious to see how much they could build in just a month. The architecture was later refined into the version shown below.
Sensors and Data
Our goal was to provide the same experience as Amazon Go. Sensors and data collection were essential for that, and many different sensors could have met the requirements. Initially, we captured still images and videos with a web camera and analyzed the uploaded data. In the end, however, we settled on the following two:
- ToF sensor (depth sensor)
- weight sensor
ToF Sensors
A ToF (time-of-flight) sensor perceives three-dimensional space by measuring the depth of objects. In our store, ToF sensors track customers and detect when they reach out for products on the shelves.
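As background, the principle behind a ToF sensor fits in one formula: light travels to the object and back, so the distance is half the round-trip time multiplied by the speed of light. The Python sketch below only illustrates that arithmetic. Commercial ToF cameras often derive the travel time from the phase shift of modulated light, and their SDKs usually return a ready-made depth map, so this is not code taken from the store's system.

```python
# Speed of light in a vacuum, in metres per second.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_s: float) -> float:
    """Return the distance to a surface from a time-of-flight measurement.

    The emitted light travels to the object and back, so the one-way
    distance is half of (speed of light x round-trip time).
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2


# A 10-nanosecond round trip corresponds to roughly 1.5 m.
print(f"{tof_distance_m(10e-9):.3f} m")  # → 1.499 m
```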
In our setup, five ToF sensors are installed together with a local network, and an application that analyzes their data runs locally. Using the data captured from these five points, the application tracks customers and publishes an event to AWS IoT via MQTT whenever a customer enters the store or reaches out for a product. Each event then triggers a Lambda function.
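A minimal sketch of what one of these events might look like, assuming a JSON payload. The topic layout `store/{store_id}/tof/{event_type}`, the field names, and QoS 1 are all hypothetical; the post does not document the real ones. The `publish` callable is injected so the sketch can be tested without a network connection.

```python
import json
import time
from typing import Callable, Optional, Tuple

# Hypothetical topic layout; the real topic names used in the store are not documented.
TOPIC_TEMPLATE = "store/{store_id}/tof/{event_type}"
EVENT_TYPES = {"enter", "exit", "reach"}


def build_tof_event(
    store_id: str,
    track_id: str,
    event_type: str,
    shelf_id: Optional[str] = None,
    timestamp: Optional[float] = None,
) -> Tuple[str, bytes]:
    """Return (topic, JSON payload) for one customer-tracking event."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    payload = {
        "track_id": track_id,  # hypothetical ID assigned by the local tracker
        "event_type": event_type,
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    if shelf_id is not None:
        payload["shelf_id"] = shelf_id  # only meaningful for "reach" events
    topic = TOPIC_TEMPLATE.format(store_id=store_id, event_type=event_type)
    return topic, json.dumps(payload).encode("utf-8")


def publish_tof_event(publish: Callable[..., object], **event_fields) -> str:
    """Send one event with QoS 1 (at-least-once delivery) via the injected publisher."""
    topic, payload = build_tof_event(**event_fields)
    publish(topic=topic, qos=1, payload=payload)
    return topic
```

On a device, `publish` would typically wrap an MQTT client from the AWS IoT Device SDK. For a quick test from a PC, `boto3.client("iot-data").publish` accepts the same `topic`, `qos`, and `payload` keyword arguments; it sends over HTTPS, but subscribers and IoT rules on the topic receive the message the same way.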
Weight Sensors
For the weight sensors, an ESP32 development board is attached to the back of each container in which products are stored. The board periodically sends the container's weight to AWS. On the AWS side, we manage the product master data (e.g., snack A weighs 7 g on average). The system can therefore tell that a customer took one snack A if the weight of its container decreased by 7 g, or two if it decreased by 14 g. In practice, however, the logic was far from simple, because many factors vary: the sensor readings were not always accurate, and the timestamps of related events could differ slightly.
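One simple way to turn a weight change into a quantity while tolerating noisy readings is to round the change to the nearest whole number of units and reject it when the remainder is too large. The sketch below assumes a 2 g tolerance, chosen only for illustration; `Product` and `estimate_quantity` are hypothetical names, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    name: str
    unit_weight_g: float  # average weight of one unit, from the product master


def estimate_quantity(
    before_g: float, after_g: float, product: Product, tolerance_g: float = 2.0
) -> Optional[int]:
    """Estimate how many units were taken from a container.

    Positive results mean units were removed, negative results mean units
    were put back, and None means the change does not look like a whole
    number of units (for example, a noisy reading).
    """
    delta_g = before_g - after_g
    units = round(delta_g / product.unit_weight_g)
    residual_g = abs(delta_g - units * product.unit_weight_g)
    return units if residual_g <= tolerance_g else None


snack_a = Product("snack A", unit_weight_g=7.0)
print(estimate_quantity(500.0, 493.1, snack_a))  # → 1
print(estimate_quantity(500.0, 485.8, snack_a))  # → 2
print(estimate_quantity(500.0, 489.5, snack_a))  # → None (10.5 g is ambiguous)
```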
Data Transmission From Sensors To AWS
There are two main ways to transmit data from sensors to AWS: one is AWS IoT/MQTT and the other is HTTP(S).
If possible, IoT/MQTT is the ideal choice. Transmitting data via IoT/MQTT costs less, and AWS IoT rules let you filter the incoming data and route it to other services without a Lambda function. If MQTT is difficult to use on the sensors, you can use HTTP(S) instead: put an API Gateway in front of a Lambda function, and have the Lambda function save the data into DynamoDB. If the sensors cannot use HTTPS, place CloudFront in front of API Gateway, because API Gateway endpoints accept only HTTPS while CloudFront can accept plain HTTP requests and forward them over HTTPS.
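To illustrate filtering without Lambda, here is a sketch that builds an AWS IoT topic rule whose SQL statement drops insignificant readings and whose `dynamoDBv2` action writes the remaining messages straight into DynamoDB. It assumes the sensors publish JSON containing a `delta_g` field to topics matching `store/+/weight`; the topic, field, table, role ARN, and threshold are hypothetical. Our own weight sensors use HTTP, so this shows the MQTT option rather than what we deployed.

```python
import re
from typing import Any, Dict

# IoT rule names may contain only letters, digits, and underscores.
RULE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def build_weight_rule_payload(
    topic_filter: str, table_name: str, role_arn: str, min_change_g: float
) -> Dict[str, Any]:
    """Build a topicRulePayload that filters readings and writes them to DynamoDB."""
    sql = f"SELECT * FROM '{topic_filter}' WHERE abs(delta_g) >= {min_change_g}"
    return {
        "sql": sql,
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                # dynamoDBv2 stores each top-level field of the message as an
                # attribute, so the table's key attributes must be in the message.
                "dynamoDBv2": {"roleArn": role_arn, "putItem": {"tableName": table_name}}
            }
        ],
    }


def create_weight_rule(iot_client: Any, rule_name: str, payload: Dict[str, Any]) -> None:
    """Register the rule; iot_client would be boto3.client("iot") in practice."""
    if not RULE_NAME_PATTERN.fullmatch(rule_name):
        raise ValueError("rule names may contain only letters, digits and underscores")
    iot_client.create_topic_rule(ruleName=rule_name, topicRulePayload=payload)
```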
In our setup, the weight sensors send their data over HTTP, and it flows through CloudFront, API Gateway, and Lambda into DynamoDB. Because a very large amount of data accumulates in DynamoDB, a TTL (time to live) setting automatically deletes it after a few days.
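Below is a minimal sketch of the Lambda function in that pipeline, assuming API Gateway's Lambda proxy integration (so the request body arrives as a JSON string in `event["body"]`) and a DynamoDB table with TTL enabled on a numeric `expires_at` attribute. DynamoDB TTL expects the expiry as a Unix timestamp in seconds. The table name, field names, and three-day retention are assumptions standing in for "a few days".

```python
import json
import os
import time
from decimal import Decimal, InvalidOperation
from typing import Any, Dict, Optional

# Hypothetical names: the post does not give the real table or attributes.
TABLE_NAME = os.environ.get("WEIGHT_TABLE", "weight-readings")
TTL_ATTRIBUTE = "expires_at"  # the attribute enabled for TTL on the table
TTL_DAYS = 3                  # assumed stand-in for "a few days"

_table = None  # cached across warm Lambda invocations


def build_item(body: Dict[str, Any], now: Optional[float] = None) -> Dict[str, Any]:
    """Convert one sensor reading into a DynamoDB item with an expiry time."""
    now = time.time() if now is None else now
    return {
        "container_id": str(body["container_id"]),
        # boto3 rejects Python floats for DynamoDB numbers, so use Decimal.
        "timestamp": Decimal(str(body["timestamp"])),
        "weight_g": Decimal(str(body["weight_g"])),
        # TTL expects a Unix epoch time in seconds.
        TTL_ATTRIBUTE: int(now) + TTL_DAYS * 24 * 60 * 60,
    }


def lambda_handler(event: Dict[str, Any], context: Any, table: Any = None) -> Dict[str, Any]:
    """Handle an API Gateway (Lambda proxy) request from a weight sensor."""
    global _table
    try:
        item = build_item(json.loads(event.get("body") or "{}"))
    except (KeyError, TypeError, ValueError, InvalidOperation):
        return {"statusCode": 400, "body": json.dumps({"error": "bad reading"})}
    if table is None:
        if _table is None:
            import boto3  # bundled with the AWS Lambda Python runtime
            _table = boto3.resource("dynamodb").Table(TABLE_NAME)
        table = _table
    table.put_item(Item=item)
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```

The optional `table` parameter exists only so the handler can be exercised locally with a stand-in object; Lambda itself calls `lambda_handler(event, context)`.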
Purchase
In a cashier-less store, the system needs to determine who purchased which products, and how many. The input for this decision comes from the sensors, but as mentioned above, that data is not always accurate. The system therefore has to combine data from multiple sensors to reach a reliable conclusion.
In this version, we implemented only two kinds of sensors. When a ToF sensor detects a customer reaching out for a product, it sends a purchase event to AWS, but at that point the system cannot confirm whether the customer actually took the product. On AWS, the purchase event is matched with data from the second type of sensor, the weight sensor: if the weight of that product's container has decreased, the system concludes that the product has been purchased.
Timestamps are the key to combining data from multiple sensors. However, they rarely match perfectly, so a difference of a few seconds must be tolerated. For example, the ToF sensor may timestamp a purchase event at 14:20:50 while the weight sensor registers the weight drop at 14:20:51. In our system, events that occur within a few seconds of each other are treated as simultaneous.
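A minimal sketch of this time-window matching, assuming each weight decrease has already been converted into a quantity per shelf and that a 3-second window stands in for "a few seconds". The window, the data classes, and `match_purchases` are hypothetical. Each weight decrease is paired with the closest reach event on the same shelf; drops with no nearby reach event are left unmatched.

```python
from dataclasses import dataclass
from typing import List, Tuple

MATCH_WINDOW_S = 3.0  # assumed value for "a few seconds"


@dataclass(frozen=True)
class ReachEvent:
    track_id: str     # tracked customer, from the ToF system
    shelf_id: str
    timestamp: float  # seconds


@dataclass(frozen=True)
class WeightChange:
    shelf_id: str
    timestamp: float
    quantity: int     # units removed, estimated from the weight drop


def match_purchases(
    reaches: List[ReachEvent], changes: List[WeightChange], window_s: float = MATCH_WINDOW_S
) -> List[Tuple[str, str, int]]:
    """Return (track_id, shelf_id, quantity) for every confirmed purchase."""
    purchases = []
    for change in changes:
        if change.quantity <= 0:
            continue  # no decrease, so nothing was taken
        candidates = [
            r for r in reaches
            if r.shelf_id == change.shelf_id
            and abs(r.timestamp - change.timestamp) <= window_s
        ]
        if not candidates:
            continue  # weight drop without a nearby reach event
        best = min(candidates, key=lambda r: abs(r.timestamp - change.timestamp))
        purchases.append((best.track_id, change.shelf_id, change.quantity))
    return purchases


# 14:20:50 and 14:20:51 expressed as seconds since midnight.
print(match_purchases([ReachEvent("t1", "shelf-A", 51650.0)],
                      [WeightChange("shelf-A", 51651.0, 1)]))  # → [('t1', 'shelf-A', 1)]
```

Pairing from the weight side means one reach event can be credited with several drops; a fuller version would also need to handle two customers reaching for the same shelf at almost the same moment.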
Enter, Pay, Then Leave
Customers enter the store using a QR code in a mobile app. When they check in, they are authenticated via Amazon Cognito and an API on AWS is called, just as in an ordinary mobile application.
What is unique to this cashier-less store is that the authentication is matched with the tracking data collected by the ToF sensors. The ToF sensors track everyone in the store independently of the check-in, so the tracking data alone cannot tell who took which products. To identify customers, a ToF sensor sends an event when someone enters the store, and this event is matched with the check-in information sent from the mobile app.
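The post does not describe the matching rule itself. As one plausible approach, the sketch below processes entry events in time order and links each to the earliest unclaimed check-in from the preceding few seconds, assuming people walk in in the order they checked in and that the entry event follows the check-in by at most 5 seconds. That window, the data classes, and `link_tracks_to_users` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CheckIn:
    user_id: str      # authenticated user from the mobile app
    timestamp: float  # seconds


@dataclass(frozen=True)
class EntryEvent:
    track_id: str     # person tracked by the ToF sensors, identity unknown
    timestamp: float


def link_tracks_to_users(
    checkins: List[CheckIn], entries: List[EntryEvent], window_s: float = 5.0
) -> Dict[str, str]:
    """Map each tracked person (track_id) to a checked-in user (user_id)."""
    links: Dict[str, str] = {}
    claimed = set()
    for entry in sorted(entries, key=lambda e: e.timestamp):
        candidates = [
            c for c in checkins
            if c.user_id not in claimed
            and 0 <= entry.timestamp - c.timestamp <= window_s
        ]
        if not candidates:
            continue  # unidentified person; needs handling outside this sketch
        first = min(candidates, key=lambda c: c.timestamp)  # first come, first served
        links[entry.track_id] = first.user_id
        claimed.add(first.user_id)
    return links


print(link_tracks_to_users(
    [CheckIn("alice", 100.0), CheckIn("bob", 104.0)],
    [EntryEvent("t1", 101.0), EntryEvent("t2", 105.5)],
))  # → {'t1': 'alice', 't2': 'bob'}
```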
A similar process takes place when a customer leaves the store. Whenever a tracked customer leaves the store area, an exit event is sent to AWS, where it is matched with the check-in information so that the system can confirm the customer has exited.
When you leave the store, you receive a payment report in the mobile app, as shown below.
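A sketch of how such a report could be assembled when the exit event arrives, assuming the purchases have already been matched to the customer's track as `(track_id, shelf_id, quantity)` tuples and that a hypothetical product master maps shelves to products and prices in yen. The names and prices are invented for the example; the actual report format is not described in the post.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (track_id, shelf_id, quantity), as produced by the purchase-matching step
Purchase = Tuple[str, str, int]


def build_payment_report(
    user_id: str,
    track_id: str,
    purchases: List[Purchase],
    shelf_to_product: Dict[str, str],
    prices_yen: Dict[str, int],
) -> Dict[str, object]:
    """Summarise one customer's purchases once their exit is confirmed."""
    counts: Counter = Counter()
    for purchase_track, shelf_id, quantity in purchases:
        if purchase_track == track_id:
            counts[shelf_to_product[shelf_id]] += quantity  # returns may be negative
    lines = [
        {"product": product, "quantity": qty, "subtotal_yen": qty * prices_yen[product]}
        for product, qty in sorted(counts.items())
        if qty > 0  # skip items that were picked up and then put back
    ]
    return {
        "user_id": user_id,
        "lines": lines,
        "total_yen": sum(line["subtotal_yen"] for line in lines),
    }


report = build_payment_report(
    "alice", "t1",
    [("t1", "shelf-A", 2), ("t1", "shelf-B", 1), ("t2", "shelf-A", 1)],
    {"shelf-A": "snack A", "shelf-B": "drink B"},
    {"snack A": 100, "drink B": 150},
)
print(report["total_yen"])  # → 350
```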
The Development Story
In the end, this version turned out to be quite simple. Although we initially used web cameras for image analytics, we eventually switched to processing ToF sensor data locally.
With ToF sensors, the sensors and the local system together track customers in the store and detect when they reach out for products. This kind of processing, done locally rather than in the cloud, is called "edge processing."
We could have used edge processing for the whole system. For instance, if the weight sensor processing were integrated into the local tracking system, only simple APIs would need to be exposed. However, the more processing you push to the edge, the more complex it becomes, and more complex processing requires higher hardware specs, which cost more.
Alternatively, we could have uploaded still images and videos to AWS and analyzed them with other AWS services. In that case, the edge processing stays simple and can run on cheaper hardware. Initial costs (i.e., hardware costs) are minimized and the barrier to entry is lower, but running costs are pay-as-you-go and may end up higher over time.
Striking the right balance between the edge and the cloud is critical, and you cannot know what a good balance is until you actually test it in a real environment.
With more sensors, the system would be able to detect purchase events more accurately. In future development, we plan to increase the number and variety of sensors and test them.
Our Classmethod team will continue to work on developing Yokota de Go, and we will keep you updated on our progress!