[MediaPipe] How to install MediaPipe on a GPU instance in Amazon EC2.

2020.07.01

This page describes how to run MediaPipe with GPU support in the cloud.

In short, you can follow MediaPipe's official installation guide, but you have to modify one point in the source code.

NOTE: the version of MediaPipe I used was v0.7.5.

My past articles about MediaPipe are at the link below (in Japanese).

[MediaPipe] Summary of posted articles

Preparation

Step 1: Launch EC2 instance

Launch an instance as usual. The settings are as follows.

  • For the AMI, select "Deep Learning Base AMI (Ubuntu 18.04) Version 25.0 - ami-0b5dcdb9ddb35481a". (Remark: if you select a plain Ubuntu 18.04 AMI, the same error as before the fix appeared even in "Execute Multi Hand Tracking (success)" below, and I could not proceed further.)
  • Select "p2.xlarge" as the instance type. It is easy to find by filtering on "GPU instances" in the filter conditions above. Click Confirm and Create. (I have not confirmed it, but other GPU instance types such as g2.2xlarge should probably work too.)
  • (Modify other settings as you want.)
  • At startup, select whether to create a new key pair or use an existing key pair.

After the instance has launched and initialized, restart it once from the EC2 page of the AWS Management Console (other methods such as the AWS CLI work as well, of course).

Step 2: Connect to your EC2 instance

Use SSH to access the "public DNS (IPv4)" displayed on the EC2 page when you select the instance.

The command is as follows. Change the key pair file path to your own file (the one downloaded if you created a new pair in Step 1), and the destination address to your instance's address. (This can be executed from PowerShell on Windows.)

ssh -i "C:\path\to\key.pem" ubuntu@ec2-000-000-000-000.ap-northeast-1.compute.amazonaws.com

Step 3: Install MediaPipe

Install it according to the official guide.

Installation

Step 3-0: Update Ubuntu

After connecting with SSH, execute the following commands. Select "keep the local version currently installed" on the configuration screen that appears during the upgrade.

sudo apt update
sudo apt upgrade -y

From the EC2 page, restart the instance, then connect again with SSH.

Step 3-1: Clone the MediaPipe repository

git clone https://github.com/google/mediapipe.git
cd mediapipe

Step 3-2: Install bazel

Install Bazel following the page linked from the installation guide. This time I used the second method, "Using the binary installer".

The Bazel version I used was 3.3.0, so the commands were as below, but I recommend checking for the latest release.

sudo apt install g++ unzip zip -y

sudo apt-get install openjdk-11-jdk -y

wget https://github.com/bazelbuild/bazel/releases/download/3.3.0/bazel-3.3.0-installer-linux-x86_64.sh
chmod +x bazel-3.3.0-installer-linux-x86_64.sh
./bazel-3.3.0-installer-linux-x86_64.sh --user
rm ./bazel-3.3.0-installer-linux-x86_64.sh 

export PATH="$PATH:$HOME/bin"

Step 3-3: Install OpenCV and FFmpeg

This time, I used Option 2 of the installation guide. Execute the command below in the cloned MediaPipe folder.

chmod +x setup_opencv.sh
./setup_opencv.sh

(It takes tens of minutes)
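Since setup_opencv.sh takes a long time, it can be convenient to run it in the background with nohup so the build survives a dropped SSH connection. This is an optional convenience on my part, not part of the official guide:

```shell
# Run the OpenCV build in the background and log its output
nohup ./setup_opencv.sh > setup_opencv.log 2>&1 &
# Follow the progress later with: tail -f setup_opencv.log
```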

Step 3-4: Install EGL driver for GPU

sudo apt-get install mesa-common-dev libegl1-mesa-dev libgles2-mesa-dev -y

Check installation with Hello World

Run Hello World of MediaPipe with GPU build option.

export GLOG_logtostderr=1

bazel run --copt -DMESA_EGL_NO_X11_HEADERS --copt -DEGL_NO_X11 \
    mediapipe/examples/desktop/hello_world:hello_world

After the Bazel build finishes (it takes a few minutes), it is OK if "Hello World!" is displayed ten times, as shown below.

I20200629 11:55:45.702250 28131 hello_world.cc:56] Hello World!
I20200629 11:55:45.702352 28131 hello_world.cc:56] Hello World!
I20200629 11:55:45.702378 28131 hello_world.cc:56] Hello World!
I20200629 11:55:45.702397 28131 hello_world.cc:56] Hello World!
I20200629 11:55:45.702419 28131 hello_world.cc:56] Hello World!
I20200629 11:55:45.702440 28131 hello_world.cc:56] Hello World!
I20200629 11:55:45.702461 28131 hello_world.cc:56] Hello World!
I20200629 11:55:45.702484 28131 hello_world.cc:56] Hello World!
I20200629 11:55:45.702507 28131 hello_world.cc:56] Hello World!
I20200629 11:55:45.702527 28131 hello_world.cc:56] Hello World!

Execute Multi Hand Tracking

Upload video file

You have to transfer a video file, because a camera cannot be connected to an instance in the cloud. You can upload it with the scp command as follows. (Execute this command in a new terminal window, not in the shell session used so far.)

Change the key pair file path, video file, instance address, and transfer destination file path as appropriate.

scp -i "C:\path\to\key.pem" -r .\video.mp4 ubuntu@ec2-000-000-000-000.ap-northeast-1.compute.amazonaws.com:/home/ubuntu/mediapipe

Execute Multi Hand Tracking (Error)

Build the GPU version of Multi Hand Tracking (it takes tens of minutes) and run it. If you changed the transfer destination path earlier, adjust the paths accordingly.

bazel build -c opt --copt -DMESA_EGL_NO_X11_HEADERS --copt -DEGL_NO_X11 \
  mediapipe/examples/desktop/multi_hand_tracking:multi_hand_tracking_gpu

export GLOG_logtostderr=1
bazel-bin/mediapipe/examples/desktop/multi_hand_tracking/multi_hand_tracking_gpu \
  --calculator_graph_config_file=mediapipe/graphs/hand_tracking/multi_hand_tracking_desktop_live.pbtxt \
  --input_video_path="video.mp4" \
  --output_video_path="video_.mp4"

When executed, the following error is output.

I20200629 13:42:14.124982 28141 demo_run_graph_main_gpu.cc:57] Initialize the calculator graph.
I20200629 13:42:14.127943 28141 demo_run_graph_main_gpu.cc:61] Initialize the GPU.
I20200629 13:42:16.027736 28141 gl_context_egl.cc:158] Successfully initialized EGL. Major : 1 Minor: 5
W20200629 13:42:16.027791 28141 gl_context_egl.cc:163] Creating a context with OpenGL ES 3 failed: UNKNOWN: ; eglChooseConfig() returned no matching EGL configuration for RGBA8888 D16 ES3 request.
W20200629 13:42:16.027810 28141 gl_context_egl.cc:164] Fall back on OpenGL ES 2.
E20200629 13:42:16.031584 28141 demo_run_graph_main_gpu.cc:186] Failed to run the graph: ; eglChooseConfig() returned no matching EGL configuration for RGBA8888 D16 ES2 request.

Modify the program

When I searched for a solution to the error, I found the following issue.

Examples using tf-lite with gpu (face_detection_gpu) does not compile on jetson nano · Issue #305 · google/mediapipe

In that issue, a commenter reports that the same error disappeared after a display was connected to the device. The error seems to occur because OpenGL cannot obtain a context when no display is connected.

The solution is described in a later comment on the same issue thread.

Based on those comments, modify the program as follows. The file to modify is mediapipe/gpu/gl_context_egl.cc, around line 97. Open it with an editor (vim, nano, etc.) and delete "| EGL_WINDOW_BIT" as shown below.

Before

const EGLint config_attr[] = {
      ...
      EGL_SURFACE_TYPE, EGL_PBUFFER_BIT | EGL_WINDOW_BIT,
      ...
  };

After

const EGLint config_attr[] = {
      ...
      EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
      ...
  };
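If you prefer not to edit the file by hand, the same one-line change can be applied with sed. This is just a convenience sketch; it assumes the line appears in your checkout exactly as shown above:

```shell
# Keep a backup (.bak), then drop "| EGL_WINDOW_BIT" from the EGL config attributes
sed -i.bak 's/EGL_PBUFFER_BIT | EGL_WINDOW_BIT/EGL_PBUFFER_BIT/' mediapipe/gpu/gl_context_egl.cc
```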

Execute Multi Hand Tracking (success)

Build again and run.

bazel build -c opt --copt -DMESA_EGL_NO_X11_HEADERS --copt -DEGL_NO_X11 \
  mediapipe/examples/desktop/multi_hand_tracking:multi_hand_tracking_gpu

export GLOG_logtostderr=1
bazel-bin/mediapipe/examples/desktop/multi_hand_tracking/multi_hand_tracking_gpu \
  --calculator_graph_config_file=mediapipe/graphs/hand_tracking/multi_hand_tracking_desktop_live.pbtxt \
  --input_video_path="video.mp4" \
  --output_video_path="video_gpu.mp4"

If the result is displayed as below, the run was successful. "Segmentation fault" is printed at the end, but the output file itself was written completely. Since it happens after "Shutting down", it does not seem to matter much. Fixing it would require digging further into the code, so I will leave it as future work.

(... content of graph ...)
I20200629 13:44:36.920063 28508 demo_run_graph_main_gpu.cc:57] Initialize the calculator graph.
I20200629 13:44:36.922953 28508 demo_run_graph_main_gpu.cc:61] Initialize the GPU.
I20200629 13:44:36.934150 28508 gl_context_egl.cc:158] Successfully initialized EGL. Major : 1 Minor: 5
I20200629 13:44:36.961654 28514 gl_context.cc:324] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 440.33.01)
I20200629 13:44:36.961781 28508 demo_run_graph_main_gpu.cc:67] Initialize the camera or load the video.
I20200629 13:44:37.536909 28508 demo_run_graph_main_gpu.cc:88] Start running the calculator graph.
I20200629 13:44:37.540650 28508 demo_run_graph_main_gpu.cc:93] Start grabbing and processing frames.
INFO: Created TensorFlow Lite delegate for GPU.
I20200629 13:44:40.792667 28508 demo_run_graph_main_gpu.cc:160] Prepare video writer.
I20200629 13:44:45.500483 28508 demo_run_graph_main_gpu.cc:175] Shutting down.
I20200629 13:44:47.819614 28508 demo_run_graph_main_gpu.cc:189] Success!
Segmentation fault (core dumped)

Download the output file with the following command from another window; this does not require disconnecting the current SSH connection. Change the key pair file path, the instance address, the output file path, and the download destination path as appropriate.

scp -i "C:\path\to\key.pem" ubuntu@ec2-000-000-000-000.ap-northeast-1.compute.amazonaws.com:/home/ubuntu/mediapipe/video_gpu.mp4 .

Compare CPU and GPU processing speed

Measurement

The input was a video of two people, with two or three hands visible. The resolution was 1920x1080 px, the length was 2 s, and the frame rate was 30 fps (60 frames in total).

The commands for the CPU version are as follows; I just changed "gpu" to "cpu" in the commands above.

bazel build -c opt --copt -DMESA_EGL_NO_X11_HEADERS --copt -DEGL_NO_X11 \
  mediapipe/examples/desktop/multi_hand_tracking:multi_hand_tracking_cpu

export GLOG_logtostderr=1
bazel-bin/mediapipe/examples/desktop/multi_hand_tracking/multi_hand_tracking_cpu \
  --calculator_graph_config_file=mediapipe/graphs/hand_tracking/multi_hand_tracking_desktop_live.pbtxt \
  --input_video_path="video.mp4" \
  --output_video_path="video_cpu.mp4"

For the measurement, I visually timed the run from program launch to completion.
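Prefixing the run with the shell's time builtin would give a more precise wall-clock measurement than timing by eye; for example, for the GPU command above:

```shell
# `time` reports real (wall-clock), user, and sys time after the command exits
time bazel-bin/mediapipe/examples/desktop/multi_hand_tracking/multi_hand_tracking_gpu \
  --calculator_graph_config_file=mediapipe/graphs/hand_tracking/multi_hand_tracking_desktop_live.pbtxt \
  --input_video_path="video.mp4" \
  --output_video_path="video_gpu.mp4"
```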

Result

The measured processing times were as follows. Execution with the GPU was much faster. Note that the measurements include overhead outside the main detection processing (startup, video I/O), so the detection itself is even faster than these numbers suggest.

  • CPU: 36 seconds
  • GPU: 5 seconds
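In throughput terms, for the 60-frame clip these times work out to roughly 1.7 fps on the CPU versus 12 fps on the GPU, a quick back-of-the-envelope calculation:

```shell
# End-to-end throughput: 60 frames divided by the measured seconds
awk 'BEGIN { printf "CPU: %.1f fps\n", 60/36; printf "GPU: %.1f fps\n", 60/5 }'
# → CPU: 1.7 fps
# → GPU: 12.0 fps
```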

Summary

MediaPipe can now be run on an Amazon EC2 GPU instance. The program still needs a few more changes, but we are ready to upload images or videos from a store to the cloud and run detection with MediaPipe.

Next, I'll look into how to run MediaPipe on edge devices.