How we resolved slow integration tests in Serverless with CircleCI parallelism

In this post, I explain how we sped up slow integration tests using CircleCI's parallelism feature.
2018.10.18

This article was published more than a year ago. Please note that the information may be outdated.

Introduction

Integration tests have become more and more important in Serverless. This point has already been discussed in several places, so you may have heard it before. Briefly, Serverless applications consist of many fully managed services provided by cloud vendors such as AWS, which makes it hard or impossible to simulate those services in a local environment. LocalStack, SAM CLI, and similar tools have been tackling the problem, but at least for now they cannot replace the services in an actual environment because they lack some services and functionality.

Hence, integration tests are executed against actual environments in most cases, because we have no other choice. So far this is a rational strategy, but it introduces other problems. One of them is slow test feedback: integration tests are much slower than unit tests because they access their targets over the network. We struggled with this problem. We should write tests to improve the quality of our software, but every test we add lengthens the execution time.

To make the problem clearer, let me explain how we run integration tests in our project. Please look at the picture below:

This is the architecture we are developing. We built an API mainly composed of Amazon API Gateway, AWS Lambda, and Amazon DynamoDB. We use other AWS services in the project as well, but I will not touch on them because they are not relevant to this post.

The client is a mobile app, which communicates with the backend systems through the API. The problem is the integration tests of the API. The tests access the API with an HTTP client and evaluate the results. They take about 13 minutes to complete, which is very slow.

In this post, I explain how we resolved the problem with CircleCI's parallelism feature.
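To make the shape of such a test concrete, here is a minimal sketch in pytest style. It is not our actual test code: the /items/<id> endpoint, the payload, and the StubApiHandler are hypothetical, and a local stub server stands in for the deployed API so that the example runs anywhere. In a real integration test, the base URL would point at the API in the actual environment, and that network round trip is what makes each test slow.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubApiHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the deployed API (the real tests call the actual API)."""

    def do_GET(self):
        body = json.dumps({"id": "42", "name": "example"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def fetch_item(base_url, item_id):
    """Call a hypothetical GET /items/<id> endpoint and return (status, parsed JSON)."""
    # An empty ProxyHandler bypasses any proxy settings, which suits the local stub.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/items/{item_id}", timeout=10) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


def test_get_item_returns_expected_payload():
    # Start the stub on a free local port; a real test would skip this part.
    server = HTTPServer(("127.0.0.1", 0), StubApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base_url = f"http://127.0.0.1:{server.server_port}"
        status, payload = fetch_item(base_url, "42")
        assert status == 200
        assert payload["id"] == "42"
    finally:
        server.shutdown()
        server.server_close()
```

Each test of this kind waits on at least one HTTP round trip, so a suite with many of them accumulates minutes of waiting when run one after another.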

Parallelism in CircleCI

To reduce the execution time, we decided to use the parallelism feature of CircleCI. It is the most concise and convenient way to execute tests in parallel that I know of. We use pytest for all our tests, so we could also leverage its plugins (e.g., pytest-xdist, pytest-parallel, and so on). But one advantage of CircleCI parallelism is that it is independent of any programming language, so we can use it with any of them. Since we may choose another language in the future, we decided to use it.

Anyway, let's get back on track. Using it takes two steps. Let's take a closer look at them.

First, you need to set the parallelism key in your job. It specifies how many containers run in parallel. Note that the key has to be placed at the job level; CircleCI does not currently support parallelism at the step level. Therefore, you need to consider the granularity of the job. For example, if the job has many steps that do not need to run in parallel, the benefit decreases, because every container runs all of the job's steps. To get the full benefit, separate those steps from the job.

It is very easy to use. All you have to do is specify the parallelism key in your job, for example:

version: 2
jobs:
  test:
    docker:
      - image: circleci/<language>:<version TAG>
    parallelism: 4
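The point about job granularity can be illustrated with some rough arithmetic. The sketch below assumes that the steps which cannot be split are repeated in full by every container and that the tests divide perfectly evenly; the 3-minute and 12-minute figures are hypothetical, not measurements from our project.

```python
def estimated_job_minutes(serial_minutes, parallel_minutes, containers):
    """Estimate wall-clock time when only part of a job is split across containers."""
    if containers < 1:
        raise ValueError("containers must be at least 1")
    # Serial steps run in full on every container; only the tests are divided.
    return serial_minutes + parallel_minutes / containers


# Hypothetical job: 3 minutes of setup steps plus 12 minutes of tests.
for containers in (1, 2, 4):
    print(containers, estimated_job_minutes(3, 12, containers))
# prints:
# 1 15.0
# 2 9.0
# 4 6.0
```

Under these assumptions, going from 2 to 4 containers saves only 3 more minutes, because the serial part is never reduced. This is why steps that do not benefit from parallelism are better moved to a separate job.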

Next, the CircleCI CLI. The tool is mainly used for debugging and validating CircleCI's config file (config.yml) in a local environment. It also has a tests command with two subcommands, glob and split. The glob subcommand discovers test files, and the split subcommand distributes them among containers.

I will explain them in more detail.

The glob subcommand resembles the test discovery that testing frameworks usually provide. It accepts wildcards to filter test files. For example:

$ circleci tests glob "tests/integration/test_*.py"
tests/integration/test_foo.py
tests/integration/test_bar.py
tests/integration/test_baz.py
tests/integration/test_blah.py
...

The split subcommand determines how the test files passed to it are distributed among containers. A common pattern is to pipe the output of the glob subcommand into the stdin of the split subcommand. Look at the example:

$ circleci tests glob "tests/integration/test_*.py" | circleci tests split
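To give an intuition for what the pipeline above does in each container, here is a rough Python sketch. It is an illustration only: it deals the sorted file names out round-robin, which is an assumption made for simplicity and not necessarily how circleci tests split assigns files. The function split_for_container is a hypothetical name; CIRCLE_NODE_INDEX and CIRCLE_NODE_TOTAL are the environment variables CircleCI sets in each parallel container.

```python
import glob
import os


def split_for_container(test_files, node_index, node_total):
    """Return the subset of test files that one container should run.

    Illustrative round-robin over the sorted names; not CircleCI's actual algorithm.
    """
    if not 0 <= node_index < node_total:
        raise ValueError("node_index must be in range(node_total)")
    return sorted(test_files)[node_index::node_total]


if __name__ == "__main__":
    # Outside CircleCI, fall back to a single container with index 0.
    node_index = int(os.environ.get("CIRCLE_NODE_INDEX", "0"))
    node_total = int(os.environ.get("CIRCLE_NODE_TOTAL", "1"))
    files = glob.glob("tests/integration/test_*.py")
    print("\n".join(split_for_container(files, node_index, node_total)))
```

With the four files listed earlier and two containers, this sketch would give test_bar.py and test_blah.py to container 0 and test_baz.py and test_foo.py to container 1. Every file is run by exactly one container, which is the property that matters.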

In this section, I introduced how to use parallelism in CircleCI. Next, I explain how we use the feature in our project to reduce the execution time.

Parallelism in Integration Tests

As mentioned above, we use pytest for our tests. We give the test files the prefix test_ and place them in the tests/integration directory. Therefore, you can create a CircleCI config file like the following:

  build_and_test_unit:
    parallelism: 2
    <<: *build_and_test_container
    steps:
...
      - run:
          name: Run integration tests
          command: |
            source aws-envs.sh
            source .venv/bin/activate
            python -m pytest $(circleci tests glob "tests/integration/test_*.py" | circleci tests split)

We simply pass the test files to pytest as arguments, and the split subcommand distributes them among the containers. Let me show you how CircleCI runs the job. Look at the picture below:

You may notice the number at the end of each step. This is the index of the container. In this case I set parallelism to 2, so two containers are launched. Also look at the execution time: we reduced it to about 5 minutes, down from about 13 minutes.

Lastly, please look at the picture below. You can see that the same step performs different tasks in each container. That is, the split subcommand distributes the test files among the containers.

Conclusion

In this post, I discussed CircleCI's parallelism feature and a real-world example of using it. I think it can be applied in many situations. I hope you find this post helpful.