Data Caching in Alteryx

2020.04.24


Introduction

When using Alteryx, the data loading step can take a long time if the volume of data is large. The waiting is especially annoying when a developer is building or modifying a workflow and needs to run it repeatedly for data quality checks. Caching a data stream saves much of that time and makes building the workflow more efficient.

How to do it…

After opening a workflow, right-click the tool at the point where you wish to cache the data. Select “Cache and Run Workflow”, and the data will be cached up to that point. Once the data is cached, subsequent runs start from the cached data instead of re-running the upstream tools, so they finish faster.

The icons of cached tools change to show that they are cached; the icons of un-cached tools remain unchanged.

The cache can be removed when the data needs to be refreshed from the original input.
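The idea behind caching and clearing the cache can be sketched in a few lines of Python. This is only an analogy, not how Alteryx implements the feature: the names `slow_load`, `cached_load`, `clear_cache` and the cache file location are hypothetical and chosen for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; Alteryx manages its own cache files internally.
CACHE_FILE = Path(tempfile.gettempdir()) / "upstream_cache_demo.pkl"

load_count = 0  # counts how often the slow input is actually read


def slow_load():
    """Stand-in for a slow input step (for example, a large database read)."""
    global load_count
    load_count += 1
    return [{"id": i, "value": i * 10} for i in range(5)]


def cached_load():
    """Return the cached rows if present; otherwise load them and write the cache."""
    if CACHE_FILE.exists():
        with CACHE_FILE.open("rb") as f:
            return pickle.load(f)
    rows = slow_load()
    with CACHE_FILE.open("wb") as f:
        pickle.dump(rows, f)
    return rows


def clear_cache():
    """Remove the cache so the next run reads the original input again."""
    CACHE_FILE.unlink(missing_ok=True)  # requires Python 3.8+


clear_cache()            # start from a clean state
first = cached_load()    # reads the input and writes the cache
second = cached_load()   # served from the cache, no re-read
clear_cache()            # equivalent in spirit to removing the cache
third = cached_load()    # reads the input again
clear_cache()            # tidy up the demo file
```

In this sketch the slow input is read only twice across three runs: once before the cache exists and once after it is cleared.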

Certain tools cannot be cached. When that is the case, consider caching another tool further downstream in the workflow. For example:

(a) Tools with multiple outputs: A tool that has more than one output stream, such as the Filter tool or the Join tool, cannot be cached.

(b) Tools in a loop with neighbouring tools: Tools that are enclosed by, or form a loop with, adjacent tools cannot be cached. An example is shown below.

(c) Exception tools: Certain tools are excluded from caching, including some Predictive tools, the R tool, and the Python tool.
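The advice to move the cache point downstream can be sketched as a small graph search. This is a simplified illustration under stated assumptions, not an Alteryx API: the workflow is modelled as a plain dictionary, the sets `MULTI_OUTPUT` and `EXCLUDED` contain only the tools named in rules (a) and (c) above, and rule (b) is not modelled.

```python
from collections import deque

# Hypothetical, incomplete lists based only on the tools named in the text.
MULTI_OUTPUT = {"Filter", "Join"}   # rule (a): more than one output stream
EXCLUDED = {"R", "Python"}          # rule (c): tools excluded from caching


def is_cacheable(tool_type):
    """Return True if the tool type is not ruled out by (a) or (c)."""
    return tool_type not in MULTI_OUTPUT and tool_type not in EXCLUDED


def nearest_cacheable(workflow, types, start):
    """Search breadth-first downstream from `start` for the first cacheable tool."""
    queue = deque([start])
    seen = {start}
    while queue:
        tool = queue.popleft()
        if is_cacheable(types[tool]):
            return tool
        for nxt in workflow.get(tool, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None  # nothing downstream can be cached


# A made-up workflow: each tool maps to the tools it feeds.
workflow = {
    "input": ["filter"],
    "filter": ["select", "summarize"],
    "select": ["output"],
    "summarize": [],
}
types = {
    "input": "Input Data",
    "filter": "Filter",
    "select": "Select",
    "summarize": "Summarize",
    "output": "Browse",
}

choice = nearest_cacheable(workflow, types, "filter")
```

Here the Filter tool cannot be cached, so the search moves on to the tools it feeds. Note that caching a tool downstream of a multi-output tool covers only that one branch.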

Conclusion

Use the caching functionality to reduce execution time, especially when the volume of data is large. Caching won't work for every tool, so if a tool can't be cached, choose another tool further downstream in the workflow, as shown here.