The best way to do work faster is to not do work that doesn’t need doing. Gradle’s incremental build feature does exactly that by not running a task if its inputs and outputs are equivalent to what they were during its previous execution. Task output caching builds on this by re-using task outputs produced from the same inputs at any earlier time, on any machine that is connected to the same build cache backend.

This guide covers the different use cases for Gradle’s build cache, from local-only development to caching task outputs across large teams. Our intended audience is build engineers already familiar with the build cache who are looking to enable it for their existing builds. This guide will discuss ways to measure the advantages provided by the build cache, methods to improve cache performance, and how to diagnose and fix common problems. Let’s dive in!

1. Use cases for the build cache

Gradle’s build cache is a versatile feature that can be used in different ways.

1.1. Speed up developer builds with the local cache

Even when used by a single developer only, the build cache can be very useful. Gradle’s incremental build feature helps to avoid work that is already done, but once you re-execute a task, any previous results are forgotten. When you are switching branches back and forth, the local results get rebuilt over and over again, even if you are building something that has already been built before. The build cache remembers the earlier build results, and greatly reduces the need to rebuild things when they have already been built locally. This can also extend to rebuilding different commits, like when running git bisect.
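As a minimal sketch, the local cache can be configured in settings.gradle as shown here. It assumes the build cache itself is switched on, for example with the --build-cache command-line option or with org.gradle.caching=true in gradle.properties. The directory name is an arbitrary choice for illustration; if you leave the setting out, Gradle keeps the local cache in its default location.

```groovy
// settings.gradle
buildCache {
    local {
        // Optional: relocate the local cache (illustrative path).
        directory = new File(rootDir, 'build-cache')
    }
}
```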

The local cache can also be useful when working with a project that has multiple variant dimensions, as in the case of Android projects. Each variant has a number of tasks associated with it, and some of those task variants, despite having different names, can end up producing the same output. With the local cache enabled, reuse between task variants will happen automatically when applicable.

1.2. Share results between CI builds

The build cache can do more than go back-and-forth in time: it can also bridge physical distance between computers, allowing results generated on one machine to be re-used by another. A typical first step when introducing the build cache within a team is to enable it for builds running as part of continuous integration only. Using a shared HTTP build cache backend (such as the one provided by Gradle Enterprise) can significantly reduce the work CI agents need to do. This translates into faster feedback for developers, and less money spent on the CI resources. Faster builds also mean fewer commits being part of each build, which makes debugging issues more efficient.

Beginning with the build cache on CI is a good first step as the environment on CI agents is usually more stable and predictable than developer machines. This helps to identify any possible issues with the build that may affect cacheability.

If you are subject to audit requirements regarding the artifacts you ship to your customers you may need to disable the build cache for certain builds. Gradle Enterprise may help you with fulfilling these requirements while still using the build cache for all your builds. It allows you to easily find out which build produced an artifact coming from the build cache via build scans.

[Figure: build scan showing the origin of a task output loaded from the cache]

1.3. Accelerate developer builds by reusing CI results

When multiple developers work on the same project, they don’t just need to build their own changes: whenever they pull from version control, they end up having to build each other’s changes as well. Whenever a developer is working on something independent of the pulled changes, they can safely reuse outputs already generated on CI. Say, you’re working on module "A", and you pull in some changes to module "B" (which does not depend on your module). If those changes were already built in CI, you can download the task outputs for module "B" from the cache instead of generating them locally. A typical use case for this is when developers start their day, pull all changes from version control and then run their first build.

The changes don’t need to be completely independent, either; we’ll take a look at the strategies to reuse results when dependencies are involved in the section about the different forms of normalization.

1.4. Combine CI results with local caching on developer machines

Developers can utilize both a local and a remote cache. While pulling results from a CI-filled remote cache helps to avoid work needed because of changes by other developers, the local cache can speed up switching branches and doing git bisect.

1.5. Share results between developers

Allowing developers to upload their results to a shared cache is possible, but not recommended. Developers can make changes to task inputs or outputs while the task is executing. They can do this unintentionally and without noticing, for example by making changes in their IDEs while a build is running. Currently, Gradle has no good way to defend against these changes, and will simply cache whatever is in the output directory once the task is finished. This again can lead to corrupted results being uploaded to the shared cache. This recommendation might change when Gradle has added the necessary safeguards against unintentional modification of task inputs and outputs.
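The setups described in this chapter can be combined in a single settings.gradle. The following is a sketch under assumptions rather than a prescribed configuration: it assumes that the CI server sets an environment variable named CI, and the URL is a placeholder for your own cache backend.

```groovy
// settings.gradle
// Assumption: CI agents can be recognized by a "CI" environment variable.
boolean isCiServer = System.getenv().containsKey('CI')

buildCache {
    local {
        // Developers keep a local cache; CI agents rely on the remote one.
        enabled = !isCiServer
    }
    remote(HttpBuildCache) {
        // Placeholder URL for the shared cache backend.
        url = 'https://example.com:8123/cache/'
        // Only CI builds upload results; developer builds only read.
        push = isCiServer
    }
}
```

With this arrangement developer machines never write to the shared cache, which avoids the risk of uploading results that were modified while a task was running.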

If you want to share task output from incremental builds, i.e. non-clean builds, you have to make sure that all cacheable tasks are properly configured and implemented to deal with stale output. There are for example annotation processors that do not clean up stale files in the corresponding classes/resources directories. The cache is a great forcing function to fix these problems, which will also make your incremental builds much more reliable. At the same time, until you have confidence that the incremental build behavior is flawless, only use clean builds to upload content to the cache.

2. Build cache performance

The sole reason to use any build cache is to make builds faster. But how much faster can you go when using the cache? Measuring the impact is both important and complicated, as cache performance is determined by many factors. Performing measurements of the cache’s impact can validate the extra effort (work, infrastructure) that is required to start using the cache. These measurements can later serve as baselines for future improvements, and to watch for signs of regressions.

Proper configuration and maintenance of a build can improve caching performance in a big way.

2.1. Fully cached builds

The most straightforward way to get a feel for what the cache can do for you is to measure the difference between a non-cached build and a fully cached build. This will give you the theoretical limit of how fast builds with the cache can get, if everything you’re trying to build has already been built. The easiest way to measure this is using the local cache:

  1. Clean the cache directory to avoid any hits from previous builds (rm -rf ~/.gradle/caches/build-cache-*, or the equivalent path under your Gradle user home if you have relocated it)

  2. Run the build (e.g. ./gradlew --build-cache clean assemble), so that all the results from cacheable tasks get stored in the cache.

  3. Run the build again (e.g. ./gradlew --build-cache clean assemble); depending on your build, you should see many of the tasks being retrieved from the cache.

  4. Compare the execution time for the two builds

You may encounter a few cached tasks even in the first of the two builds, where no previously cached results should be available. This can happen if you have tasks in your build that are configured to produce the same results from the same inputs; in such a case once one of these tasks has finished, Gradle will simply reuse its output for the rest of the tasks.

Normally, your fully cached build should be significantly faster than the clean build: this is the theoretical limit of how much time using the build cache can save on your particular build. You usually don’t achieve the full performance gains on the first try; see "Finding problems with task output caching". As your build logic evolves, it is also important to make sure that cache effectiveness does not regress. Build scans provide a detailed performance breakdown which shows you how effectively your build is using the build cache:

[Figure: build scan performance breakdown of task execution]

Fully cached builds occur in situations when developers check out the latest from version control and then build, for example to generate the latest sources they need in their IDE. The purpose of running most builds though is to process some new changes. The structure of the software being built (how many modules are there, how independent are its parts etc.), and the nature of the changes themselves ("big refactor in the core of the system" vs. "small change to a unit test" etc.) strongly influence the performance gains delivered by the build cache. As developers tend to submit different kinds of changes over time, caching performance is expected to vary with each change. As with any cache, the impact should therefore be measured over time.

In a setup where a team uses a shared cache backend, there are two locations worth measuring cache impact at: on CI and on developer machines.

2.2. Cache impact on CI builds

The best way to learn about the impact of caching on CI is to set up the same builds with the cache enabled and disabled, and compare the results over time. If you have a single Gradle build step that you want to enable caching for, it’s easy to compare the results using your CI system’s built-in statistical tools.

Measuring complex pipelines may require more work or external tools to collect and process measurements. It’s important to distinguish those parts of the pipeline that caching has no effect on, for example, the time builds spend waiting in the CI system’s queue, or time taken by checking out source code from version control.

When using Gradle Enterprise, you can use the export API to access the necessary data and run your analytics. Gradle Enterprise provides much richer data compared to what can be obtained from CI servers. For example, you can get insights into the execution of single tasks, how many tasks were retrieved from the cache, how long it took to download from the cache, the properties that were used to calculate the cache key and more. When using your CI server’s built-in functions, you can use statistic charts if you use TeamCity for your CI builds. Most of the time you will end up extracting data from your CI server via the corresponding REST API (see Jenkins remote access API and TeamCity REST API).

Typically, CI builds above a certain size include parallel sections to utilize multiple agents. With parallel pipelines you can measure the wall-clock time it takes for a set of changes to go from having been pushed to version control to being built, verified and deployed. The build cache’s effect in this case can be measured in the reduction of the time developers have to wait for feedback from CI.

You can also measure the cumulative time your build agents spent building a changeset, which will give you a sense of the amount of work the CI infrastructure has to exert. The cache’s effect here is less money spent on CI resources, as you don’t need as many CI agents to maintain the same number of changes built.

If you want to look at the measurement for the Gradle build itself you can have a look at the blog post Introducing the build cache.

2.3. Measuring developer builds

Gradle’s build cache can be very useful in reducing CI infrastructure cost and feedback time, but it usually has the biggest impact when developers can reuse cached results in their local builds. This is also the hardest to quantify for a number of reasons:

  • developers run different builds

  • developers can have different hardware, or have different settings

  • developers run all kinds of other things on their machines that can slow them down

When using Gradle Enterprise you can use the export API to extract data about developer builds, too. You can then create statistics on how many tasks were cached per developer or build. You can even compare the times it took to execute the task vs pulling it from the cache and then estimate the time saved per developer.

When using the Gradle Enterprise build cache backend you should pay close attention to the hit rate in the admin UI. A rise in the hit rate there probably indicates better usage by developers:

[Figure: build cache hit rate in the Gradle Enterprise admin UI]

3. Important concepts

How much of your build gets loaded from the cache depends on many factors. In this section you will see some of the tools that are essential for well-cached builds. Build scans are part of that toolchain and will be used throughout this guide.

3.1. Build cache key

Artifacts in the build cache are uniquely identified by a build cache key. A build cache key is assigned to each cacheable task when running with the build cache enabled and is used for both loading and storing task outputs to the build cache. The following inputs contribute to the build cache key for a task:

  • The task implementation

  • The task action implementations

  • The names of the output properties

  • The names and values of task inputs

Two tasks can reuse their outputs by using the build cache if their associated build cache keys are the same.
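To make the list above more concrete, here is a sketch of a custom cacheable task type. The class, its properties and the file locations are hypothetical and exist only for illustration; the comments indicate which part of the task feeds which component of the build cache key.

```groovy
// build.gradle (or buildSrc): a hypothetical cacheable task type
@CacheableTask
class GenerateGreeting extends DefaultTask {
    // Input property: its name and its value both feed the cache key.
    @Input
    String greeting = 'Hello'

    // File input: contributes the content hash; the path is ignored here.
    @InputFile
    @PathSensitive(PathSensitivity.NONE)
    File template

    // Output property: only its name feeds the key.
    @OutputFile
    File outputFile

    // The implementation of this class and its action is part of the key too.
    @TaskAction
    void generate() {
        outputFile.text = template.text.replace('@greeting@', greeting)
    }
}

task generateGreeting(type: GenerateGreeting) {
    template = file('src/greeting.tmpl')          // illustrative location
    outputFile = file("$buildDir/greeting.txt")   // illustrative location
}
```

Two GenerateGreeting tasks with the same greeting and the same template content end up with the same key, even if they have different names or live in different projects.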

3.2. Repeatable task outputs

Assume that you have a code generator task as part of your build. When you have a fully up to date build and you clean and re-run the code generator task on the same code base it should generate exactly the same output, so anything that depends on that output will stay up-to-date.

It might also be that your code generator adds some extra information to its output that doesn’t depend on its declared inputs, like a timestamp. In such a case re-executing the task will result in different code being generated (because the timestamp will be updated). Tasks that depend on the code generator’s output will need to be re-executed.

When a task is cacheable, then the very nature of task output caching makes sure that the task will have the same outputs for a given set of inputs. Therefore, cacheable tasks should have repeatable task outputs. If they don’t, then the result of executing the task and loading the task from the cache may be different, which can lead to hard-to-diagnose cache misses.

The key here is that no cacheable task should have non-repeatable task outputs as an input.

3.3. Stable task inputs

Having a task repeatably produce the same output is not enough if its inputs keep changing all the time. Such unstable inputs can be supplied directly to the task. Consider a version number that includes a timestamp being added to the jar file’s manifest:

version = "3.2-${System.currentTimeMillis()}"

jar {
    manifest {
        attributes("Implementation-Version": version)
    }
}

In the above example the inputs for the jar task will be different for each build execution since this timestamp will continually change.

Another example for unstable inputs is the commit ID from version control. Maybe your version number is generated via git describe (and you include it in the jar manifest as shown above). Or maybe you include the commit hash directly in version.properties or a jar manifest attribute. Either way, the outputs produced by any tasks depending on such data will only be re-usable by builds running against the exact same commit.

Another common, but less obvious source of unstable inputs is when a task consumes the output of another task which produces non-repeatable results, such as the example before of a code generator that embeds timestamps in its output.

A task can only be loaded from the cache if it has stable task inputs. Unstable task inputs result in the task having a unique set of inputs for every build, which will always result in a cache miss.
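One possible way to stabilize the version example from this section is to embed the timestamp only when it is really needed. The sketch below assumes that release builds are marked with a hypothetical -Prelease project property; the property name and the fallback version string are illustrative choices, not a Gradle convention.

```groovy
// Assumption: release builds are started with a hypothetical -Prelease flag.
def isRelease = project.hasProperty('release')

// Ordinary builds get a stable version; only release builds embed a timestamp.
version = isRelease ? "3.2-${System.currentTimeMillis()}" : '3.2-SNAPSHOT'

jar {
    manifest {
        attributes('Implementation-Version': project.version)
    }
}
```

With this change the inputs of the jar task stay the same between ordinary developer and CI builds, so its outputs, and the outputs of tasks consuming the jar, can be reused.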

3.4. Better reuse via input normalization

Having stable inputs is crucial for cacheable tasks. However, achieving byte for byte identical inputs for each task can be challenging. In some cases sanitizing the output of a task to remove unnecessary information can be a good approach, but this also means that a task’s output can only be normalized for a single purpose.

This is where input normalization comes into play. Input normalization is used by Gradle to determine if two task inputs are essentially the same. Gradle uses normalized inputs when doing up-to-date checks and when determining if a cached result can be re-used instead of executing the task. As input normalization is declared by the task consuming the data as input, different tasks can define different ways to normalize the same data.

When it comes to file inputs, Gradle can normalize the path of the files as well as their contents.

3.4.1. Path sensitivity and relocatability

When sharing cached results between computers, it’s rare that everyone runs the build from the exact same location on their computers. To allow cached results to be shared even when builds are executed from different root directories, Gradle needs to understand which inputs can be relocated and which cannot.

Tasks having files as inputs can declare the parts of a file’s path that are essential to them: this is called the path sensitivity of the input. Task properties declared with ABSOLUTE path sensitivity are considered non-relocatable. This is the default for properties not declaring path sensitivity, too.

For example, the class files produced by the Java compiler are dependent on the file names of the Java source files: renaming the source files with public classes would fail the build. However, moving the files around has no effect on the result of the compilation. Therefore, the path sensitivity for the sources of the JavaCompile task is NAME_ONLY. Because of this only the normalized paths of the Java source files — the file names — are considered as inputs to the JavaCompile task.

The Java compiler only respects the package declaration in the Java source files, not the relative path of the sources. As a consequence, path sensitivity for Java sources is NAME_ONLY and not RELATIVE.
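Path sensitivity can also be declared for your own tasks. The following sketch uses the runtime API on a hypothetical ad-hoc task; the task name, property names and file locations are made up for illustration. Because the output depends only on file names and file contents, NAME_ONLY is the matching sensitivity.

```groovy
// A hypothetical ad-hoc task whose result depends on file names and contents only.
task countLines {
    def sources = fileTree('src/docs')              // illustrative location
    def report = file("$buildDir/line-count.txt")   // illustrative location

    // Only the file names (not their directories) are part of the input.
    inputs.files(sources).withPropertyName('sources').withPathSensitivity(PathSensitivity.NAME_ONLY)
    outputs.file(report).withPropertyName('report')

    // Ad-hoc tasks are not cacheable unless they opt in.
    outputs.cacheIf { true }

    doLast {
        report.text = sources.files.sort { it.name }
            .collect { "${it.name}: ${it.readLines().size()}" }
            .join('\n')
    }
}
```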

3.4.2. Content normalization

Compile avoidance for Java

When it comes to the dependencies of a JavaCompile task (i.e. its compile classpath), only changes to the Application Binary Interface (ABI) of these dependencies require compilation to be executed. Gradle has a deep understanding of what a compile classpath is and uses a sophisticated normalization strategy for it. Task outputs can be re-used as long as the ABI of the classes on the compile classpath stays the same. This enables Gradle to avoid Java compilation by using incremental builds, or load results from the cache that were produced by different (but ABI-compatible) versions of dependencies. For more information on compile avoidance see the corresponding section in the user guide.

Runtime classpath normalization

Similar to compile avoidance, Gradle also understands the concept of a runtime classpath, and uses tailored input normalization to avoid running e.g. tests. For runtime classpaths Gradle inspects the contents of jar files and ignores the timestamps and order of the entries in the jar file. This means that a rebuilt jar file would be considered the same runtime classpath input. For details on what level of understanding Gradle has for detecting changes to classpaths and what is considered as a classpath see the user guide.
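Custom tasks can opt in to the same normalization by annotating a property as a classpath. The task type below is hypothetical and only sketches the idea; @Classpath requests runtime classpath normalization, while @CompileClasspath would request the ABI-based normalization described under compile avoidance.

```groovy
// A hypothetical task type that runs a tool with a given classpath.
@CacheableTask
class RunTool extends DefaultTask {
    // Normalized as a runtime classpath: the order and timestamps of
    // jar entries are ignored when computing the input.
    @Classpath
    FileCollection toolClasspath

    @OutputFile
    File result

    @TaskAction
    void run() {
        // Stand-in for invoking the real tool.
        result.text = "ran with ${toolClasspath.files.size()} classpath entries\n"
    }
}
```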

Filtering runtime classpaths

For a runtime classpath, you can tell Gradle more precisely which files are essential to the input by configuring input normalization.

Suppose you want to add a file build-info.properties to all your produced jar files, containing volatile information about the build, e.g. the timestamp when the build started or some ID to identify the CI job that published the artifact. This file is only used for auditing purposes, and has no effect on the outcome of running tests. Nonetheless, this file is part of the runtime classpath for the test task. Since the file changes on every build invocation, tests cannot be cached effectively. To fix this you can ignore build-info.properties on any runtime classpath by adding the following configuration to the build script:

normalization {
    runtimeClasspath {
        ignore "build-info.properties"
    }
}

The effect of this configuration would be that changes to build-info.properties would be ignored for both up-to-date checks and task output caching. All runtime classpath inputs for all tasks in the project where this configuration has been made will be affected. This will not change the runtime behavior of the test task — i.e. any test is still able to load build-info.properties, and the runtime classpath stays the same as before.

3.5. The case against overlapping outputs

When two tasks write to the same output directory or output file, it is difficult for Gradle to determine which output belongs to which task. There are many edge cases, and executing the tasks in parallel cannot be done safely. For the same reason, Gradle cannot remove stale output files for these tasks. Tasks that have discrete, non-overlapping outputs can always be handled in a safe fashion by Gradle. For the aforementioned reasons, task output caching is automatically disabled for tasks whose output directories overlap with another task.
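A sketch of how to avoid the problem: give every task its own output directory. The task names, property names and directories below are hypothetical.

```groovy
// Problematic: if both tasks declared "$buildDir/generated" as their output
// directory, their outputs would overlap and caching would be disabled.

// Better: one output directory per task.
task generateApi {
    def outDir = file("$buildDir/generated/api")
    inputs.property('part', 'api')
    outputs.dir(outDir).withPropertyName('outputDir')
    outputs.cacheIf { true }
    doLast {
        outDir.mkdirs()
        new File(outDir, 'api.txt').text = 'generated for api\n'
    }
}

task generateImpl {
    def outDir = file("$buildDir/generated/impl")
    inputs.property('part', 'impl')
    outputs.dir(outDir).withPropertyName('outputDir')
    outputs.cacheIf { true }
    doLast {
        outDir.mkdirs()
        new File(outDir, 'impl.txt').text = 'generated for impl\n'
    }
}
```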

Build scans show tasks where caching was disabled due to overlapping outputs in the timeline:

[Figure: build scan timeline showing tasks with overlapping outputs]

3.6. Reuse of outputs between different tasks

Some builds exhibit a surprising characteristic: even when executed against an empty cache, they produce tasks loaded from cache. How is this possible? Rest assured that this is completely normal.

When considering task outputs, Gradle only cares about the inputs to the task: the task type itself, input files and parameters etc., but it doesn’t care about the task’s name or which project it can be found in. Running javac will produce the same output regardless of the name of the JavaCompile task that invoked it. If your build includes two tasks that share every input, the one executing later will be able to reuse the output produced by the first.

Having two tasks in the same build that do the same might sound like a problem to fix, but it is not necessarily something bad. For example, the Android plugin creates several tasks for each variant of the project; some of those tasks will potentially do the same thing. These tasks can safely reuse each other’s outputs.

3.7. Non-cacheable tasks

You’ve seen quite a bit about cacheable tasks, which implies there are non-cacheable ones, too. If caching task outputs is as awesome as it sounds, why not cache every task?

There are tasks that are definitely worth caching: tasks that do complex, repeatable processing and produce moderate amounts of output. Compilation tasks are usually ideal candidates for caching. At the other end of the spectrum lie I/O-heavy tasks, like Copy and Sync. Moving files around locally typically cannot be sped up by copying them from a cache. Caching those tasks would even waste good resources by storing all those redundant results in the cache.

Most tasks are either obviously worth caching, or obviously not. For those in-between a good rule of thumb is to see if downloading results would be significantly faster than producing them locally.
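Whether a task takes part in caching can be controlled through its outputs. The sketch below opts a hypothetical ad-hoc task in to caching and shows how it could be opted out again; the task, the reasons given and the noDocsCache project property are all made up for illustration.

```groovy
// build.gradle: a hypothetical ad-hoc task that is expensive enough to cache
task renderDocs {
    def sourceDir = file('src/docs')             // illustrative input location
    def targetDir = file("$buildDir/docs-html")  // illustrative output location

    inputs.dir(sourceDir).withPropertyName('docs').withPathSensitivity(PathSensitivity.RELATIVE)
    outputs.dir(targetDir).withPropertyName('html')

    // Ad-hoc tasks are not cached unless they opt in.
    outputs.cacheIf('rendering is slow and repeatable') { true }

    doLast {
        // Stand-in for an expensive rendering step.
        copy {
            from sourceDir
            into targetDir
        }
    }
}

// Opt out again on demand, e.g. with a hypothetical -PnoDocsCache flag.
renderDocs.outputs.doNotCacheIf('caching switched off via -PnoDocsCache') {
    project.hasProperty('noDocsCache')
}
```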

4. Caching Java projects

As of Gradle 4.0, the build tool fully supports caching plain Java projects. Built-in tasks for compiling, testing, documenting and checking the quality of Java code support the build cache out of the box.

4.1. Java compilation

Caching Java compilation makes use of Gradle’s deep understanding of compile classpaths. The mechanism avoids recompilation when dependencies change in a way that doesn’t affect their application binary interfaces (ABI). Since the cache key is only influenced by the ABI of dependencies (and not by their implementation details like private types and method bodies), task output caching can also reuse compiled classes if they were produced by the same sources and ABI-equivalent dependencies.

For example, take a project with two modules: an application depending on a library. Suppose the latest version is already built by CI and uploaded to the shared cache. If a developer now modifies a method’s body in the library, the library will need to be rebuilt on their computer. But they will be able to pull the compiled classes for the application from the shared cache. Gradle can do this because the library used to compile the application on CI, and the modified library available locally share the same ABI.

4.1.1. Annotation processors

Compile avoidance works out of the box. There is one caveat though: when using annotation processors, Gradle uses the annotation processor classpath as an input. Unlike most compile dependencies, in which only the ABI influences compilation, the implementation of annotation processors must be considered as an input to the compiler. For this reason Gradle will treat annotation processors as a runtime classpath, meaning less input normalization is taking place there. If Gradle detects an annotation processor on the compile classpath, the annotation processor classpath defaults to the compile classpath when not explicitly set, which in turn means the entire compile classpath is treated as a runtime classpath input.

For the example above this would mean the ABI extracted from the compile classpath would be unchanged, but the annotation processor classpath (because it’s not treated with compile avoidance) would be different. Ultimately, the developer would end up having to recompile the application.

The easiest way to avoid this performance penalty is to not use annotation processors. However, if you need to use them, make sure you set the annotation processor classpath explicitly to include only the libraries needed for annotation processing. The user guide describes how to do this.

Some common Java dependencies (such as Log4j 2.x) come bundled with annotation processors. If you use these dependencies, but do not leverage the features of the bundled annotation processors, it’s best to disable annotation processing entirely. This can be done by setting the annotation processor classpath to an empty set.
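Both recommendations from this section can be sketched as follows. The apt configuration name and the dependency coordinates are illustrative assumptions; use whatever processor your project actually needs.

```groovy
configurations {
    apt   // hypothetical configuration holding only annotation processors
}

dependencies {
    // Illustrative coordinates.
    apt 'com.google.dagger:dagger-compiler:2.8'
}

compileJava {
    // Only these files form the annotation processor classpath.
    options.annotationProcessorPath = configurations.apt
}

compileTestJava {
    // No processors needed here: an empty set disables annotation processing.
    options.annotationProcessorPath = files()
}
```

Keeping processors out of the compile classpath means the compile classpath can again be normalized by its ABI only.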

4.2. Unit test execution

The Test task used for test execution for JVM languages employs runtime classpath normalization for its classpath. This means that changes to order and timestamps in jars on the test classpath will not cause the task to be out-of-date or change the build cache key. For achieving "Stable task inputs" you also can wield the power of filtering the runtime classpath.

4.3. Integration test execution

Unit tests are easy to cache as they normally have no external dependencies. For integration tests the situation can be quite different, as they can depend on a variety of inputs outside of the test and production code. These external factors can be for example:

  • operating system type and version,

  • external tools being installed for the tests,

  • environment variables and Java system properties,

  • other services being up and running,

  • a distribution of the software under test.

You need to be careful to declare these additional inputs for your integration test in order to avoid incorrect cache hits. For example, declaring the operating system in use by Gradle as an input to a Test task called integTest would work as follows:

integTest {
    inputs.property("operatingSystem") {
        System.getProperty("os.name")
    }
}

4.3.1. Archives as inputs

It is common for the integration tests to depend on your packaged application. If this happens to be a zip or tar archive, then adding it as an input to the integration test task may lead to cache misses. This is because, as described in "Repeatable task outputs", rebuilding an archive often changes the metadata in the archive. You can depend on the exploded contents of the archive instead. See also the section on dealing with non-repeatable outputs.
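As an alternative to depending on the exploded contents, you can try to make the archives themselves repeatable. This is a different technique from the one described above, shown here as a sketch: the two properties below exist on Gradle’s archive tasks (Gradle 3.4 or later) and remove timestamps and file ordering as sources of difference.

```groovy
tasks.withType(AbstractArchiveTask) {
    // Do not store the files' modification times in the archive.
    preserveFileTimestamps = false
    // Add entries in a stable order, independent of the file system.
    reproducibleFileOrder = true
}
```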

4.3.2. Dealing with file paths

You will probably pass some information from the build environment to your integration test tasks by using system properties. When passing file locations, use only relative paths whenever possible, and add the input file as an input manually, so that Gradle can track changes to the contents of the file. The same should be applied to directories as well. Passing absolute paths will break relocatability of the integration test task. For example, with a task called integTest, instead of this:

integTest {
    systemProperty "distribution.location", file("build/dist").absolutePath
}

…​it is better to do this:

integTest {
    // Pass the relative path as a system property.
    systemProperty "distribution.location", "build/dist"
    // Track all the files under the distribution directory as an input.
    inputs.dir("build/dist").withPropertyName("distDir").withPathSensitivity(PathSensitivity.RELATIVE)
}

4.3.3. Ignoring system properties

It may be necessary to ignore some system properties as inputs as they do not influence the outcome of the integration tests. This is possible by overriding the systemProperties input property via the runtime API.

integTest {
    // This property does not influence the output of the integration tests.
    def ciProperties = [agentNumber: (System.getenv().get("AGENT_NUMBER") ?: 1)]
    // Set the system properties for the actual test execution.
    systemProperties(ciProperties)

    // Overwrite the systemProperties input property...
    inputs.property('systemProperties') {
        // ...with the system properties minus the ignored property.
        systemProperties - ciProperties
    }
}

4.4. Caching buildSrc

The buildSrc project can be used to organize build logic in your Gradle build. While buildSrc typically does not change frequently, when it does every developer and CI agent needs to re-build and test it. Therefore, it is usually desirable to use task output caching for buildSrc, too. To use the same build cache configuration for buildSrc as for your root project, you can apply the same script both in buildSrc/settings.gradle and in settings.gradle as shown in the user guide.
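A sketch of such a shared configuration follows. The script name gradle/buildCacheSettings.gradle and the cache URL are illustrative; the block shows the contents of three separate files, marked by comments.

```groovy
// --- gradle/buildCacheSettings.gradle (shared script) ---
buildCache {
    remote(HttpBuildCache) {
        url = 'https://example.com:8123/cache/'   // placeholder URL
    }
}

// --- settings.gradle ---
apply from: new File(settingsDir, 'gradle/buildCacheSettings.gradle')

// --- buildSrc/settings.gradle ---
apply from: new File(settingsDir, '../gradle/buildCacheSettings.gradle')
```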

5. Debugging and diagnosing cache misses

To make the most of task output caching, it is important that any necessary inputs to your tasks are specified correctly, while at the same time avoiding unneeded inputs. Failing to specify an input that affects the task’s outputs can result in incorrect builds, while needlessly specifying inputs that do not affect the task’s output can cause cache misses.

This chapter is about finding out why a cache miss happened. If you have a cache hit which you didn’t expect, we suggest declaring whatever change you expected to trigger the cache miss as an input to the task.

5.1. Finding problems with task output caching

In order to find existing problems with your usage of task output caching you can do a variety of things.

First, you should start out with running a fully up-to-date build. You do not even have to enable the build cache for this exercise. The expected outcome is that all incremental tasks are up-to-date. Tasks which have no outputs or no inputs will always be executed, but that shouldn’t be a problem. If you find some tasks which are out of date, then use the same methods as described below for diagnosing and fixing cache misses to address these problems. The goal is to achieve "Stable task inputs" for cacheable tasks. If you find a task which is out of date but no cacheable tasks depend on its outcome, then you don’t have to do anything about it.

When you are happy with the up-to-date performance then you can repeat the experiment above but this time with a clean build and the build cache turned on. The goal with clean builds and the build cache turned on is to retrieve all cacheable tasks from the cache. You should start out simple and use the same checkout for the consumer build. When you are happy with the performance in the single workspace use-case, then, depending on your use-case, you can go further:

  • Use different workspaces on the same machine.

  • Use different workspaces on different machines using a remote build cache.

  • Use the cache populated by a CI build and a developer machine.
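For the remote cache and CI-populated scenarios in the list above, a settings.gradle along the following lines is one common arrangement. This is a minimal sketch: the cache URL is a placeholder, and detecting CI agents through a `CI` environment variable is an assumption about your environment.

```groovy
// settings.gradle
// Assumption: CI agents export an environment variable named CI.
def isCi = System.getenv().containsKey("CI")

buildCache {
    local {
        // Developers keep the local cache; CI relies on the remote cache only.
        enabled = !isCi
    }
    remote(HttpBuildCache) {
        // Placeholder URL: replace with your own build cache backend.
        url = "https://cache.example.com/cache/"
        // Only CI builds populate the remote cache; developers read from it.
        push = isCi
    }
}
```

The build cache still has to be switched on, for example with --build-cache on the command line or org.gradle.caching=true in gradle.properties.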

After these experiments with fully cached builds you can go on and try to make typical changes to your project and see if enough tasks are still cached. If the results are not satisfactory you can think about restructuring your project to reduce dependencies between different tasks.

Consider recording execution times of your builds, generating graphs, and analyzing the results. Keep an eye out for certain patterns, like a build recompiling everything even though you expected compilation to be cached.

You can also make changes to your code base manually or automatically and check that the expected set of tasks is cached.

If you have tasks that are re-executing instead of pulling their outputs from the cache, then it may point to a problem in your build. Techniques for debugging a cache miss are explained in the following section.

5.2. Helpful data for diagnosing a cache miss

A cache miss happens when Gradle calculates a build cache key for a task which is different from any existing build cache key in the cache. Only comparing the build cache key on its own does not give much information, so we need to look at some finer grained data to be able to diagnose the cache miss. A list of all inputs to the computed build cache key can be found in the user guide.

From most coarse grained to most fine grained, the items we will use to compare two tasks are:

  • Build cache keys

  • Task and Task action implementations

    • classloader hash

    • class name

  • Task output property names

  • Individual task property input hashes

  • Hashes of files which are part of task input properties

Currently, both the build cache key for the task and information down to the individual input property level are logged at the INFO level:

Appending taskClass to build cache key: org.gradle.api.tasks.compile.JavaCompile_Decorated
Appending classLoaderHash to build cache key: da6eca52100422099189290bf68f200a
Appending actionType to build cache key: org.gradle.api.internal.project.taskfactory.AbstractOutputPropertyAnnotationHandler$2$1
Appending actionClassLoaderHash to build cache key: 2cdf3f9202925b5befa161030ab43724
Appending actionType to build cache key: org.gradle.api.internal.project.taskfactory.TaskClassValidator
.
.
.
Appending inputPropertyHash for 'classpath' to build cache key: 2b6ab53aa11d4a7d4a1f95a8f78f4d7c
Appending inputPropertyHash for 'effectiveAnnotationProcessorPath' to build cache key: d41d8cd98f00b204e9800998ecf8427e
Appending inputPropertyHash for 'options.sourcepath' to build cache key: d41d8cd98f00b204e9800998ecf8427e
Appending inputPropertyHash for 'source' to build cache key: f6ba49b2466f0090272c43ac5f54ec1d
Appending outputPropertyName to build cache key: destinationDir
Build cache key for task ':compileJava' is 2b220117efa6710f7ab191a0bbe48c00

If you want to go into the details of finding out e.g. which classes and jar files constitute the classpath for the compileJava task you need to resort to comparing those files on disk yourself.

Luckily, you do not have to capture this data yourself - the build scan plugin already takes care of this for you. This means that Gradle Enterprise has the necessary data to diagnose the cache miss when comparing two build scans:

[Figure: task inputs comparison]

It is also possible that task output caching for a cacheable task was disabled. When this happens the reason why caching was disabled for the task is reported on the info log level and in the build scan:

[Figure: caching disabled]

5.3. Diagnosing the reasons for a cache miss

Having the data from the last section at hand, you should be able to diagnose why the outputs of a certain task were not found in the build cache. Since you were expecting more tasks to be cached, you should be able to pinpoint a build which would have produced the artifact in question.

Before diving into how to find out why one task has not been loaded from the cache we should first look into which task caused the cache misses. There is a cascade effect which causes dependent tasks to be executed if one of the tasks earlier in the build is not loaded from the cache and has different outputs. Therefore, you should locate the first cacheable task which was executed and continue investigating from there. This can be done from the timeline view in a build scan or from the task input comparison directly:

[Figure: first non cached task]

First, you should check if the implementation of the task changed. This would mean checking the class names and classloader hashes for the task class itself and for each of its actions. If there is a change, this means that the build script, buildSrc or the Gradle version has changed.

A change in the output of buildSrc also marks all the logic added by your build as changed. In particular, custom actions added to cacheable tasks will be marked as changed. This can be problematic, see "Review usages of `doFirst` and `doLast`".

If the implementation is the same, then you need to start comparing inputs between the two builds. There should be at least one different input hash. If it is a simple value property, then the configuration of the task changed. This can happen for example by

  • changing the build script,

  • conditionally configuring the task differently for CI or the developer builds,

  • depending on a system property or an environment variable for the task configuration,

  • or having an absolute path which is part of the input.

If the changed property is a file property, then the reasons can be the same as for the change of a value property. Most probably though a file on the filesystem changed in a way that Gradle detects a difference for this input. The most common case will be that the source code was changed by a check in. It is also possible that a file generated by a task changed, e.g. since it includes a timestamp. As described in "Java version tracking", the Java version can also influence the output of the Java compiler. If you did not expect the file to be an input to the task, then it is possible that you should alter the configuration of the task to not include it. For example, having your integration test configuration including all the unit test classes as a dependency has the effect that all integration tests are re-executed when a unit test changes. Another option is that the task tracks absolute paths instead of relative paths and the location of the project directory changed on disk.

5.4. Example

We will walk you through the process of diagnosing a cache miss. Let’s say we have build A and build B and we expected all the test tasks for a sub-project sub1 to be cached in build B since only a unit test for another sub-project sub2 changed. Instead, all the tests for the sub-project have been executed. Since we have the cascading effect when we have cache misses, we need to find the task which caused the caching chain to fail. This can easily be done by filtering for all cacheable tasks which have been executed and then selecting the first one. In our case, it turns out that the tests for the sub-project internal-testing were executed even though there was no code change to this project. We start the input property comparison in Gradle Enterprise and see that the property classpath changed. This means that some file on the runtime classpath actually did change. Looking deeper into this, we see that the inputs for the task processResources changed in that project, too. Finally, we find this in our build file:

task currentVersionInfo(type: CurrentVersionInfo) {
    version = project.version
    versionInfoFile = new File(generatedResourcesDir, "currentVersion.properties")
}

sourceSets.main.output.dir generatedResourcesDir, builtBy: currentVersionInfo

class CurrentVersionInfo extends DefaultTask {
    @Input
    String version

    @OutputFile
    File versionInfoFile

    @TaskAction
    void writeVersionInfo() {
        def properties = new Properties()
        properties.setProperty("latestMilestone", version)
        versionInfoFile.withOutputStream { out ->
            properties.store(out, null)
        }
    }
}

Since properties files stored by Java’s Properties.store method contain a timestamp, this will cause a change to the runtime classpath every time the build runs. In order to solve this problem see "Non-repeatable task outputs" or use input normalization.

The compile classpath is not affected since compile avoidance ignores non-class files on the classpath.
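One way to make the output in this example repeatable is to avoid Properties.store altogether. The sketch below swaps the custom task class for Gradle’s built-in WriteProperties task type, which writes properties files without a timestamp. It assumes a Gradle version that provides this task type; newer versions may prefer `destinationFile` over `outputFile`.

```groovy
// Replaces the CurrentVersionInfo task from the example above.
task currentVersionInfo(type: WriteProperties) {
    // Same key and target file as before; no timestamp comment is written.
    property("latestMilestone", project.version)
    outputFile = new File(generatedResourcesDir, "currentVersion.properties")
}

sourceSets.main.output.dir generatedResourcesDir, builtBy: currentVersionInfo
```

With this change the generated file only differs when the version differs, so the runtime classpath stays stable between builds.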

6. Solving common problems

Small problems in a build, like forgetting to declare a configuration file as an input to your task, can be easily overlooked. The configuration file might change infrequently, or only change when some other (correctly tracked) input changes as well. The worst that could happen is that your task doesn’t execute when it should. Developers can always re-run the build with clean, and "fix" their builds for the price of a slow rebuild. In the end nobody gets blocked in their work, and the incident is chalked up to "Gradle acting up again."

With cacheable tasks incorrect results are stored permanently, and can come back to haunt you later; re-running with clean won’t help in this situation either. When using a shared cache, these problems even cross machine boundaries. In the example above, Gradle might end up loading a result for your task that was produced with a different configuration. Resolving these problems with the build therefore becomes even more important when task output caching is enabled.

Other issues with the build won’t cause it to produce incorrect results, but will lead to unnecessary cache misses. In this chapter you will learn about some typical problems and ways to avoid them. Fixing these issues will have the added benefit that your build will stop "acting up," and developers can forget about running builds with clean altogether.

6.1. System file encoding

Most Java tools use the system file encoding when no specific encoding is specified. This means that running the same build on machines with different file encodings can yield different outputs. Currently Gradle only tracks on a per-task basis that no file encoding has been specified, but it does not track the system encoding of the JVM in use. This can cause incorrect builds. You should always set the file encoding to avoid these kinds of problems.

Build scripts are compiled with the file encoding of the Gradle daemon. By default, the daemon uses the system file encoding, too.

Setting the file encoding for the Gradle daemon mitigates both of the above problems by making sure that the encoding is the same across builds. You can do so in your gradle.properties:

gradle.properties
org.gradle.jvmargs=-Dfile.encoding=UTF-8
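In addition to the daemon-wide setting, the encoding can be set explicitly on the tasks that read source files, so that it no longer depends on any system default. A minimal sketch for a build.gradle, assuming a Java project and UTF-8 sources:

```groovy
// build.gradle
// Assumption: all sources in this project are UTF-8 encoded.
tasks.withType(JavaCompile) {
    options.encoding = "UTF-8"
}

tasks.withType(Javadoc) {
    options.encoding = "UTF-8"
}
```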

6.2. Environment variable tracking

Gradle does not track changes in environment variables for tasks. For example, the outcome of a Test task may well depend on a few environment variables. To ensure that only the right artifacts are re-used between builds, you need to add environment variables as inputs to tasks depending on them.

Absolute paths are often passed as environment variables, too. You need to pay attention to what you add as an input to the task in this case. You would need to ensure that the absolute path is the same between machines. Most times it makes sense to track the file or the contents of the directory the absolute path points to. If the absolute path represents a tool being used, it probably makes sense to track the tool version as an input instead.

For example, if you are using tools in your Test task called integTest which depend on the contents of the LANG variable, you should do this:

integTest {
    inputs.property("langEnvironment") {
        System.getenv("LANG")
    }
}
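For an environment variable that carries an absolute path, the contents of the directory can be tracked instead of the path itself. The following is a sketch only: the variable name TOOL_HOME is hypothetical, and integTest is assumed to be the same Test task as above.

```groovy
integTest {
    // Hypothetical variable name; adjust to whatever your tool expects.
    def toolHome = System.getenv("TOOL_HOME")
    if (toolHome != null) {
        // Track the directory contents with relative paths, so the
        // machine-specific absolute location does not end up in the cache key.
        inputs.dir(toolHome).withPropertyName("toolHome").withPathSensitivity(PathSensitivity.RELATIVE)
    }
}
```

For large tool installations, hashing the whole directory can be expensive; tracking a version string as an input property, as suggested above, may be the better trade-off.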

If you add conditional logic to distinguish CI builds from local development builds, you have to ensure that this does not break the loading of task outputs from CI onto developer machines. For example, the following setup would break caching of Test tasks, since Gradle always detects the differences in custom task actions.

if (System.getenv().containsKey("CI")) {
    test.doFirst {
        println "Running test on CI"
    }
}

You should always add the action unconditionally:

test.doFirst {
    if (System.getenv().containsKey("CI")) {
        println "Running test on CI"
    }
}

This way, the task has the same custom action on CI and on developer builds and its outputs can be re-used if the remaining inputs are the same.

6.3. Line endings

If you are building on different operating systems be aware that some version control systems convert line endings on check-out. For example, Git on Windows uses autocrlf=true by default which converts all line endings to \r\n. As a consequence, compilation outputs can’t be re-used on Windows since the input sources are different. If sharing the build cache across multiple operating systems is important in your environment, then setting autocrlf=false across your build machines is crucial for optimal build cache usage.

6.4. Symbolic links

When using symbolic links, Gradle does not store the link in the build cache but the actual file contents of the destination of the link. As a consequence you might have a hard time when trying to reuse outputs which heavily use symbolic links. There currently is no workaround for this behavior.

6.5. Java version tracking

Gradle tracks only the major version of Java as an input for compilation and test execution. Currently, it tracks neither the vendor nor the minor version. Still, the vendor and the minor version may influence the bytecode produced by compilation.

If you use different JVM vendors for compiling or running Java we strongly suggest that you add the vendor as an input to the corresponding tasks. This can be achieved by using the runtime API as shown in the following snippet.

tasks.withType(AbstractCompile) {
    inputs.property("java.vendor") {
        System.getProperty("java.vendor")
    }
}

tasks.withType(Test) {
    inputs.property("java.vendor") {
        System.getProperty("java.vendor")
    }
}

With respect to tracking the Java minor version there are competing goals: letting developers pull cached results and having "perfect" results on CI. There are basically two situations when you may want to track the minor version of Java: for compilation and for runtime. In the case of compilation, there can sometimes be differences in the produced bytecode for different minor versions. However, the bytecode should still result in the same runtime behavior.

Java compile avoidance will treat this bytecode the same since it extracts the ABI.

Treating the minor number as an input can decrease the likelihood of a cache hit for developer builds. Depending on how standard development environments are across your team, it’s common for many different Java minor versions to be in use.

Even without tracking the Java minor version you may have cache misses for developers due to some locally compiled class files which constitute an input to test execution. If these outputs made it into the local build cache on this developer’s machine, even a clean will not solve the situation. Therefore, the choice for tracking the Java minor version is between sometimes or never re-using outputs between different Java minor versions for test execution.
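If you decide that test results should never be shared between Java minor versions, the full version string can be added as an input in the same way as the vendor. A minimal sketch; like the vendor snippet, it reads the property from the JVM running Gradle, which assumes that this is also the JVM used to run the tests.

```groovy
tasks.withType(Test) {
    // Full version string, e.g. including minor and update numbers.
    // Assumption: tests run on the same JVM that runs Gradle.
    inputs.property("java.version") {
        System.getProperty("java.version")
    }
}
```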

The compiler infrastructure provided by the JVM used to run Gradle is also used by the Groovy compiler. Therefore, you can expect differences in the bytecode of compiled Groovy classes for the same reasons as above and the same suggestions apply.

6.6. Avoid changing inputs external to your build

If your build is dependent on external dependencies like binary artifacts or dynamic data from a web page you need to make sure that these inputs are consistent throughout your infrastructure. Any variations across machines will result in cache misses.

Never re-release a non-changing binary dependency with the same version number but different contents: if this happens with a plugin dependency, you will never be able to explain why you don’t see cache reuse between machines (it’s because they have different versions of that artifact).

Using SNAPSHOTs or other changing dependencies in your build by design violates the stable task inputs principle. To use the build cache effectively, you should depend on fixed dependencies. You may want to look into the dependency lock plugin or switch to using composite builds instead.
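As an alternative to the plugin mentioned above, Gradle versions that ship built-in dependency locking can pin dynamic versions in lock files. The following is a sketch under that assumption, not the approach the guide prescribes:

```groovy
// build.gradle
// Assumption: a Gradle version with built-in dependency locking.
dependencyLocking {
    // Resolve every configuration against its checked-in lock state.
    lockAllConfigurations()
}
```

Lock state is written by running a resolving task with --write-locks, for example `gradle dependencies --write-locks`, and is then checked into source control so that all machines resolve the same versions.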

The same is true for depending on volatile external resources, for example a list of released versions. One way of locking the changes would be to check the volatile resource into source control whenever it changes so that the builds only depend on the state in source control and not on the volatile resource itself.

6.7. Suggestions for authoring your build

6.7.1. Review usages of doFirst and doLast

Using doFirst and doLast from a build script on a cacheable task ties you to build script changes since the implementation of the closure comes from the build script. If possible, you should use separate tasks instead.

Modifying input or output properties via the runtime API in doFirst is discouraged since these changes will not be detected for up-to-date checks and the build cache. Even worse, when the task does not execute, then the configuration of the task is actually different from when it executes. Instead of using doFirst for modifying the inputs, consider using a separate task to configure the task in question - a so-called configure task. For example, instead of doing

jar {
    doFirst {
        jar.manifest.mainAttributes('Class-Path': "${project(':core').jar.archivePath.name} ${project(':baseServices').jar.archivePath.name}")
    }
}

do

task configureJar {
    doLast {
        jar.manifest.mainAttributes('Class-Path': "${project(':core').jar.archivePath.name} ${project(':baseServices').jar.archivePath.name}")
    }
}

jar.dependsOn(configureJar)

6.7.2. Build logic based on the outcome of a task

Do not base build logic on whether a task has been executed. In particular you should not assume that the output of a task can only change if it actually executed. Actually, loading the outputs from the build cache would also change them. Instead of relying on custom logic to deal with changes to input or output files you should leverage Gradle’s built-in support by declaring the correct inputs and outputs for your tasks and leave it to Gradle to decide if the task actions should be executed. For the very same reason using outputs.upToDateWhen is discouraged and should be replaced by properly declaring the task’s inputs.
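As an illustration of declaring inputs and outputs instead of relying on outputs.upToDateWhen, consider the following sketch of an ad hoc task. The task name, file locations and the "@version@" token are all hypothetical.

```groovy
// Hypothetical ad hoc task; names and paths are illustrative only.
task generateBanner {
    def template = file("src/banner/template.txt")
    def banner = new File(buildDir, "banner/banner.txt")

    // Declare everything the action reads ...
    inputs.file(template).withPropertyName("template").withPathSensitivity(PathSensitivity.NONE)
    inputs.property("projectVersion") { project.version.toString() }
    // ... and everything it writes, then let Gradle decide whether to run it.
    outputs.file(banner).withPropertyName("banner")

    doLast {
        banner.parentFile.mkdirs()
        banner.text = template.text.replace("@version@", project.version.toString())
    }
}
```

Because the template and the version are declared, Gradle re-runs the action when either changes and skips it otherwise, without any custom up-to-date logic.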

6.7.3. Overlapping outputs

You already saw that overlapping outputs are a problem for task output caching in "The case against overlapping outputs". When you add new tasks to your build or re-configure built-in tasks make sure you do not create overlapping outputs for cacheable tasks. If you must you can add a Sync task which then would sync the merged outputs into the target directory while the original tasks remain cacheable.
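The Sync approach could look roughly like this. The sketch assumes two existing cacheable tasks, here called generateDocsA and generateDocsB (hypothetical names), each writing to its own directory:

```groovy
// Assumption: generateDocsA and generateDocsB already exist and each
// writes to its own, non-overlapping output directory.
task mergeDocs(type: Sync) {
    // Using the tasks as sources copies their outputs and
    // also adds the task dependencies.
    from generateDocsA
    from generateDocsB
    into "$buildDir/merged-docs"
}
```

Only the cheap mergeDocs task writes to the shared directory, so the two producing tasks keep their outputs separate and stay cacheable.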

Gradle Enterprise will show tasks where caching was disabled for overlapping outputs in the timeline and in the task input comparison:

[Figure: overlapping outputs input comparison]

6.8. Achieving stable task inputs

It is crucial to have stable task inputs for every cacheable task. In the following section you will learn about different situations which violate stable task inputs and look at possible solutions.

6.8.1. Volatile task inputs

If you use a volatile input like a timestamp as an input property for a task, then there is nothing Gradle can do to make the task cacheable. You should consider carefully whether the volatile data is really essential to the output or whether it is only there for e.g. auditing purposes.

If the volatile input is essential to the output then you can try to make the task using the volatile input cheaper to execute. You can do this by splitting the task into two tasks - the first task doing the expensive work which is cacheable and the second task adding the volatile data to the output. In this way the output stays the same and the build cache can be used to avoid doing the expensive work. For example, for building a jar file the expensive part - Java compilation - is already a different task while the jar task itself, which is not cacheable, is cheap.

If it is not an essential part of the output, then you should not declare it as an input. As long as the volatile input does not influence the output then there is nothing else to do. Most times though, the input will be part of the output.

6.8.2. Non-repeatable task outputs

Having tasks which generate different outputs for the same inputs can pose a challenge for the effective use of task output caching as seen in "Repeatable task outputs". If the non-repeatable task output is not used by any other task then the effect is very limited. It basically means that pulling the task from the cache might produce a different result than executing the same task locally. If the only difference between the outputs is a timestamp, then you can either accept the effect of the build cache or decide that the task is not cacheable after all.

Non-repeatable task outputs lead to non-stable task inputs as soon as another task depends on the non-repeatable output. For example, re-creating a jar file from the files with the same contents but different modification times yields a different jar file. Any other task depending on this jar file as an input file cannot be loaded from the cache when the jar file is rebuilt locally. This can lead to hard-to-diagnose cache misses when the consuming build is not a clean build or when a cacheable task depends on the output of a non-cacheable task. For example, when doing incremental builds it is possible that the artifact on disk which is considered up-to-date and the artifact in the build cache are different even though they are essentially the same. A task depending on this task output would then not be able to load outputs from the build cache since the inputs are not exactly the same.

As described in "Stable task inputs" you can either make the task outputs repeatable or use input normalization. You already learned about the possibilities with configurable input normalization.
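As a reminder of what configurable input normalization looks like, the sketch below tells Gradle to ignore one file when it fingerprints runtime classpaths. The file name is hypothetical, and ignoring a file is only appropriate if no consumer’s behavior depends on its contents.

```groovy
// build.gradle
normalization {
    runtimeClasspath {
        // Hypothetical file that only carries volatile data such as a timestamp.
        ignore "build-info.properties"
    }
}
```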

Gradle includes some support for creating repeatable output for archive tasks. For tar and zip files Gradle can be configured to create reproducible archives. This is done by configuring e.g. the Zip task via the following snippet.

task createZip(type: Zip) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
    // ...
}
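To apply the same settings to every archive task rather than configuring each one individually, the configuration can be attached by type. A minimal sketch, assuming all archives in the project should be reproducible:

```groovy
// Covers Zip, Tar, Jar and other archive tasks in this project.
tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}
```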

Another way to make the outputs repeatable is to activate caching for a task with non-repeatable outputs. If you can make sure that the same build cache is used for all builds then the task will always have the same outputs for the same inputs by design of the build cache. Going down this road can lead to different problems with cache misses for incremental builds as described above. Moreover, race conditions between different builds trying to store the same outputs in the build cache in parallel can lead to hard-to-diagnose cache misses. If possible, you should avoid going down that route.

6.8.3. Limit the effect of volatile data

If none of the described solutions for dealing with volatile data work for you, you should still be able to limit the effect of volatile data on effective use of the build cache. This can be done by adding the volatile data later to the outputs as described in "Volatile task inputs". Another option would be to move the volatile data so it affects fewer tasks. For example, moving the dependency from the compile to the runtime configuration may already have quite an impact.

Sometimes it is also possible to build two artifacts, one containing the volatile data and another one containing a constant representation of the volatile data. The non-volatile output would be used e.g. for testing while the volatile one would be published to an external repository. While this conflicts with the Continuous Delivery "build artifacts once" principle it can sometimes be the only option.

6.9. Custom and third party tasks

If your build contains custom or third party tasks, you should take special care that these don’t influence the effectiveness of the build cache. Special care should also be taken for code generation tasks which may not have repeatable task outputs. This can happen if the code generator includes e.g. a timestamp in the generated files or depends on the order of the input files. Other pitfalls can be the use of `HashMap`s or other data structures without order guarantees in the task’s code.
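To illustrate the ordering pitfall, a generator can sort its entries before writing them, so that the output no longer depends on the iteration order of the underlying map. The helper below is a hypothetical sketch and not part of any Gradle API:

```groovy
// Hypothetical helper for a code generation task:
// renders "key=value" lines in sorted key order.
String renderEntries(Map<String, String> entries) {
    // TreeMap iterates in key order, independent of the source map's order.
    def sorted = new TreeMap<String, String>(entries)
    return sorted.collect { key, value -> "${key}=${value}" }.join("\n")
}
```

The same idea applies to input files: sorting them by path before processing makes the generated output independent of the order in which the file system returns them.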

Some third party plugins can even influence cacheability of Gradle’s built-in tasks. This can happen if they add inputs like absolute paths or volatile data to tasks via the runtime API. In the worst case this can lead to incorrect builds when the plugins try to depend on the outcome of a task and do not take FROM-CACHE into account.

7. Summary

This guide has described the two ways Gradle can avoid work and the concepts which make up a build that effectively leverages task output caching. You should now have the knowledge to adapt your own build so it can make effective use of the build cache. By doing so, you will improve incremental build performance while having fast clean builds when using the build cache. Remember, the best way to do work faster is to simply avoid work that doesn’t need doing.

Be aware that your journey does not end here. You should invest in keeping your build well behaved and check manually or automatically that you are still making effective use of the build cache.

The Gradle team’s journey does not end here either. They will continue improving the build cache and give you more tools to make your builds even faster.

8. Help improve this guide

Have feedback or a question? Found a typo? Like all Gradle guides, help is just a GitHub issue away. Please add an issue or pull request to gradle-guides/using-build-cache and we’ll get back to you.