How to catch derived Vars with a clj-kondo hook

I love Clojure's REPL-driven development workflow.

The fast feedback loop I get when I evaluate the changed code and immediately see the result brings me joy every day.

But... I do have one gripe: derived Vars.

Let's take an example:

(defn welcome-page-handler []
  {:status 200
   :body "Welcome!"})

(def routes
  {"/" welcome-page-handler})

In the example above, we have a very simple web app skeleton with a handler function welcome-page-handler and a routes map from URLs to handler functions.

Let's try it out:

(let [handler-fn (get routes "/")]
  (handler-fn))

;; #=> {:status 200, :body "Welcome!"}

We simulated an HTTP request to the homepage path "/". We did "routing" with get and called the handler function for the "/" route, and the result was as expected. All good!

Now, let's make a small change to the welcome-page-handler function and re-evaluate it.

(defn welcome-page-handler []
  {:status 200
   :body "Welcome, stranger!"})

;; #=> #'user/welcome-page-handler

Let's test the handler again:

(let [handler-fn (get routes "/")]
  (handler-fn))

;; #=> {:status 200, :body "Welcome!"}

Dang! That's not what we wanted to see, right? We were expecting the new "Welcome, stranger!" greeting.

Why did this happen?

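The reason is that the routes map still points to the old version of the welcome-page-handler function. When routes was evaluated, the symbol welcome-page-handler was resolved to the function object the Var held at that moment, and that function object is what got stored in the map. Re-evaluating the defn points the Var to a new function but leaves the map untouched. To make routes point to the new version, we must re-evaluate the routes map too.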
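A minimal sketch makes this visible at the REPL. identical? compares object identity, so it shows that the map still holds the function object that existed when routes was evaluated. The original-handler Var is a hypothetical helper that exists only for the comparison.

```clojure
(defn welcome-page-handler []
  {:status 200 :body "Welcome!"})

(def routes
  {"/" welcome-page-handler})

;; Keep a reference to the function object the Var holds right now.
(def original-handler welcome-page-handler)

;; Simulate editing and re-evaluating the handler at the REPL.
(defn welcome-page-handler []
  {:status 200 :body "Welcome, stranger!"})

;; The map still holds the first function object...
(identical? (get routes "/") original-handler)     ;=> true
;; ...not the one the Var points to now.
(identical? (get routes "/") welcome-page-handler) ;=> false
```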

So what's the big deal? Just re-evaluate routes!

In a simple example like this, re-evaluating routes after we change welcome-page-handler (or any other handler function in the routes map) is a viable option. However, in a larger codebase, the handler functions might live in a different namespace than the routes map. It is cumbersome to re-evaluate routes every time you make a change to a handler, and chances are high that you will forget to do so.

Ok, so this is clearly a problem. Is there a solution?

Well, good news! Yes, there is a solution! Three solutions, actually:

  1. Use reloaded workflow
  2. Var quoting with the #' syntax
  3. Function wrapping

Let's go through these options.

Option 1: Use reloaded workflow

At Sharetribe, we've been developing our Clojure app for over nine years. Since the beginning, we've been using the reloaded workflow and the reloaded.repl library to solve the derived Var issue. It has served us well.

The idea in a nutshell is to have a (user/reset) function that utilizes tools.namespace to reload the namespaces that have changed and the namespaces that depend on the changed namespaces.

So if you have a routes namespace that requires the handlers namespace, and the handlers namespace changes, reloaded.repl will reload both namespaces. This solves the issue with derived Vars.
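Here is a minimal sketch of such a reset function. It uses tools.namespace directly, in the style of Stuart Sierra's reloaded workflow, rather than the reloaded.repl library, and it assumes org.clojure/tools.namespace is on the classpath. system, start, stop, and go are hypothetical placeholders for your application's own lifecycle functions.

```clojure
(ns user
  (:require [clojure.tools.namespace.repl :as tn]))

;; Hypothetical stand-ins for the application's lifecycle.
(defonce system (atom nil))

(defn start []
  (reset! system {:started? true}))

(defn stop []
  (reset! system nil))

(defn go []
  (start)
  :ready)

(defn reset
  "Stop the app, reload changed namespaces (and the namespaces that
  depend on them), then call user/go to start the app again."
  []
  (stop)
  ;; :after names a function to call once reloading has finished.
  (tn/refresh :after 'user/go))
```

Because refresh reloads dependents too, the routes namespace gets re-evaluated whenever a handlers namespace it requires changes.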

This used to work well for us. But after nine years of development and 100k+ LOC of clj/c/s code, things started to get slow.

In the Clojure Workflow Reloaded article, Stuart Sierra writes the following (emphasis mine):

Therefore, after every significant code change, I want to restart the application from scratch. But I don't want to restart the JVM and reload all my Clojure code in order to do it: that takes too long and is too disruptive to my workflow. Instead, I want to design my application in such a way that I can quickly shut it down, discard any transient state it might have built up, start it again, and return to a similar state. And when I say quickly, I mean that the whole process should take less than a second.

In my experience, this is something that can't be achieved with a large codebase. Let me demonstrate.

If I'm in Sharetribe's codebase and I make a change to one of the endpoint handler functions and call refresh in our user namespace, it usually takes 3-4 seconds:

user> (time
       (clojure.tools.namespace.repl/refresh))
:reloading (...)
"Elapsed time: 4680.926875 msecs"
;; => :ok

That's too long, in my opinion.

Option 2: Var quoting

Var quoting means using the special reader syntax #', so that the routes map looks like this:

(def routes
  ;; see the #' in front of the welcome-page-handler
  {"/" #'welcome-page-handler})

We can use the routes map precisely the same way we did earlier, with the exception that changes in the welcome-page-handler are reflected without re-evaluating the routes map:

(let [handler-fn (get routes "/")]
  (handler-fn))

;; #=> {:status 200, :body "Welcome, stranger!"}

What happens here is that the value in the routes map isn't a function anymore; it's a Var. The two aren't equal unless the Var is dereferenced:

welcome-page-handler
;; #=> #function[user/welcome-page-handler]

#'welcome-page-handler
;; #=> #'user/welcome-page-handler

(= welcome-page-handler #'welcome-page-handler)
;; #=> false

(= welcome-page-handler @#'welcome-page-handler)
;; #=> true

Now, if the value in the routes map isn't a function anymore but a Var, how can we invoke it without first deref'ing the Var? The reason is that Var implements the IFn interface: when a Var is invoked, for example when it is in the function position (the first element of a list), it derefs itself and calls the function it currently points to.

Because of this, the two examples below behave the same:

(#'welcome-page-handler)
;; #=> {:status 200, :body "Welcome, stranger!"}

(@#'welcome-page-handler)
;; #=> {:status 200, :body "Welcome, stranger!"}
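A short sketch, using a hypothetical one-argument greet function, shows that invoking a Var also passes arguments through, and that a Var is accepted wherever a function is expected even though it is not itself a function object:

```clojure
(defn greet [person]
  (str "Hello, " person "!"))

;; A Var is not a function object...
(fn? #'greet)               ;=> false
;; ...but it implements clojure.lang.IFn, so it is callable.
(ifn? #'greet)              ;=> true

;; Invoking the Var derefs it and calls the current function,
;; passing the arguments through.
(#'greet "stranger")        ;=> "Hello, stranger!"

;; Higher-order functions accept it as well.
(map #'greet ["Ann" "Bob"]) ;=> ("Hello, Ann!" "Hello, Bob!")
```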

Option 3: Function wrapping

Function wrapping means that you wrap the handler function with another function, like this:

(def routes
  ;; see the anonymous function #( ) around the welcome-page-handler
  {"/" #(welcome-page-handler)})
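Real handlers usually receive a request map. Note that #(welcome-page-handler) takes no arguments, so calling it with a request would throw an ArityException. A sketch assuming a hypothetical one-argument handler shows wrappers that forward their arguments; each wrapper looks the Var up at call time, so redefinitions are picked up:

```clojure
(defn welcome-page-handler [request]
  {:status 200
   :body (str "Welcome, " (get-in request [:params :name] "stranger") "!")})

(def routes
  {;; `%` forwards the single request argument.
   "/"      #(welcome-page-handler %)
   ;; The same wrapper with a named parameter.
   "/hello" (fn [request] (welcome-page-handler request))
   ;; `%&` forwards any number of arguments.
   "/any"   #(apply welcome-page-handler %&)})

;; Simulate editing and re-evaluating the handler at the REPL.
(defn welcome-page-handler [request]
  {:status 200
   :body (str "Hi, " (get-in request [:params :name] "stranger") "!")})

((get routes "/") {:params {:name "Ann"}}) ;=> {:status 200, :body "Hi, Ann!"}
((get routes "/hello") {})                 ;=> {:status 200, :body "Hi, stranger!"}
```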

Out of these two options, Var quoting and function wrapping, the #' approach is the one I'd prefer in most cases. It is, in my opinion, less noisy syntax.

However, #' should be avoided in CLJS (and thus also in CLJC).

How to find all derived Vars?

Ok, so now we know our options for fixing derived Vars.

But there are still two open questions:

How can I find all the derived Vars in my codebase? Derived Vars are kinda subtle, aren't they?

We've been building our app without worrying about derived Vars for nine years because the reloaded workflow solved the issue. The result is that there are derived Vars here and there in the code base. We have many maps that do some kind of dispatching from a name or a keyword to an appropriate handler function. Derived Vars are sprinkled everywhere.

In addition to finding all the derived Vars, I want to solve this issue once and for all. If I now go through our codebase and fix all the derived Vars, I don't want to do it again later. How can I prevent anyone from introducing derived Vars in the future? As I already mentioned, they are subtle and easily introduced by accident.

I once asked about this in the ClojureVerse discussion forum. The conclusion was that there isn't an existing solution out there to fix the issue with derived Vars once and for all, but clj-kondo might be able to get me somewhere.

clj-kondo has a nice feature, hooks, that allows enhancing linting via user-provided code.

And it turned out that it's possible to write a clj-kondo hook that will catch the derived Vars. Here's how.

The hook 🪝

Ok, let's get to it. Let's build the hook, piece by piece.

Let's start by editing the .clj-kondo/config.edn file. We need to define the hook and add the linter:

;; .clj-kondo/config.edn

;; ① Analyze both CLJ and CLJS `def`s with `hooks.def/analyze`.
;; This hook will emit `:fn-sym-in-def` findings.
{:hooks {:analyze-call {clojure.core/def hooks.def/analyze
                        cljs.core/def hooks.def/analyze}}

 :linters {
           ;; ② clj-kondo-config linter will complain about 
           ;; unknown linter `fn-sym-in-def`, unless we ignore it.
           #_{:clj-kondo/ignore [:clj-kondo-config]}
 
           ;; ③ Set `fn-sym-in-def` level to warning
           :fn-sym-in-def {:level :warning}}}

That's all we need in the config file for now.

In the config file, we're pointing to hooks.def namespace. Next, we'll create it.

Add a new CLJ file at .clj-kondo/hooks/def.clj with the following content:

(ns hooks.def
  (:require [clj-kondo.hooks-api :as api]))
  
(defn analyze
  [{:keys [node lang]}]

  ;; ① Print the `node`, just to try things out.
  (println node)

  ;; ② Return the `node` without transforming it.
  node)

That's the skeleton for our hook. You can now run clj-kondo and see the defs in your project printed out:

✗ clj-kondo --lint src
<list: (def routes {"/" welcome-page-handler})>
linting took 28ms, errors: 0, warnings: 0

Since we are about to analyze the form def, it's good to keep in mind that def can have multiple arities:

;; no initial value
(def foo)

;; initial value
(def bar :bar-value)

;; docstring and initial value
(def quax "Quax docstring" :quax-value)

We're not interested in the first form; we can ignore it. Also, we are not interested in the docstring. We only want to inspect the last child of the node.
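To see which element ends up last, here's a sketch that uses plain quoted lists as stand-ins for clj-kondo's nodes. It assumes, as the hook does, that a list node's :children are the form's elements in order, including the leading def symbol:

```clojure
;; Plain quoted forms standing in for parsed `def` nodes.
(def def-forms
  ['(def foo)
   '(def bar :bar-value)
   '(def quax "Quax docstring" :quax-value)])

;; Element counts include the leading `def` symbol.
(map count def-forms) ;=> (2 3 4)

;; The last element is the initial value, except for `(def foo)`,
;; where it is the name itself.
(map last def-forms)  ;=> (foo :bar-value :quax-value)

;; Only forms with three or more elements carry an initial value.
(->> def-forms
     (filter #(< 2 (count %)))
     (map last))      ;=> (:bar-value :quax-value)
```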

Let's add the logic to the analyze function to call an auxiliary function analyze*:

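(defn analyze
  [{:keys [node lang]}]

  ;; ① Call `analyze*` with the last child node. The children are
  ;; `def`, the name, an optional docstring and the value, so only
  ;; forms with more than two children have an initial value.
  (when (< 2 (count (:children node)))
    (analyze* {:node (last (:children node))
               :lang lang}))

  ;; ② Return the `node` without transforming it.
  node)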

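Now, let's implement the analyze* function. Keep in mind that Clojure requires a Var to be defined before it is referenced, so in def.clj, analyze* and the predicates it calls need to appear above analyze (or be forward-declared with declare).

The analyze* function will have a cond with two branches. Let's start with the first branch, in which we call a fn-token-node? predicate. If the node is indeed a token node pointing to a function, we record the finding and recommend a fix based on the language.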

(defn- analyze* [{:keys [node lang]}]
  (cond
    
    ;; ① Call predicate function that returns true for symbols pointing to functions
    (fn-token-node? {:node node 
                     :lang lang})
                     
    ;; ② Record finding if predicate returns truthy
    (api/reg-finding! (assoc (meta node)
                             :type :fn-sym-in-def
                             :message (str "fn-sym-in-def: "
                                           node
                                           (if (= :cljs lang)
                                             " - use function wrapping"
                                             " - use var quoting"))))))

And now, let's implement the fn-token-node? predicate, which is the beef of the hook.

(defn- fn-token-node? [{:keys [node lang]}]
  ;; ① Check that the node is a symbol token node
  (when (and (api/token-node? node)
             (symbol? (:value node)))
             
    ;; ② Resolve the symbol to `ns` and `name`
    (let [{:keys [ns name]} (api/resolve {:name (:value node)})
          
          ;; ③ Get the _cached_ analysis data
          analysis (get-in (api/ns-analysis ns {:lang lang}) [lang name])]
          
      ;; ④ Ignore `clojure.core` and `cljs.core`
      (when-not (#{'clojure.core
                   'cljs.core} ns)
                   
        ;; ⑤ Return truthy if the analysis data looks like it's a function
        (some (set (keys analysis)) [:fixed-arities :varargs-min-arity])))))

First, we check if the node is a symbol token node.

After that, we use clj-kondo's api/resolve function to resolve the namespace and the name of the symbol.

Then, we use the resolved namespace and name to get the cached analysis data from clj-kondo.

Next, we filter out clojure.core and cljs.core namespaces.

If the returned analysis data looks like a function (i.e. it has either the :fixed-arities or the :varargs-min-arity key), we return a truthy value from the predicate.
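The last step of the predicate can be tried out in isolation. In this sketch, looks-like-fn? is a hypothetical name for step ⑤, and the sample maps are hand-written stand-ins that contain only the arity keys the predicate checks, not real clj-kondo cache entries:

```clojure
(defn looks-like-fn?
  "Returns the first arity key found in `analysis`, or nil."
  [analysis]
  (some (set (keys analysis)) [:fixed-arities :varargs-min-arity]))

;; Hypothetical analysis entries, keyed by var name.
(def sample-analysis
  {'welcome-page-handler {:fixed-arities #{0}}  ; (defn welcome-page-handler [] ...)
   'log!                 {:varargs-min-arity 1} ; (defn log! [msg & more] ...)
   'routes               {}})                   ; (def routes {...})

(looks-like-fn? (get sample-analysis 'welcome-page-handler)) ;=> :fixed-arities
(looks-like-fn? (get sample-analysis 'log!))                 ;=> :varargs-min-arity
(looks-like-fn? (get sample-analysis 'routes))               ;=> nil
;; Unknown names (no analysis data) are not flagged either.
(looks-like-fn? nil)                                         ;=> nil
```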

Now, we want to do a bit of traversing in case the value is a Clojure collection. Only then can we catch the example case shown earlier, where we have a routing map from keys to handler functions. So let's add another predicate function and another branch to the cond, and traverse recursively:

(defn- traverse? [{:keys [node]}]
  (or
   (api/vector-node? node)
   (api/map-node? node)
   (api/set-node? node)))

(defn- analyze* [{:keys [node lang]}]
  (cond
    (fn-token-node? {:node node 
                     :lang lang})
    (api/reg-finding! (assoc (meta node)
                             :type :fn-sym-in-def
                             :message (str "fn-sym-in-def: "
                                           node
                                           (if (= :cljs lang)
                                             " - use function wrapping"
                                             " - use var quoting"))))
                          
    ;; ① Check if we should recursively traverse
    (traverse? {:node node})

    ;; ② Traverse all child nodes, passing `lang` along in the same
    ;; map shape that `analyze*` destructures. `run!` is eager, which
    ;; matters because `reg-finding!` is side-effecting and a lazy
    ;; `map` whose result is discarded would never be realized.
    (run! #(analyze* {:node % :lang lang}) (:children node))))

Alright, that's pretty much all we need! Now we can try it out to see if it catches the derived Vars in a router map.

➜  clj-kondo --lint src
src/core.clj:8:8: warning: fn-sym-in-def: welcome-page-handler - use var quoting
linting took 27ms, errors: 0, warnings: 1

That's exactly what we wanted to see! We got a warning saying that welcome-page-handler needs to be Var quoted.

To see the whole thing, have a look at the example repository at https://github.com/rap1ds/derived-vars-clj-kondo-hook
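The hook works on parsed source, but the same traversal rule can be tried out at the REPL without clj-kondo. The sketch below is a swapped-in technique, not the hook: it resolves symbols at runtime with ns-resolve instead of reading clj-kondo's cache, and fn-var? and find-derived-fn-syms are hypothetical helper names. Like the hook, it only descends into vectors, maps, and sets, so #'x (read as (var x)) and #(x) (read as (fn* [] (x))) are skipped.

```clojure
;; Namespace the symbols are resolved in, captured at load time.
(def ^:private this-ns *ns*)

(defn- fn-var?
  "True when `sym` resolves to a non-macro Var outside clojure.core
  that currently holds a function."
  [sym]
  ;; ns-resolve can throw for dotted symbols that aren't classes.
  (when-let [v (try (ns-resolve this-ns sym) (catch Exception _ nil))]
    (and (var? v)
         (not (:macro (meta v)))
         (not= 'clojure.core (ns-name (:ns (meta v))))
         (fn? @v))))

(defn find-derived-fn-syms
  "Walks a quoted form and returns the symbols that would be
  captured as function values."
  [form]
  (cond
    (symbol? form)                  (when (fn-var? form) [form])
    (map? form)                     (mapcat find-derived-fn-syms
                                            (concat (keys form) (vals form)))
    (or (vector? form) (set? form)) (mapcat find-derived-fn-syms form)
    :else                           nil))

(defn welcome-page-handler [] {:status 200 :body "Welcome!"})
(defn about-page-handler [] {:status 200 :body "About"})

(find-derived-fn-syms
 '{"/"       welcome-page-handler   ; derived: captured fn value
   "/about"  #'about-page-handler   ; Var quoted
   "/health" #(about-page-handler)}) ; function wrapping
;;=> (welcome-page-handler)
```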

Caveats

There are some caveats in the hook:

First, the hook uses clj-kondo.hooks-api/ns-analysis, which relies on the clj-kondo cache. The problem is that if the cache is empty, you get false negatives, i.e. you don't get warnings for derived Vars.

When you're developing in your local environment, you likely have a populated clj-kondo cache. However, in a CI environment, you will probably start from a clean slate with an empty cache. The best workaround that I'm aware of is to run clj-kondo twice: once to populate the cache, and a second time to do the actual linting. Obviously, this doubles the time it takes to lint the project. Luckily, clj-kondo is pretty fast.
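A sketch of the two-pass run, using clj-kondo's Clojure API (clj-kondo.core/run! and print!) instead of the CLI. It assumes clj-kondo is on the classpath and is run from the project root, so that the project's .clj-kondo directory (with the hook and its cache) is found; the lint-ci namespace and lint-twice! function are hypothetical names.

```clojure
(ns lint-ci
  (:require [clj-kondo.core :as clj-kondo]))

(defn lint-twice!
  "Lints `paths` twice: the first pass only warms up the cache that
  `api/ns-analysis` reads, the second pass reports the findings.
  Returns the summary map of the second run."
  [paths]
  ;; Pass 1: populate the cache; the findings are discarded.
  (clj-kondo/run! {:lint paths})
  ;; Pass 2: the actual lint run.
  (let [results (clj-kondo/run! {:lint paths})]
    (clj-kondo/print! results)
    (:summary results)))

(comment
  ;; E.g. from a CI task: exit non-zero on warnings or errors.
  (let [{:keys [error warning] :or {error 0 warning 0}} (lint-twice! ["src"])]
    (System/exit (if (pos? (+ error warning)) 1 0))))
```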

The second caveat is that the traversing part is rather naive: it only descends into vectors, maps, and sets. If you wrap the function symbol in a list form, e.g. when or partial, you won't get a linter warning. However, I feel that the current implementation is good enough and catches the most common cases.
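For example, partial captures the function value it is given, just like a map literal does, so it creates a derived Var that the hook does not flag. A sketch with a hypothetical two-argument greet function shows the derived case next to a REPL-friendly variant that passes the Var to partial (CLJ; in CLJS, function wrapping is the safer choice):

```clojure
(defn greet [greeting person]
  (str greeting ", " person "!"))

;; Derived: `partial` captures the current function value of `greet`.
(def say-hello (partial greet "Hello"))

;; REPL friendly: `partial` over the Var looks `greet` up on each call.
(def say-hello-var (partial #'greet "Hello"))

;; Simulate editing and re-evaluating `greet` at the REPL.
(defn greet [greeting person]
  (str greeting ", dear " person "!"))

(say-hello "Ann")     ;=> "Hello, Ann!"
(say-hello-var "Ann") ;=> "Hello, dear Ann!"
```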

Conclusion

Derived Vars are, in my opinion, one of the biggest annoyances in the REPL-driven development workflow.

We can use Var quoting or function wrapping to make derived Vars REPL friendly in a large codebase where namespace reloading with tools.namespace is too slow.

It's quite easy to introduce derived Vars by accident, so a safeguard such as a linter check is needed to make sure we don't introduce them again.

clj-kondo hooks are extremely powerful and can be used to analyze def forms, register findings, and report warnings for derived Vars.

With the help of the clj-kondo hook, I was able to identify and fix over 300 derived Vars from our codebase. The codebase is now a bit more friendly for REPL-driven development!

--

If you have any questions or comments, I'd love to hear them! Please leave a comment on Mastodon.