grammarly.perseverance

https://github.com/grammarly/perseverance.git

git clone 'https://github.com/grammarly/perseverance.git'

Perseverance is a flexible library for retrying operations, inspired by Common Lisp's condition system. It decouples marking an operation as retriable from deciding how it should be retried.

** Why?

Clojure libraries for retrying operations already exist: [[https://github.com/liwp/again][again]], [[https://github.com/joegallo/robert-bruce][robert-bruce]]. These libraries require you to specify the retry strategy in the same place where something needs to be retried. Committing to a strategy that early pushes high-level decisions into low-level code.

Should a function downloading a file automatically retry on failure? If it doesn't, the higher-level code cannot choose to retry this particular failed case. If it does retry automatically, how many attempts should it make? What if the higher-level code needs the function to fail fast and then try something else instead?

In many cases, the lower-level code knows how to do something (that is, how to retry an operation) but doesn't know what to do (whether to retry, and with which delay). The high-level code knows what it wants but doesn't know how to achieve it. Perseverance bridges these levels by separating the hows from the whats.

The following code demonstrates a scenario where unreliable functions are reinforced with Perseverance:

#+BEGIN_SRC clojure
(require '[perseverance.core :as p])

;; Fake function that returns a list of files but fails the first three times.
(let [cnt (atom 0)]
  (defn list-s3-files []
    (when (< @cnt 3)
      (swap! cnt inc)
      (throw (RuntimeException. "Failed to connect to S3.")))
    (range 10)))

;; Fake function that imitates downloading a file with 50/50 probability.
(defn download-one-file [x]
  (if (> (rand) 0.5)
    (println (format "File #%d downloaded." x))
    (throw (java.io.IOException. "Failed to download a file."))))

;; Let's wrap the previous function in retriable.
(defn download-one-file-safe [x]
  (p/retriable {} (download-one-file x)))

;; Now to a function that downloads all files.
(defn download-all-files []
  (let [files (p/retriable {:catch [RuntimeException]
                            :tag ::list-files}
                (list-s3-files))]
    (mapv download-one-file-safe files)))

;; Let's call it and see what happens.
(download-all-files)
;; Unhandled java.lang.RuntimeException: Failed to connect to S3.

;; Bam! The exception still happened. It's because we haven't established
;; the retry context.

(p/retry {}
  (download-all-files))

;; java.lang.RuntimeException: Failed to connect to S3., retrying in 0.5 seconds...
;; java.lang.RuntimeException: Failed to connect to S3., retrying in 0.5 seconds...
;; java.lang.RuntimeException: Failed to connect to S3., retrying in 0.5 seconds...
;; java.io.IOException: Failed to download a file., retrying in 0.5 seconds...
;; File #0 downloaded.
;; File #1 downloaded.
;; java.io.IOException: Failed to download a file., retrying in 0.5 seconds...
;; File #2 downloaded.
;; File #3 downloaded.
;; java.io.IOException: Failed to download a file., retrying in 0.5 seconds...
;; java.io.IOException: Failed to download a file., retrying in 0.5 seconds...
;; File #4 downloaded.
;; ...

;; The call eventually succeeds!
#+END_SRC

** Usage

Add this line to the list of your dependencies:

[[https://clojars.org/com.grammarly/perseverance][https://clojars.org/com.grammarly/perseverance/latest-version.svg]]

=perseverance.core= is the only namespace. It exposes two main macros: =retriable= and =retry=. The former is used to mark a piece of code as suitable for retrying. The arguments are =[options-map & body]=.

#+BEGIN_SRC clojure
(retriable {:catch [RuntimeException]
            :tag ::list-files
            ;; :ex-wrapper #(ex-info "My wrapped exception." {:e %, :foo :bar})
            }
  (list-s3-files))
#+END_SRC

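The options map supports keys including the ones shown in the example above, all of them optional: =:catch= (the exception classes to treat as retriable), =:tag= (a label that a =retry= selector can match), and =:ex-wrapper= (a function applied to the caught exception, for instance to wrap it with =ex-info=).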

The =retry= macro has the same signature: =[options-map & body]=.

#+BEGIN_SRC clojure
(retry {:strategy (constant-retry-strategy 100)
        :selector ::list-files
        ;; :selector #(and (instance? ExceptionInfo %)
        ;;                 (= (:foo (ex-data %)) :bar))
        }
  (download-all-files))
#+END_SRC

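The options map can have keys such as the ones shown in the example above: =:strategy= (a retry strategy, described under Strategies) and =:selector= (either a tag given to =retriable= or a predicate on the exception, which picks the retriable cases this block handles).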

With the help of selectors, you can nest =retry= blocks to specify different retry strategies for different retriable cases:

#+BEGIN_SRC clojure
(retry {:strategy (constant-retry-strategy 500)} ;; Catches everything.
  (retry {:strategy (progressive-retry-strategy :initial-delay 2000, :max-delay 10000)
          :selector ::list-files}
    (download-all-files)))
#+END_SRC

*** Strategies

Perseverance ships with two strategies (or, more specifically, strategy
constructors):

=constant-retry-strategy= takes a delay and returns the same delay on each
attempt. If =max-count= is provided, the strategy starts returning =nil= after
the number of attempts reaches =max-count=. Perseverance treats =nil= from a
strategy as a signal to stop retrying the operation.
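The contract between a strategy and the retry loop can be sketched in plain Clojure. This is only an illustration, not Perseverance's implementation: =constant-strategy-sketch= and =call-with-strategy= are hypothetical names, and whether attempt number =max-count= itself still receives a delay is an assumption made here.

#+BEGIN_SRC clojure
;; Hypothetical stand-in for constant-retry-strategy. Assumption: attempts
;; 1..max-count get the delay, later attempts get nil.
(defn constant-strategy-sketch
  ([delay-ms] (constant-strategy-sketch delay-ms nil))
  ([delay-ms max-count]
   (fn [attempt]
     (when (or (nil? max-count) (<= attempt max-count))
       delay-ms))))

;; Hypothetical driver loop: call f, and on failure ask the strategy for a
;; delay. A nil delay means "stop retrying", so the exception is rethrown.
(defn call-with-strategy [strategy f]
  (loop [attempt 1]
    (let [result (try
                   {:value (f)}
                   (catch Exception e
                     {:error e}))]
      (if-let [e (:error result)]
        (if-let [delay-ms (strategy attempt)]
          (do (Thread/sleep (long delay-ms))
              (recur (inc attempt)))
          (throw e))
        (:value result)))))

;; A function that fails twice and then succeeds.
(let [calls (atom 0)]
  (call-with-strategy (constant-strategy-sketch 10 5)
                      #(if (< (swap! calls inc) 3)
                         (throw (java.io.IOException. "transient failure"))
                         :done)))
;; => :done
#+END_SRC

With a strategy that always returns =nil=, such as =(constantly nil)=, the same loop rethrows the first exception without retrying.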

=progressive-retry-strategy= is a variation of the exponential backoff
algorithm. It starts with =initial-delay= and returns it =stable-length=
times. Then, for each subsequent attempt, the delay is multiplied by
=multiplier= but is capped at =max-delay=. After =max-count= attempts (if
provided), the strategy starts returning =nil=. For example, for this
strategy:

#+BEGIN_SRC clojure
(progressive-retry-strategy :initial-delay 1000, :stable-length 4,
                            :multiplier 2, :max-delay 10000)
#+END_SRC

the delays will be:

: 1000, 1000, 1000, 1000, 2000, 4000, 8000, 10000, 10000, 10000...
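The sequence above can be reproduced with a small model of the rule as described. This is a sketch of the arithmetic only, not the library's code: =progressive-delay= is a hypothetical function, and =max-count= is left out for brevity.

#+BEGIN_SRC clojure
;; Hypothetical model of the progressive delay rule: hold initial-delay for
;; the first stable-length attempts, then multiply once per further attempt,
;; capping at max-delay after every step.
(defn progressive-delay
  [{:keys [initial-delay stable-length multiplier max-delay]} attempt]
  (let [growth-steps (max 0 (- attempt stable-length))]
    (nth (iterate #(min max-delay (* % multiplier)) initial-delay)
         growth-steps)))

(def opts {:initial-delay 1000, :stable-length 4,
           :multiplier 2, :max-delay 10000})

(mapv #(progressive-delay opts %) (range 1 11))
;; => [1000 1000 1000 1000 2000 4000 8000 10000 10000 10000]
#+END_SRC

Capping inside each step rather than at the end keeps the intermediate values small, so high attempt numbers do not overflow.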

You can write custom strategies too. A strategy is a function that takes the
attempt number and returns a delay in milliseconds (or =nil= if retry
shouldn't be made). Attempts start from =1=, not zero.
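As a minimal sketch of such a custom strategy, the function below grows the delay linearly and gives up after a fixed number of attempts. The name =linear-retry-strategy= and its parameters are hypothetical and not part of Perseverance.

#+BEGIN_SRC clojure
;; Hypothetical custom strategy constructor. The returned function takes the
;; attempt number (starting from 1) and returns a delay in milliseconds, or
;; nil once the attempt number exceeds max-attempts.
(defn linear-retry-strategy
  [step-ms max-delay-ms max-attempts]
  (fn [attempt]
    (when (<= attempt max-attempts)
      (min max-delay-ms (* step-ms attempt)))))

(def strategy (linear-retry-strategy 250 1000 6))

(mapv strategy (range 1 9))
;; => [250 500 750 1000 1000 1000 nil nil]
#+END_SRC

Since a strategy is just a function, the result of =(linear-retry-strategy 250 1000 6)= should be usable wherever a built-in strategy is, for example under the =:strategy= key of =retry=.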

** Drawbacks and considerations

Like any stack-based error-handling mechanism, Perseverance is susceptible to mistakes when used with multi-threaded, asynchronous, or lazily evaluated code. Perseverance is implemented on top of try/catch and Clojure's dynamic variables, so you should be especially careful that the code inside =retriable= and =retry= doesn't escape the dynamic scope. Lately, some of the concurrency primitives (e.g., =future= and core.async's =go= blocks) started forwarding the dynamic bindings into their threads, but laziness still causes problems.
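The laziness problem can be shown without Perseverance at all. In the sketch below, =*context*= is a hypothetical dynamic variable standing in for whatever a =retry= block would bind; the lazy sequence is realized only after =binding= has returned, so it no longer sees the bound value.

#+BEGIN_SRC clojure
;; Hypothetical dynamic var standing in for a retry context.
(def ^:dynamic *context* :outside)

;; map is lazy: nothing runs until the caller consumes the result, and by
;; then the binding has already been popped.
(defn lazy-read []
  (binding [*context* :inside]
    (map (fn [_] *context*) [1 2 3])))

;; mapv is eager: every element is computed while the binding is in effect.
(defn eager-read []
  (binding [*context* :inside]
    (mapv (fn [_] *context*) [1 2 3])))

(vec (lazy-read))
;; => [:outside :outside :outside]

(eager-read)
;; => [:inside :inside :inside]
#+END_SRC

The same reasoning suggests forcing lazy results (with =mapv=, =doall=, or similar) before they leave a =retriable= or =retry= body.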

Taking away all the strategies and dynamic fanciness, Perseverance is just a dumb retrier. This is OK for requests that don't impact the other side of the communication, or if the actions are idempotent. But if you are making a call that must succeed only once, or you have to be sure that the retries don't make the outage in the system even worse, you might want to use a more sophisticated fault tolerance mechanism.

** License

© Copyright 2016 Grammarly, Inc.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.