miner.herbert

https://github.com/miner/herbert.git

git clone 'https://github.com/miner/herbert.git'


We turn our backs on confusion and seek the beginning. – Sevrin

Note: Clojure 1.9 will introduce a new core library, known as clojure.spec, which makes Herbert obsolete.

Herbert

A schema language for edn (Clojure data).

Way to Eden

The extensible data notation (edn) defines a useful subset of Clojure data types. As described on edn-format.org:

edn is a system for the conveyance of values. It is not a type system, and has no schemas.

The explicit lack of schemas in edn stands in marked contrast to many serialization libraries which use an interface definition language. The edn values essentially speak for themselves, without the need for a separate description or layer of interpretation. That is not to say that schemas aren't potentially useful; they're just not part of the definition of the edn format.

The goal of the Herbert project is to provide a convenient schema language for defining edn data structures that can be used for documentation and validation. The schema patterns are represented as edn values.

Leiningen

Herbert is available from Clojars. Add the following dependency to your project.clj:

Herbert on clojars.org

Usage

The main namespace is miner.herbert. The conforms? predicate takes a schema pattern and a value to test. It returns true if the value conforms to the schema pattern, false otherwise.

The conform function is used to build a test function. Given a schema, it returns a function of one argument that will execute a match against the schema pattern and return a map of bindings if successful or nil for a failed match. If you need to know how the schema bindings matched a value or you want to test against a schema multiple times, you should use conform to define a test function.

Quick example:

(require '[miner.herbert :as h])
(h/conforms? '{:a int :b [sym+] :c str} '{:a 42 :b [foo bar baz] :c "foo"})
;=> true

;; For better performance, create a test function with `h/conform`.
(def my-test (h/conform '{:a (:= A int) :b [sym+] :c str}))
(my-test '{:a 42 :b [foo bar baz] :c "foo"})
;=> {A 42}
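
Because schema patterns are plain edn values, they do not have to be quoted literals in source code. The following is a minimal sketch, assuming the pattern arrives as text (for example, from a file), that reads it with clojure.edn and reuses one test function across several values. The names schema-text, schema, checker and records are made up for illustration.

```clojure
(require '[clojure.edn :as edn])
(require '[miner.herbert :as h])

;; Hypothetical schema text; it could just as well be slurped from a file.
(def schema-text "{:a (:= A int) :b [sym+] :c str}")

;; Reading the text yields the same data structure as the quoted literal.
(def schema (edn/read-string schema-text))
(= schema '{:a (:= A int) :b [sym+] :c str})
;=> true

;; Build the test function once and apply it to many values.
(def checker (h/conform schema))

(def records
  ['{:a 42 :b [foo bar baz] :c "foo"}
   '{:a "forty-two" :b [foo] :c "bar"}])

;; A successful match returns a map of bindings (truthy) and a failed match
;; returns nil, so the test function can be passed directly to filter.
(filter checker records)
```

Building the test function once avoids handing the schema pattern to Herbert again on every call, which is the performance point made above.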

Test.Check integration

The property function takes a predicate and a schema as arguments and returns a test.check property suitable for generative testing. (test.check also has a defspec macro for use with clojure.test.) If you just want the generator for a schema, call generator. The sample function is similar to the test.check version but takes a schema instead of a generator.

(require '[miner.herbert.generators :as hg])
(require '[clojure.test.check :as tc])

;; trivial example
(tc/quick-check 100 (hg/property integer? 'int))

;; confirm the types of the values
(tc/quick-check 100 (hg/property (fn [m] (and (integer? (:int m)) (string? (:str m)))) 
                                 '{:int int :str str :kw kw}))

;; only care about the 42 in the right place
(tc/quick-check 100 (hg/property (fn [m] (== (get-in m [:v 2 :int]) 42))
                                '{:v (vec kw kw {:int 42} kw) :str str}))

;; samples from a schema generator
(clojure.test.check.generators/sample (hg/generator '[int*]))
;=> (() (9223372036854775807) [9223372036854775807] () [] (1 1) () [-7] (4) [-5])

;; generate samples directly from a schema (notice the "hg" namespace)
(hg/sample '[int*])
;=> (() [-1 0] () (9223372036854775807) [9223372036854775807] [] () () [0 -5] [9223372036854775807] (7 9223372036854775807) [] (12 -11) (-9223372036854775808) [-12 9223372036854775807] [-10 9223372036854775807] (-11) [2] (-11) [-7])
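
The defspec macro mentioned above can wrap a Herbert property so that it runs as part of a clojure.test suite. This is a small sketch rather than a recommended layout; the test name and the number of trials are arbitrary choices.

```clojure
(require '[clojure.test.check.clojure-test :refer [defspec]])
(require '[miner.herbert.generators :as hg])

;; defspec takes a name, an optional number of trials, and a property.
;; hg/property supplies the property: values are generated from the schema
;; and each one is passed to the predicate.
(defspec int-seqs-contain-only-integers 100
  (hg/property (fn [xs] (every? integer? xs)) '[int*]))
```

Once defined, the spec is picked up by the usual clojure.test runners (for example, lein test) alongside ordinary deftest forms.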

Notation for Schema Patterns

Regular Expression Support

For conformance testing (as with conforms?), Herbert allows several terms to be parameterized by a regular expression (see str, sym, etc.). Both the Clojure syntax for regular expressions and the Java String format are allowed (see clojure.core/re-pattern and java.util.regex.Pattern). Note that Clojure regular expressions (like #"foo+bar*") are not edn types, so you should use Strings if you want your Herbert schemas to be completely edn-compatible. The main difference is that Java String notation requires you to use double backslashes to get the effect of a single backslash in your regex. For example, Clojure #"foo\d" would be written as the String "foo\\d".
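
The backslash rule can be checked at the REPL with nothing but clojure.core. The sketch below compares the two notations and then passes the String form to Herbert; the var names are illustrative only.

```clojure
;; A Clojure regex literal and the equivalent Java String notation.
(def clj-regex #"foo\d")
(def str-regex "foo\\d")   ; the String itself holds a single backslash: foo\d

;; re-pattern compiles the String into a pattern with the same source text.
(= (str clj-regex) (str (re-pattern str-regex)))
;=> true

(re-matches (re-pattern str-regex) "foo7")
;=> "foo7"

;; The String form is plain edn, so it can appear in a schema that has to
;; stay edn-compatible.
(require '[miner.herbert :as h])
(h/conforms? '(str "foo\\d") "foo7")
```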

Although the full Java regular expression syntax is supported for conformance testing, the Herbert generator implementation supports only a limited form of regular expressions. (Someday, test.check may support a regex generator, but for now, Herbert has to implement string generation as best it can.) The built-in string generator supports basic regular expressions with ASCII characters, such as "[a-z] [^abc] a.b* c+d(ef|gh)? \d\D\w\W\s\S". It does not support advanced regex features such as Unicode notation, minimum and maximum match counts, intersection character classes, POSIX character classes, case-insensitivity flags, look-ahead, look-behind, back-references, greedy, reluctant or possessive quantification, etc.

The dynamic Var *string-from-regex-generator* allows the user to customize the test.check string generator used internally by Herbert. When *string-from-regex-generator* is bound to a test.check generator, Herbert will use this generator for any term that is parameterized by a regular expression. The generator should take one argument, which can be either a java.util.regex.Pattern or a String, as the regex. It should generate strings that match the given regex. When it's bound to nil (the default), Herbert will use its internal string generator as described above.
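
To make the contract concrete, here is a deliberately naive sketch of a replacement: a function of one argument (the regex) that returns a test.check generator. It only picks from a fixed pool of candidate strings, so it is useful as an illustration, not as a general solution. The names candidates and candidates-matching are hypothetical, and the Var is referenced through the miner.herbert alias on the assumption that it lives in that namespace.

```clojure
(require '[clojure.test.check.generators :as gen])
(require '[miner.herbert :as h])
(require '[miner.herbert.generators :as hg])

;; Hypothetical pool of candidate strings for this sketch.
(def candidates ["foo1" "foo7" "foo9" "bar"])

(defn candidates-matching
  "Takes a regex (java.util.regex.Pattern or String) and returns a test.check
  generator of candidate strings that match it."
  [regex]
  (let [pattern (re-pattern regex)]   ; re-pattern accepts both forms
    (gen/such-that #(re-matches pattern %) (gen/elements candidates))))

;; doall realizes the samples while the binding is still in effect.
(binding [h/*string-from-regex-generator* candidates-matching]
  (doall (hg/sample '(str "foo\\d") 5)))
```

Note that gen/such-that gives up with an exception if it repeatedly fails to find a matching candidate, so a pool like this only works for regexes that most of its strings satisfy.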

If you need better support for Java regular expressions when generating Strings, you should consider using the test.chuck library which provides the string-from-regex generator. You can use it with Herbert like this:

(require '[com.gfredericks.test.chuck.generators :as chuck])
(require '[miner.herbert :as h])
(require '[miner.herbert.generators :as hg])

(binding [h/*string-from-regex-generator* chuck/string-from-regex]
    (hg/sample '(str #"\x66oo\dbar{1,3}") 5))

;=> ("foo5bar" "foo2barr" "foo5barrr" "foo2barrr" "foo9bar")

Experimental Features

These features are implemented as an experiment, but I'm not sure I'll keep them as they're a bit of a hack:

Examples

(require '[miner.herbert :as h])

(h/conforms? 'int 10)
;=> true

(h/conforms? '(grammar int) 10)
; a very simple "grammar" with no rules, equivalent to the start pattern
;=> true

(h/conforms? '(grammar {show numbers}, show str, numbers [int+]) '{"Lost" [4 8 15 16 23 42]})
; target pattern can use named subpatterns defined by tail of name/pattern pairs
;=> true

(h/conforms? '(:= A (or :a [:b A])) [:b [:b [:b :a]]])
; matches a recursive binding of `A`
;=> true

(h/conforms? '{:a int :b sym :c? [str*]} '{:a 1 :b foo :c ["foo" "bar" "baz"]})
;=> true

(h/conforms? '{:a int :b sym :c? [str*]} '{:a 1 :b foo})
; :c is optional so it's OK if it's not there at all.
;=> true

(h/conforms? '{:a int :b sym :c? [str*]} '{:a foo :b bar})
;=> false

(h/conforms? '{:a (:= A int) :b sym :c? [A+]} '{:a 1 :b foo :c [1 1 1]})
; `A` is bound to the int associated with :a, and then used again to define the values
; in the seq associated with :c.  
;=> true

(h/conforms? '(& {:a (:= A int) :b (:= B sym) :c (:= C [B+])} (when (= (count C) A))) 
           '{:a 2 :b foo :c [foo foo]})

; The & operator just means the following elements are found inline,
; not in a collection.  In this case, we use it to associate the
; when-test with the single map constraint.  The assertion says that
; the number of elements in the :c value must be equal to the value
; associated with :a.  Notice that all the elements in the :c seq
; must be equal to the symbol associated with :b.
;=> true

((h/conform '[(:= A int) (:= B int) (:= C int+ A B)]) [3 7 4 5 6])
; Inside a seq, the first two ints establish the low and high range of the rest 
; of the int values.
;=> {C [4 5 6], B 7, A 3}

(def my-checker (h/conform '[(:= MAX int) (:= XS int+ MAX)]))
(my-checker [7 3 5 6 4])
;=> {XS [3 5 6 4], MAX 7}

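(require '[clojure.string])

; Defined in the `user` namespace so the grammar below can refer to it
; as user/palindrome?
(defn palindrome? [s]
    (and (string? s)
        (= s (clojure.string/reverse s))))
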
(h/conforms? '(grammar [pal+]
                  palindrome user/palindrome?
                  pal {:len (:= LEN int) :palindrome (and palindrome (cnt LEN))})
             [{:palindrome "civic" :len 5}
              {:palindrome "kayak" :len 5} 
              {:palindrome "level" :len 5}
              {:palindrome "ere" :len 3}
              {:palindrome "racecar" :len 7}])
;=> true

Templates

If you want to mix external data into a Herbert pattern, I suggest that you use the backtick library's template function.
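
As a rough illustration of that suggestion, the sketch below splices a runtime value into a pattern with template. It assumes the backtick library is on the classpath and that its template macro behaves like syntax-quote without namespace-qualifying symbols; expected-answer and schema are made-up names.

```clojure
(require '[backtick :refer [template]])
(require '[miner.herbert :as h])

;; Hypothetical external value to mix into the pattern.
(def expected-answer 42)

;; Symbols such as str are left unqualified, so they remain plain Herbert
;; terms, while ~expected-answer is replaced by its value.
(def schema (template {:answer ~expected-answer :name str}))
;; schema is now the plain edn pattern {:answer 42 :name str}

(h/conforms? schema {:answer 42 :name "Deep Thought"})
```

Clojure's built-in syntax-quote would turn str into clojure.core/str, which is why a template that leaves symbols untouched is the better fit for building patterns.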

References

Related Projects

If Herbert isn't exactly what you're looking for, here are some other projects that take different approaches to similar problems:

Herbert is obsolete as of Clojure 1.9

Star Trek: The Way to Eden

stardate 5832.3

Space Hippies: “Herbert, Herbert, Herbert …”
Spock: “Herbert was a minor official notorious for his rigid and limited patterns of thought.”
Kirk: “Well, I shall try to be less rigid in my thinking.”

video clip: http://www.youtube.com/watch?v=PQONBf9xMss


Copyright and License

Copyright (c) 2013 Stephen E. Miner.

Distributed under the Eclipse Public License, the same as Clojure.