gorillalabs.sparkling

https://github.com/gorillalabs/sparkling.git

git clone 'https://github.com/gorillalabs/sparkling.git'


Sparkling - A Clojure API for Apache Spark

Sparkling is a Clojure API for Apache Spark.

Show me a small sample

(do
  (require '[sparkling.conf :as conf])
  (require '[sparkling.core :as spark])
  (spark/with-context sc (-> (conf/spark-conf)              ; this creates a spark context from the given configuration
                             (conf/app-name "sparkling-test")
                             (conf/master "local"))
                      (let [lines-rdd (spark/into-rdd sc ["This is the first line"   ;; here we provide data from a Clojure collection.
                                                          "Testing spark"           ;; You could also read from a text file or an Avro file.
                                                          "and sparkling"           ;; You could even read from a JDBC data source.
                                                          "Happy hacking!"])]
                        (spark/collect                      ;; get every element from the filtered RDD
                          (spark/filter                     ;; filter elements in the given RDD (lines-rdd)
                            #(.contains % "spark")          ;; a pure Clojure function as filter predicate
                            lines-rdd)))))
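
To see which lines the sample keeps without starting Spark, here is a minimal sketch that applies the same predicate to the plain Clojure vector. It assumes that spark/filter, like clojure.core/filter, keeps the elements for which the predicate returns a truthy value. The names lines, mentions-spark? and matching-lines are made up for this illustration and are not part of Sparkling.

```clojure
;; Plain-Clojure stand-in for the sample above; no Spark context involved.
(def lines ["This is the first line"
            "Testing spark"
            "and sparkling"
            "Happy hacking!"])

(defn mentions-spark?
  "Same check as the sample's predicate; the ^String hint avoids reflection."
  [^String line]
  (.contains line "spark"))

;; Keep only the lines that contain the substring "spark" (case-sensitive).
(def matching-lines
  (vec (filter mentions-spark? lines)))
;; matching-lines => ["Testing spark" "and sparkling"]
```

The match is case-sensitive, so a line containing only "Spark" would not be kept.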

Where to find more info

Check out our site for information about Gorillalabs Sparkling and a getting started guide.

Sample Project repo available

Just clone our getting-started repo and get going right now.

Note that certain namespaces need to be AOT-compiled, e.g. because their classes are referenced by name during the startup process. I do this in my project.clj using the :aot directive like this:

            :aot [#".*" sparkling.serialization sparkling.destructuring]
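
For orientation, here is a minimal sketch of where that directive sits inside a Leiningen project.clj. The project name and version are hypothetical, and the dependency list is deliberately left out rather than guessed.

```clojure
;; project.clj (sketch); "my-sparkling-app" and "0.1.0-SNAPSHOT" are hypothetical
(defproject my-sparkling-app "0.1.0-SNAPSHOT"
  ;; :dependencies intentionally omitted; see the Clojars section for Sparkling itself
  ;; AOT-compile all namespaces, plus the two Sparkling namespaces named above
  :aot [#".*" sparkling.serialization sparkling.destructuring])
```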

Availability from Clojars

Sparkling is available from Clojars. To use it with Leiningen, add the dependency vector shown on the Clojars project page to your dependencies.


Release Notes

2.0.0 - switch to Spark 2.0

1.2.3 - more developer friendly

1.2.2 - added whole-text-files in sparkling.core (thanks to Jase Bell)

1.2.1 - improved Kryo Registration, AVRO reader + new Accumulator feature

1.1.1 - cleaned dependencies

1.1.0 - Added a more clojuresque API

1.0.0 - Added value to the existing libraries (clj-spark and flambo)

Contributing

Feel free to fork the Sparkling repository, improve stuff and open up a pull request against our “develop” branch. However, we'll only add features with tests, so make sure everything is green ;)

Acknowledgements

Thanks to The Climate Corporation and their open source clj-spark project, and to Yieldbot for yieldbot/flambo which served as the starting point for this project.

License

Copyright (C) 2014-2015 Dr. Christian Betz, and the Gorillalabs team.

Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.