Sparkling is a Clojure API for Apache Spark.
(do
  (require '[sparkling.conf :as conf])
  (require '[sparkling.core :as spark])
  (spark/with-context sc (-> (conf/spark-conf)          ; with-context creates a Spark context from the given config
                             (conf/app-name "sparkling-test")
                             (conf/master "local"))
    (let [lines-rdd (spark/into-rdd sc ["This is the first line" ;; here we provide data from a Clojure collection.
                                        "Testing spark"          ;; You could also read from a text file, or an Avro file.
                                        "and sparkling"          ;; You could even approach a JDBC datasource.
                                        "Happy hacking!"])]
      (spark/collect                 ;; get every element from the filtered RDD
        (spark/filter                ;; filter elements in the given RDD (lines-rdd)
          #(.contains % "spark")     ;; a plain Clojure function as the filter predicate
          lines-rdd)))))
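For reference, the filter keeps the two lines containing "spark", so the collect call returns ["Testing spark" "and sparkling"]. As the comments hint, you can also build the RDD from a text file. Here is a minimal sketch of that, assuming a local file data/lines.txt exists and that your Sparkling version provides spark/text-file (a wrapper around Spark's textFile):

(spark/with-context sc (-> (conf/spark-conf)
                           (conf/app-name "sparkling-text-file")
                           (conf/master "local"))
  (spark/collect
    (spark/filter #(.contains % "spark")
                  (spark/text-file sc "data/lines.txt")))) ;; one line of the file per RDD element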
Check out our site for information about Gorillalabs Sparkling and a getting-started guide.
Just clone our getting-started repo and get going right now.
One thing to be aware of: certain namespaces need to be AOT-compiled, e.g. because their classes are referenced by name during startup. I'm doing this in my project.clj using the :aot directive, like this:
:aot [#".*" sparkling.serialization sparkling.destructuring]
Sparkling is available from Clojars. To use it with Leiningen, add the dependency to your project.clj, as sketched below.
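A minimal project.clj might look roughly like this; the project name and version strings are placeholders (check Clojars for the current Sparkling release), and adding Spark itself as a dependency is an assumption about your setup:

(defproject my-sparkling-app "0.1.0-SNAPSHOT"                 ; hypothetical project name
  :dependencies [[org.clojure/clojure "1.8.0"]                ; example version
                 [gorillalabs/sparkling "X.Y.Z"]              ; placeholder, see Clojars for the current release
                 [org.apache.spark/spark-core_2.10 "X.Y.Z"]]  ; assumption: you add Spark yourself
  ;; AOT-compile these namespaces so their classes can be referenced by name at startup.
  :aot [#".*" sparkling.serialization sparkling.destructuring])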
Also in sparkling.core: whole-text-files (thanks to Jase Bell).
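As a rough sketch of how that might be used (assuming a local directory data/ of text files, and that whole-text-files takes a context and a path, wrapping Spark's wholeTextFiles, which reads each file as a (file-name, content) pair):

(require '[sparkling.conf :as conf]
         '[sparkling.core :as spark])

;; Count the files read from the directory; each RDD element is a (file-name, content) pair.
(spark/with-context sc (-> (conf/spark-conf)
                           (conf/app-name "whole-text-files-demo")
                           (conf/master "local"))
  (spark/count (spark/whole-text-files sc "data/")))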
Feel free to fork the Sparkling repository, improve stuff and open up a pull request against our “develop” branch. However, we'll only add features with tests, so make sure everything is green ;)
Thanks to The Climate Corporation and their open-source clj-spark project, and to Yieldbot for yieldbot/flambo, which served as the starting point for this project.
Copyright (C) 2014-2015 Dr. Christian Betz, and the Gorillalabs team.
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.