Netflix.PigPen

https://github.com/Netflix/PigPen.git

git clone 'https://github.com/Netflix/PigPen.git'

PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig or Cascading but you don't need to know much about either of them to use it.

Getting Started, Tutorials & Documentation

Getting started with Clojure and PigPen is really easy.

Note: It is strongly recommended to familiarize yourself with Clojure before using PigPen.

Note: PigPen is not a Clojure wrapper for writing Pig scripts that you can edit by hand. While hand editing is entirely possible, the resulting scripts are not intended for human consumption.

Questions & Complaints

Artifacts

PigPen is available from Maven:

With Leiningen:

;; core library
[com.netflix.pigpen/pigpen "0.3.3"]

;; pig support
[com.netflix.pigpen/pigpen-pig "0.3.3"]

;; cascading support
[com.netflix.pigpen/pigpen-cascading "0.3.3"]

;; rx support
[com.netflix.pigpen/pigpen-rx "0.3.3"]

The platform libraries all reference the core library, so you only need to reference the platform-specific library that you require; the core library is then included transitively.
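As a minimal sketch of what this means in practice, a Leiningen project that targets Pig could list only the Pig artifact. The project name and version below are hypothetical placeholders; the Clojure version shown is simply the minimum that PigPen requires.

```clojure
;; project.clj (hypothetical project name and version)
(defproject my-pigpen-job "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 ;; pulls in com.netflix.pigpen/pigpen (core) transitively
                 [com.netflix.pigpen/pigpen-pig "0.3.3"]])
```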

Note: PigPen requires Clojure 1.5.1 or greater

Parquet

To use the parquet loader, add this to your dependencies:

[com.netflix.pigpen/pigpen-parquet-pig "0.3.3"]

Here is an example of how to write Parquet data:

(require '[pigpen.core :as pig])
(require '[pigpen.parquet :as pqt])

;;
;; assuming that `data` is in tuples
;;
;; [["John" "Smith" 28]
;;  ["Jane" "Doe"   21]]

(defn save-to-parquet
  [output-file data]
  (->> data
       ;; turning tuples into a map
       (pig/map (partial zipmap [:firstname :lastname :age]))
       ;; then storing to Parquet files
       (pqt/store-parquet
        output-file
        (pqt/message "test-schema"
                     ;; the field names here MUST match the map's keys
                     (pqt/binary "firstname")
                     (pqt/binary "lastname")
                     (pqt/int64  "age")))))

And how to load the records back:

(defn load-from-parquet
  [input-file]
  ;; the output will be a sequence of maps
  (pqt/load-parquet
   input-file
   (pqt/message "test-schema"
                (pqt/binary "firstname")
                (pqt/binary "lastname")
                (pqt/int64  "age"))))
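As a rough sketch of how the two functions above might be called, the snippet below builds a store query from constant data with `pig/return` and a separate load query. The file path is a hypothetical placeholder, and the snippet assumes the `save-to-parquet` and `load-from-parquet` definitions above are already loaded.

```clojure
(require '[pigpen.core :as pig])

;; hypothetical output location
(def parquet-path "people.parquet")

;; a query that stores two constant tuples as Parquet records
(def store-query
  (save-to-parquet parquet-path
                   (pig/return [["John" "Smith" 28]
                                ["Jane" "Doe"   21]])))

;; a query that reads the same records back as maps
(def load-query
  (load-from-parquet parquet-path))
```

Both definitions only build query expressions; nothing is read or written until a query is run on a platform.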

And check out the pigpen.parquet namespace for usage.

Note: Parquet is currently only supported by Pig

Avro

To use the avro loader (alpha), add this to your dependencies:

[com.netflix.pigpen/pigpen-avro-pig "0.3.3"]

And check out the pigpen.avro namespace for usage.

Note: Avro is currently only supported by Pig

Release Notes