For example, if you ran…
$> ./bin/run-example LogQuery
… The run-example script finds the example class passed in the command-line args, then invokes spark-submit, which in turn invokes spark-class, which finds/creates a host and executes the program.
When running custom Scala scripts, you need to package your classes and dependencies into a jar, then use spark-submit to execute….
$>./spark/bin/spark-submit --class ScalaScriptClass ./path/to/project/target/scala-x.xx/scalascript.jar
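For the packaging step, a minimal sbt build definition can produce the jar that spark-submit expects. This is only a sketch — the project name, Scala version, and Spark version shown are assumptions, not from the original notes:

```scala
// build.sbt — minimal sketch; project name and versions are assumptions
name := "scalascript"

version := "1.0"

scalaVersion := "2.10.4"  // Spark 1.5.x was built against Scala 2.10 by default

// "provided" because the Spark runtime supplies these classes at execution time
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
```

Running `sbt package` from the project root then drops the jar under target/scala-2.10/, which is the path you hand to spark-submit.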
Machine Learning Library (MLlib) Programming Guide
The Spark MLlib docs also serve as a nice curriculum breakdown for studying all the common machine learning techniques.
- Download spark
- Extract archive
- View the archive README. Note we’ll use sbt to build Spark; Scala must be pre-installed, and the SCALA_HOME environment var needs to be set.
[UPDATED] Looks like this video is outdated. The latest Spark version (1.5.2) uses Maven to build. Install Java and Maven.
- Start spark build
spark-archive-home$> sbt/sbt package
[UPDATED] Run Maven build command…
spark-archive-home$> build/mvn -DskipTests clean package
… I did run into consistent errors during this step, which I think has to do with mismatched Java versions. I gave up after a while and just downloaded the pre-compiled Spark binaries instead.
- Download & extract the required Scala version (from the Spark source README)
[UPDATED] Looks like Maven takes care of building the Scala source, and it already works out of the box with the pre-compiled binary distros.
- Set the SCALA_HOME env var by creating a conf/spark-env.sh file from the Spark distro’s conf/spark-env.sh.template file (as described in the Spark distro README), OR edit your user .profile file to export the SCALA_HOME env var with the correct Scala exec path.
$> cp conf/spark-env.sh.template conf/spark-env.sh
$> vi conf/spark-env.sh
export SCALA_HOME=/opt/spark-1.5.2-bin-hadoop2.4 ##add this line inside spark-env.sh
[UPDATED] Skipped SCALA_HOME env var step. Looks like it’s unnecessary.
- Set the Log4j logging level by using Spark’s log4j template….
$> cp conf/log4j.properties.template conf/log4j.properties
$> vi conf/log4j.properties
log4j.rootCategory=ERROR, console ##edit this line inside log4j.properties
- Start spark shell…
- Open the Spark Quick Start guide and walk through the Scala-with-Spark examples.
- Spark docs at the Spark project site. You can select specific versions of the documentation.
- Free Spark project curricula at Berkeley AMP Camp. Covers implementation of more complex apps and deployments.
- Walks through the Quick Start guide’s Scala transformation and caching examples
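The transformation style in those Quick Start examples can be tried on plain Scala collections first, since Spark’s RDD API mirrors it. A minimal sketch — the sample lines below are made up for illustration, not from the guide:

```scala
// Plain Scala collections mirror the RDD transformation style used in the Quick Start guide.
// The sample data here is made up for illustration.
val lines = List("spark is fast", "spark is general", "hello world")

// filter: keep only lines mentioning "spark" (like textFile.filter(...) on an RDD)
val sparkLines = lines.filter(line => line.contains("spark"))

// map + reduce: find the largest word count per line (like the Quick Start map/reduce example)
val maxWords = lines.map(line => line.split(" ").length).reduce((a, b) => math.max(a, b))

println(sparkLines.size) // 2
println(maxWords)        // 3
```

The difference with Spark is only where the computation runs: on an RDD the same filter/map/reduce calls are distributed across the cluster, and .cache() keeps the intermediate result in memory.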
- Walks through the Quick Start guide example of building and running a standalone application
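The standalone app from that walkthrough looks roughly like this sketch of the Spark 1.x Quick Start example — the log-file path is the guide’s placeholder, and spark-core must be a build dependency, so this won’t compile on its own:

```scala
/* SimpleApp.scala — sketch of the Spark 1.x Quick Start standalone app;
   the file path is a placeholder and spark-core must be on the build classpath */
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // placeholder path from the guide
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    // Load the file as an RDD and cache it, since it's read twice below
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

    sc.stop()
  }
}
```

Package it into a jar with sbt, then launch it with spark-submit --class SimpleApp pointing at the jar, the same way as the scripts earlier in these notes.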