Archive for January 2017
PySpark starter for ten
I just threw this together and I’m putting it here mainly in case I need it later. It might come in handy for others too…
So you have a new Spark installation running against a YARN cluster and you want to run something simple on it (akin to Hello World) to see if it does anything. Try copying and pasting this into your bash shell:
echo "from pyspark import SparkContext, HiveContext, SparkConf" > sparking.py
echo "conf = SparkConf().setAppName('sparking')" >> sparking.py
echo 'conf.set("spark.sql.parquet.binaryAsString", "true")' >> sparking.py
echo "sc = SparkContext(conf=conf)" >> sparking.py
echo "sqlContext = HiveContext(sc)" >> sparking.py
echo "l = [('Alice', 1)]" >> sparking.py
echo "rdd = sc.parallelize(l)" >> sparking.py
echo "for x in rdd.take(10):" >> sparking.py
echo "    print x" >> sparking.py
spark-submit --master yarn --deploy-mode cluster --supervise --name "sparking" sparking.py
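The echo chain works, but it is fiddly to edit. Assuming a bash shell, the same file can be written in one go with a quoted heredoc (just an alternative way to produce the identical sparking.py; the spark-submit line stays the same):

```shell
# Write the same sparking.py in one shot with a heredoc.
# 'EOF' is quoted so bash performs no expansion inside the body.
cat > sparking.py <<'EOF'
from pyspark import SparkContext, HiveContext, SparkConf
conf = SparkConf().setAppName('sparking')
conf.set("spark.sql.parquet.binaryAsString", "true")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
l = [('Alice', 1)]
rdd = sc.parallelize(l)
for x in rdd.take(10):
    print x
EOF
```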
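The driver logic itself is tiny: sc.parallelize turns a local list into a distributed RDD, and take(10) ships at most ten elements back to the driver, which prints them. Stripped of Spark, it amounts to this (a plain-Python sketch of the behaviour, not Spark code; written with Python 3's print):

```python
# Plain-Python sketch of what the driver does; no cluster involved.
l = [('Alice', 1)]

# sc.parallelize(l) would shard this list across executors;
# rdd.take(10) collects back at most the first 10 elements.
rows = l[:10]

for x in rows:
    print(x)  # the tuple that later appears in the YARN stdout log
```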
If it runs, you should see something like this at the end of the YARN log:
Log Type: stdout
Log Upload Time: Thu Jan 05 14:56:09 +0000 2017
Log Length: 13

('Alice', 1)