Spark SQL + Scala

I have created a Spark Scala program to analyze population data with the Spark SQL API.

The data set used for this analysis can be found at https://github.com/dkbalachandar/spark-scala-examples/blob/master/src/main/resources/population.json
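For reference, each record in this file contains the columns queried below (age, females, males, total, year). The values shown here are illustrative only, not actual figures from the data set:

```json
{"age": 1, "females": 1000000, "males": 1050000, "total": 2050000, "year": 2010}
```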

The Spark program below reads the above data set, maps it to a table, and runs queries against it.



package com.spark

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object SparkSQLPopulationDataAnalysis {
  def main(args: Array[String]): Unit = {
    if (args.length < 1) {
      println("Usage: SparkSQLPopulationDataAnalysis <inputFile>")
      sys.exit(1)
    }
    val inputFile = args(0)
    val conf = new SparkConf().setAppName("SPARK-SQL")
    val sc = new SparkContext(conf)
    //Create the SQL context
    val sqlContext = new SQLContext(sc)
    //Read the data from the JSON file
    val population = sqlContext.read.json(inputFile)
    //Map it to a table
    population.registerTempTable("population")
    //Query it
    val allResults = sqlContext.sql("SELECT * FROM population").collect()
    println("Print all records::")
    allResults.foreach(println)
    val queryResults = sqlContext.sql("SELECT age, females, males, total, year FROM population where age IN (1,2,3)").collect()
    println("Query and Print records::")
    queryResults.foreach(println)
    sc.stop()
  }
}
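To run the program, build the project jar and submit it with spark-submit. The jar path below is a hypothetical example; the class name matches the object shown above:

```shell
# Package the project (assuming an sbt build)
sbt package

# Submit the job to a local master; adjust the jar path to your build output
spark-submit \
  --class com.spark.SparkSQLPopulationDataAnalysis \
  --master local[2] \
  target/scala-2.11/spark-scala-examples_2.11-1.0.jar \
  src/main/resources/population.json
```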

Please refer to https://raw.githubusercontent.com/dkbalachandar/spark-scala-examples for more examples.
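Note that SQLContext and registerTempTable are deprecated as of Spark 2.x. A rough equivalent using SparkSession (a sketch, not tested against this data set) would look like:

```scala
package com.spark

import org.apache.spark.sql.SparkSession

object SparkSQLPopulationDataAnalysis {
  def main(args: Array[String]): Unit = {
    if (args.length < 1) {
      println("Usage: SparkSQLPopulationDataAnalysis <inputFile>")
      sys.exit(1)
    }
    // SparkSession subsumes SparkContext and SQLContext in Spark 2.x
    val spark = SparkSession.builder().appName("SPARK-SQL").getOrCreate()
    // Read the JSON file into a DataFrame
    val population = spark.read.json(args(0))
    // createOrReplaceTempView replaces the deprecated registerTempTable
    population.createOrReplaceTempView("population")
    // Query it and print the results
    spark.sql("SELECT age, females, males, total, year FROM population WHERE age IN (1,2,3)").show()
    spark.stop()
  }
}
```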
