Spark SQL + Scala

I have created a Spark program in Scala that analyses population data using the Spark SQL API.

The data set used for this analysis is found in

The Spark program given below reads this data set, maps it to a table, and runs queries against it.

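package com.spark

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object SparkSQLPopulationDataAnalysis {
  def main(args: Array[String]): Unit = {
    // Expect the path of the input JSON file as the only argument
    if (args.length < 1) {
      println("Usage: SparkSQLPopulationDataAnalysis <inputFile>")
      sys.exit(1)
    }
    val inputFile = args(0)
    val conf = new SparkConf().setAppName("SPARK-SQL")
    val sc = new SparkContext(conf)
    // Create the SQL context
    val sqlContext = new SQLContext(sc)
    // Read the data from the JSON file
    // (assumes Spark 1.4+, where sqlContext.read is available)
    val population = sqlContext.read.json(inputFile)
    // Map it to a table named "population"
    population.registerTempTable("population")
    // Query it
    val allResults = sqlContext.sql("SELECT * FROM population").collect()
    println("Print all records::")
    allResults.foreach(println)
    val queryResults = sqlContext.sql("SELECT age, females, males, total, year FROM population WHERE age IN (1,2,3)").collect()
    println("Query and Print records::")
    queryResults.foreach(println)
    // Release cluster resources
    sc.stop()
  }
}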
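
The second query can also be written with DataFrame operators instead of a SQL string, in which case no table needs to be registered. The following is only a minimal sketch, assuming Spark 1.5 or later (for Column.isin) and the same column names as in the query above. The object name PopulationDataFrameQuery and the helper agesOneToThree are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.col
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name, used only for this illustration
object PopulationDataFrameQuery {

  // Same filter and projection as the SQL query:
  // SELECT age, females, males, total, year FROM population WHERE age IN (1,2,3)
  def agesOneToThree(population: DataFrame): DataFrame =
    population
      .filter(col("age").isin(1, 2, 3))
      .select("age", "females", "males", "total", "year")

  def main(args: Array[String]): Unit = {
    if (args.length < 1) {
      println("Usage: PopulationDataFrameQuery <inputFile>")
      sys.exit(1)
    }
    val sc = new SparkContext(new SparkConf().setAppName("SPARK-SQL-DataFrame"))
    val sqlContext = new SQLContext(sc)
    // Assumes the input holds one JSON object per line
    val population = sqlContext.read.json(args(0))
    agesOneToThree(population).collect().foreach(println)
    sc.stop()
  }
}
```

Keeping the filter in its own function means it can be applied to any DataFrame that has these columns, including a small one built by hand for a quick check.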

Please refer for more examples.


