Spark SQL + Scala

I have created a Spark Scala program to analyze population data with the Spark SQL API.

The data set used for this analysis can be found here.

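The program expects the data as a JSON file with one record per line (the JSON Lines format that Spark reads natively). The field names below are taken from the query in the program; the values are purely illustrative:

```json
{"age": 1, "females": 1863683, "males": 1940806, "total": 3804489, "year": 2013}
```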
The Spark program below reads the data set, maps it to a table, and runs queries against it.

package com.spark

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object SparkSQLPopulationDataAnalysis {
  def main(args: Array[String]) {
    if (args.length < 1) {
      println("Usage: SparkSQLPopulationDataAnalysis <inputFile>")
      sys.exit(1)
    }
    val inputFile = args(0)
    val conf = new SparkConf().setAppName("SPARK-SQL")
    val sc = new SparkContext(conf)
    //Create the SQL context
    val sqlContext = new SQLContext(sc)
    //Read the data from the JSON file
    val population = sqlContext.read.json(inputFile)
    //Map it to a table
    population.registerTempTable("population")
    //Query it
    println("Print all records::")
    val allResults = sqlContext.sql("SELECT * FROM population").collect()
    allResults.foreach(println)
    println("Query and print records::")
    val queryResults = sqlContext.sql("SELECT age, females, males, total, year FROM population WHERE age IN (1,2,3)").collect()
    queryResults.foreach(println)
    sc.stop()
  }
}
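To run the program, package it into a jar and launch it with spark-submit. The jar name, master URL, and input file name below are placeholders; adjust them for your build and cluster:

```shell
spark-submit \
  --class com.spark.SparkSQLPopulationDataAnalysis \
  --master local[2] \
  spark-sql-example.jar \
  population.json
```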

Please refer here for more examples.

