‘Library Fine’ problem on HackerRank – solution in Scala

Problem Statement:

The Head Librarian at a library wants you to make a program that calculates the fine for returning the book after the return date. You are given the actual and the expected return dates. Calculate the fine as follows:

If the book is returned on or before the expected return date, no fine will be charged; in other words, the fine is 0.

If the book is returned in the same month as the expected return date, Fine = 15 Hackos × Number of late days

If the book is not returned in the same month but in the same year as the expected return date, Fine = 500 Hackos × Number of late months

If the book is not returned in the same year, the fine is fixed at 10000 Hackos.
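The rules above boil down to comparing the two dates field by field, from year down to day. Here is a minimal plain-Java sketch of just that logic (the method and variable names are my own, not part of the original solution):

```java
public class FineCalculator {

    // Fine in Hackos for the actual return date (aDay/aMonth/aYear)
    // against the expected return date (eDay/eMonth/eYear).
    static int fine(int aDay, int aMonth, int aYear,
                    int eDay, int eMonth, int eYear) {
        if (aYear < eYear) return 0;          // returned in an earlier year
        if (aYear > eYear) return 10000;      // returned in a later year: fixed fine
        if (aMonth < eMonth) return 0;        // same year, earlier month
        if (aMonth > eMonth) return 500 * (aMonth - eMonth); // late by whole months
        if (aDay <= eDay) return 0;           // same month, on or before the due day
        return 15 * (aDay - eDay);            // same month, late by days
    }

    public static void main(String[] args) {
        // 3 days late within the same month: 15 x 3 = 45 Hackos
        System.out.println(fine(9, 6, 2015, 6, 6, 2015)); // 45
    }
}
```

Note that no calendar library is needed at all; the lexicographic comparison of (year, month, day) is enough.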

Refer to the link below to learn more about this problem: https://www.hackerrank.com/challenges/library-fine

Solution in Scala


import java.util.{Calendar, Scanner}

object Solution {

    def main(args: Array[String]) {
       val scanner: Scanner = new Scanner(System.in)

    //Actual Returned Date
    val aDate: Int = scanner.nextInt
    val aMonth: Int = scanner.nextInt
    val aYear: Int = scanner.nextInt

    //Due Date
    val dDate: Int = scanner.nextInt
    val dMonth: Int = scanner.nextInt
    val dYear: Int = scanner.nextInt

    val isValidData: Boolean = (aDate >= 1 && aDate <= 31) && (dDate >= 1 && dDate <= 31) &&
      (aMonth >= 1 && aMonth <= 12) && (dMonth >= 1 && dMonth <= 12) &&
      (aYear >= 1 && aYear <= 3000) && (dYear >= 1 && dYear <= 3000)

    var fineAmount: Int = 0
    if (isValidData) {
      // Calendar months are 0-based; passing the 1-based month shifts both
      // dates by the same offset, so the ordering comparison is unaffected.
      // clear() resets the time-of-day fields so equal dates compare equal.
      val actualCalendar: Calendar = Calendar.getInstance()
      actualCalendar.clear()
      actualCalendar.set(aYear, aMonth, aDate)

      val dCalendar: Calendar = Calendar.getInstance()
      dCalendar.clear()
      dCalendar.set(dYear, dMonth, dDate)

      if ((actualCalendar.getTime == dCalendar.getTime) || actualCalendar.getTime.before(dCalendar.getTime)) {
        fineAmount = 0
      }
      else if (actualCalendar.getTime.after(dCalendar.getTime) && aYear == dYear) {
        fineAmount = if (aMonth == dMonth) 15 * (aDate - dDate) else 500 * (aMonth - dMonth)
      }
      else {
        fineAmount = 10000
      }
    }
    println(fineAmount)
    }
}

How to use ReflectionUtils to retrieve the field value

In this post, I am going to show how we can use ReflectionUtils to retrieve a field value from an object.

Most of the time, we use plain Java reflection to retrieve a value, but if the field is declared in a superclass, you have to write some boilerplate code to walk the class hierarchy. If you use ReflectionUtils, you don’t have to worry about that.

Refer to the example below.

BaseProfile and Employee are the two value objects. The Employee class extends the BaseProfile class, which holds some common fields. In ReflectionUtilsMain, I use plain Java reflection and also ReflectionUtils to retrieve the value of the “firstName” field.

Let’s check the code.

BaseProfile.java


import org.apache.commons.lang.builder.ReflectionToStringBuilder;
import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;


public class BaseProfile {

    private String firstName;

    private String lastName;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        return ReflectionToStringBuilder.toString(this);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;

        if (o == null || getClass() != o.getClass()) return false;

        BaseProfile that = (BaseProfile) o;

        return new EqualsBuilder()
                .append(firstName, that.firstName)
                .append(lastName, that.lastName)
                .isEquals();
    }

    @Override
    public int hashCode() {
        return new HashCodeBuilder(17, 37)
                .append(firstName)
                .append(lastName)
                .toHashCode();
    }
}

Employee.java


import org.apache.commons.lang.builder.ReflectionToStringBuilder;
import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;


public class Employee extends BaseProfile {

    private String empId;

    private String designation;

    public String getEmpId() {
        return empId;
    }

    public void setEmpId(String empId) {
        this.empId = empId;
    }

    public String getDesignation() {
        return designation;
    }

    public void setDesignation(String designation) {
        this.designation = designation;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;

        if (o == null || getClass() != o.getClass()) return false;

        Employee employee = (Employee) o;

        return new EqualsBuilder()
                .appendSuper(super.equals(o))
                .append(empId, employee.empId)
                .append(designation, employee.designation)
                .isEquals();
    }

    @Override
    public int hashCode() {
        return new HashCodeBuilder(17, 37)
                .appendSuper(super.hashCode())
                .append(empId)
                .append(designation)
                .toHashCode();
    }

    @Override
    public String toString() {
        return ReflectionToStringBuilder.toString(this);
    }
}

Here is our main class
ReflectionUtilsMain.java


import org.apache.commons.lang3.exception.ExceptionUtils;
import org.apache.commons.lang3.reflect.FieldUtils;

import java.lang.reflect.Field;

public class ReflectionUtilsMain {

    public static void main(String[] args) {

        Employee employee = new Employee();
        employee.setEmpId("1234");
        employee.setFirstName("John");
        employee.setLastName("Turner");
        employee.setDesignation("Manager");
        System.out.println(employee);

        //Now you want to access the First Name from Employee object with Java reflection
        String firstName = null;
        String fieldName = "firstName";
        for (Class aClass = employee.getClass(); aClass != null; aClass = aClass.getSuperclass()) {
            System.out.println("aClass:" + aClass.getSimpleName());
            try {
                Field field = aClass.getDeclaredField(fieldName);
                System.out.println("Field is found " + field);
                field.setAccessible(true);
                firstName = (String) field.get(employee);
                break;
            } catch (NoSuchFieldException | IllegalAccessException e) {
                System.err.print(ExceptionUtils.getStackTrace(e));
            }
        }
        System.out.println("Using Reflection firstName:" + firstName);
        firstName = null;
        //You can also use ReflectionUtils to get this very easily
        try {
            Field field = FieldUtils.getField(employee.getClass(), fieldName, true);
            firstName = (String) field.get(employee);
        } catch (Exception e) {
            System.err.print(ExceptionUtils.getStackTrace(e));
        }
        System.out.println("Using Reflection Utils firstName:" + firstName);

    }
}

The output will be like this,


java.lang.NoSuchFieldException: firstName
	at java.lang.Class.getDeclaredField(Class.java:2070)
	at org.cas.osd.mp.ReflectionUtilsMain.main(ReflectionUtilsMain.java:29)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
org.cas.osd.mp.Employee@42110406[empId=1234,designation=Manager,firstName=John,lastName=Turner]
aClass:Employee
aClass:BaseProfile
Field is found private java.lang.String org.cas.osd.mp.BaseProfile.firstName
Using Reflection firstName:John
Using Reflection Utils firstName:John
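The superclass-walking loop can also be factored into a small reusable helper with no library dependency. Below is my own sketch of that idea; `Base` and `Sub` are stand-ins for the BaseProfile and Employee classes above, and `readField` merely mirrors (but is not) the commons-lang `FieldUtils.readField` API:

```java
import java.lang.reflect.Field;

// Base and Sub are stand-ins for BaseProfile and Employee from this post.
class Base {
    private String firstName = "John";
}

class Sub extends Base {
    private String empId = "1234";
}

public class FieldReader {

    // Walks up the class hierarchy (much like FieldUtils.getField does
    // internally) until the named field is found, then reads its value.
    static Object readField(Object target, String fieldName) {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field field = c.getDeclaredField(fieldName);
                field.setAccessible(true);
                return field.get(target);
            } catch (NoSuchFieldException e) {
                // not declared at this level; keep walking up
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e); // unreachable after setAccessible(true)
            }
        }
        return null; // field not found anywhere in the hierarchy
    }

    public static void main(String[] args) {
        System.out.println(readField(new Sub(), "firstName")); // John
        System.out.println(readField(new Sub(), "empId"));     // 1234
    }
}
```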

Scala: Enum Creation Examples

An enum is used for creating a group of constants, such as the days of the week or colors.

In this post, I am going to show how to create enums in Scala.

There are two ways,
1. Using Scala Enumeration
2. Using Scala Traits

Using Scala Enumeration

Scala has an Enumeration class which can be extended to create an enum. Check the example below.


object ScalaEnumObject {

  def main(args: Array[String]) {
    println("Event:")
    Event.values foreach println
  }
  //Extends Enumeration class
  object Event extends Enumeration {
    type Event = Value
    val CREATE, READ, UPDATE, REMOVE = Value
  }
}

The output looks like below:


Event:
CREATE
READ
UPDATE
REMOVE

Using Scala Traits

We can also use a trait to create enums. A trait is similar to a Java interface, so we can create a trait and then, for each enum value, a case object that extends it.

Check the example below.


object ScalaEnumObject {

  def main(args: Array[String]) {
    println("Directions:")
    val directions = List(EAST,WEST,SOUTH,NORTH)
    directions.foreach(direction => println (direction.name))
  }

   //Using Trait
  sealed trait Direction {
    def name: String
  }

  case object EAST extends Direction {
    val name = "E"
  }

  case object WEST extends Direction {
    val name = "W"
  }

  case object SOUTH extends Direction {
    val name = "S"
  }

  case object NORTH extends Direction {
    val name = "N"
  }
}

The output looks like below:


Directions:
E
W
S
N
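For comparison, the closest Java analog to the trait-based approach is an enum with a constructor field. This is my own sketch, not part of the Scala example; I use `shortName` because `name()` is already a built-in enum method:

```java
public enum Direction {
    EAST("E"), WEST("W"), SOUTH("S"), NORTH("N");

    private final String shortName;

    // Each constant carries its own short name, like the case objects above.
    Direction(String shortName) {
        this.shortName = shortName;
    }

    public String shortName() {
        return shortName;
    }

    public static void main(String[] args) {
        // Prints E, W, S, N in declaration order
        for (Direction d : Direction.values()) {
            System.out.println(d.shortName());
        }
    }
}
```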

Apache HBase – Java Client API with Docker HBase

HBase is the Hadoop database: a distributed, scalable, big data store. We can use HBase when we need random, real-time read/write access to our big data.

I have used a standalone HBase running in Docker for this exercise.

The first step is to install Docker if you don’t have it, and then follow the steps below to install Docker HBase.

  1. Refer this repository https://github.com/sel-fish/hbase.docker and follow the instructions available to install Docker HBase.
  2. I have an Ubuntu VM, so I used my hostname instead of ‘myhbase’. If you use your hostname, you don’t need to update the /etc/hosts file, but make sure to check it and verify the entry below.

    
    <<MACHINE_IP_ADDRESS>> <<HOSTNAME>>
    
    
  3. My docker run command looks like this:
    
    docker run -d -h $(hostname) -p 2181:2181 -p 60000:60000 -p 60010:60010 -p 60020:60020 -p 60030:60030 --name hbase debian-hbase
    
    
  4. Once you are done, check http://localhost:60010 (Master) and http://localhost:60030 (Region Server).

pom.xml


<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.3.0</version>
</dependency>

To access the HBase shell, follow the steps below:


1. Run 'docker exec -it hbase bash' to enter the container
2. Go to the '/opt/hbase/bin/' folder
3. Run './hbase shell' and it will open up the HBase shell.

You can use the HBase shell available inside the Docker container and run commands to perform all the operations (create table, list, put, and scan).


root@HOST-NAME:/opt/hbase/bin# ./hbase shell
2017-02-15 14:55:26,117 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2017-02-15 14:55:27,095 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 1.2.0-cdh5.7.0, r49168a0b3987d5d8b1f1b359417666f477a0618e, Wed Jul 20 23:13:03 EDT 2016

hbase(main):001:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 3.0000 average load

hbase(main):002:0> list
TABLE                                                                                                                                                                                         
customer                                                                                                                                                                                      
1 row(s) in 0.0330 seconds

=> ["customer"]
hbase(main):003:0> create 'user','personal'
0 row(s) in 1.2540 seconds

=> Hbase::Table - user
hbase(main):004:0> list
TABLE                                                                                                                                                                                         
customer                                                                                                                                                                                      
user                                                                                                                                                                                          
2 row(s) in 0.0080 seconds

=> ["customer", "user"]
hbase(main):005:0> list 'user'
TABLE                                                                                                                                                                                         
user                                                                                                                                                                                          
1 row(s) in 0.0090 seconds

=> ["user"]
hbase(main):006:0> put 'user','row1','personal:name','bala'
0 row(s) in 0.1500 seconds

hbase(main):007:0> put 'user','row2','personal:name','chandar'
0 row(s) in 0.0110 seconds

hbase(main):008:0> scan 'user'
ROW                                              COLUMN+CELL                                                                                                                                  
 row1                                            column=personal:name, timestamp=1487170597246, value=bala                                                                                    
 row2                                            column=personal:name, timestamp=1487170608622, value=chandar                                                                                 
2 row(s) in 0.0700 seconds

hbase(main):009:0> get 'user' , 'row2'
COLUMN                                           CELL                                                                                                                                         
 personal:name                                   timestamp=1487170608622, value=chandar                                                                                                       
1 row(s) in 0.0110 seconds



The hbase-site.xml looks like this. It is available in the Docker container under /opt/hbase/conf.

hbase-site.xml


<configuration>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>60020</value>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>60030</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.localcluster.port.ephemeral</name>
    <value>false</value>
  </property>
</configuration>

Create Table



import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTable {

    public static void main(String... args) throws Exception {
        System.out.println("Creating Htable starts");
        Configuration config = HBaseConfiguration.create();
        //config.set("hbase.zookeeper.quorum", "HOSTNAME");
        //config.set("hbase.zookeeper.property.clientPort","2181");
        Connection connection = ConnectionFactory.createConnection(config);
        Admin admin = connection.getAdmin();
        TableName tableName = TableName.valueOf("customer");
        if (!admin.tableExists(tableName)) {
            HTableDescriptor htable = new HTableDescriptor(tableName);
            htable.addFamily(new HColumnDescriptor("personal"));
            htable.addFamily(new HColumnDescriptor("address"));
            admin.createTable(htable);
        } else {
            System.out.println("customer table already exists");
        }
        admin.close();
        connection.close();
        System.out.println("Creating Htable Done");
    }
}

List Tables



import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ListTable {

    public static void main(String... args) throws Exception {
        Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
        Admin admin = connection.getAdmin();
        HTableDescriptor[] tableDescriptors = admin.listTables();
        for (HTableDescriptor tableDescriptor : tableDescriptors) {
            System.out.println("Table Name:"+ tableDescriptor.getNameAsString());
        }
        admin.close();
        connection.close();
    }
}


Delete Table



import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

import java.io.IOException;

public class DeleteTable {

    public static void main(String... args) {

        System.out.println("DeleteTable Starts");
        Connection connection = null;
        Admin admin = null;

        try {
            connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
            TableName tableName = TableName.valueOf("customer");
            admin = connection.getAdmin();
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
            if(!admin.tableExists(tableName)){
                System.out.println("Table is deleted");
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (admin != null) admin.close();
                if (connection != null) connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        System.out.println("DeleteTable Done");
    }
}

Delete Data



import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteData {

    public static void main(String... args) throws Exception {
        System.out.println("DeleteData starts");
        Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
        TableName tableName = TableName.valueOf("customer");
        Table table = connection.getTable(tableName);
        Delete delete = new Delete(Bytes.toBytes("row1"));
        table.delete(delete);
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        System.out.println("result:"+result);
        if (result.value() == null) {
            System.out.println("Delete Data is successful");
        }
        table.close();
        connection.close();
    }

}

To populate the HBase table:


import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PopulateData {

    public static void main(String... args) throws Exception {

        Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());

        TableName tableName = TableName.valueOf("customer");
        Table table = connection.getTable(tableName);

        Put p = new Put(Bytes.toBytes("row1"));
        //Customer table has personal and address column families. So insert data for 'name' column in 'personal' cf
        // and 'city' for 'address' cf
        p.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("bala"));
        p.addColumn(Bytes.toBytes("address"), Bytes.toBytes("city"), Bytes.toBytes("new york"));
        table.put(p);
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        byte[] name = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
        byte[] city = result.getValue(Bytes.toBytes("address"), Bytes.toBytes("city"));
        System.out.println("Name: " + Bytes.toString(name) + " City: " + Bytes.toString(city));
        table.close();
        connection.close();
    }
}

To scan the table


import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class ScanTable {

    public static void main(String... args) {
        Connection connection = null;
        ResultScanner scanner = null;
        try {
            connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
            TableName tableName = TableName.valueOf("customer");
            Table table = connection.getTable(tableName);
            Scan scan = new Scan();
            // Scanning the required columns
            scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));

            scanner = table.getScanner(scan);

            // Reading values from scan result
            for (Result result = scanner.next(); result != null; result = scanner.next())
                System.out.println("Found row : " + result);
            //closing the scanner
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (scanner != null) scanner.close();
            if (connection != null) try {
                connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}


Acceptance testing with Cucumber and Capybara

Cucumber is a software testing tool used for defining acceptance test cases and running them.

Cucumber itself is written in the Ruby programming language. It uses the ‘Gherkin’ language to define test cases, and acceptance tests are written in a behavior-driven development style.

Refer to the links below to learn more about Cucumber (features, step definitions):

https://cucumber.io/

https://en.m.wikipedia.org/wiki/Cucumber_(software)

Capybara is a library used for simulating user actions. Refer to https://github.com/teamcapybara/capybara to learn more about Capybara.

In this article, I am going to show how we can test the “Yelp” website with Cucumber and Capybara.

My objective is to write an acceptance test which opens up a browser and goes to ‘Yelp’ site and search for a restaurant and validate the results.

Here is the feature file. It’s written in the Gherkin language.

Refer the code at https://github.com/dkbalachandar/ruby-cucumber-test

ruby-cucumber-test/features/yelp.feature

@run
Feature: Search Restaurants
  Scenario: Go to yelp and search for valid restaurant
    Given a user goes to Yelp
    When Search for taco bell
    Then See the List of taco bell Restaurants

   Scenario: Go to yelp and search for restaurant
    Given a user goes to Yelp
    When Search for Qboba
    Then See the List of Qboba Restaurants

   Scenario: Go to yelp and search for restaurant
    Given a user goes to Yelp
    When Search for Chipotle
    Then See the List of Chipotle Restaurants
  
  Scenario: Go to yelp and search for invalid restaurant
    Given a user goes to Yelp
    When Search for hhahsdahsdhasd
    Then See No Results found error message

  Scenario Outline: Go to yelp and search for <searchText>
    Given a user goes to Yelp
    When Search for <searchText>
    Then See the List of <searchText> Restaurants
    Examples:
      |searchText|
      |Scardello|
      |Roti Grill|
      |Mughlai Restaurant|
      |Spice In The City Dallas|


Here is the step definitions file. It’s a Ruby file and uses Capybara to simulate the user actions.

ruby-cucumber-test/features/step_definitions/yelp-step.rb


Given(/^a user goes to Yelp$/) do    
  visit "https://www.yelp.com"   
end

When(/^Search for (.*?)$/) do |searchTerm|
  fill_in 'dropperText_Mast', :with => 'Dallas, TX'    
  fill_in 'find_desc', :with => searchTerm
  click_button('submit')
end

Then(/^See the List of (.*?) Restaurants$/) do |searchTerm|  
 expect(page).to have_content(searchTerm)
 expect(page).to have_no_content('No Results')
end

Then(/^See No Results found error message$/) do
 expect(page).to have_content('No Results')
end

This file has all the environment-related configuration. I have used ‘Chrome’ as my default browser instead of Firefox; such settings are defined here.

ruby-cucumber-test/features/support/env.rb


require 'capybara/cucumber'
require 'colorize'
require 'rspec'
Capybara.default_driver = :chrome 
Capybara.register_driver :chrome do |app|
   Capybara::Selenium::Driver.new(app, :browser => :chrome)
end

Below is the Gemfile for my program. This file describes the gem dependencies for a Ruby program.

ruby-cucumber-test/Gemfile


source "https://rubygems.org"

gem "cucumber"
gem "capybara"
gem "selenium-webdriver"
gem "rspec"
gem "chromedriver-helper"

Follow the steps below to run it:


1. Install Bundler (http://bundler.io/): gem install bundler
2. Run bundler: bundle install
3. Start test: cucumber --tag @run

After running the test cases, the output will look like the image below.

cucumber.jpg

Apache Pig

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language used is Pig Latin. It is an abstraction layer on top of MapReduce, so developers who do not have in-depth knowledge of MapReduce programming, as well as data analysts, can use this platform to analyze big data.

Now let’s see how we are going to use this with some examples.

The first step is to install Pig. I am using Ubuntu. Follow the installation steps below.

Installation

  1. Download Pig from Cloudera. I have used the one below: https://archive.cloudera.com/cdh5/cdh/5/pig-0.12.0-cdh5.2.0.tar.gz
  2. Extract the file pig-0.12.0-cdh5.2.0.tar.gz
  3. Move the extracted folder to /usr/local/pig
    sudo mv pig-0.12.0-cdh5.2.0 /usr/local/pig
  4. Then edit the profile file to update the environment variables: gedit ~/.bashrc

    Add the below variables,

    
    export PIG_HOME="/usr/local/pig"
    export PIG_CONF_DIR="$PIG_HOME/conf"
    export PIG_CLASSPATH="$PIG_CONF_DIR"
    export PATH="$PIG_HOME/bin:$PATH"
    
  5. Finally, source the profile to apply the changes.
    source ~/.bashrc

To verify the Pig installation, type the command below and check the version and other details.

pig -h

We can run Pig either in local mode or MapReduce mode. If you have a small data set and want to test your script, you can run it in local mode. Typing the command below opens a Grunt shell where we can enter Pig Latin scripts and run them.

Local Mode:

pig -x local

Map Reduce Mode:

pig 

For this exercise, I have used the datasets below. Please download these files and move them to the /usr/local/pig folder. I am running all my scripts in local mode.

asthma_adults_stats.csv:

http://raw.githubusercontent.com/dkbalachandar/health-stats-application/master/app/resources/asthma_adults_stats.csv

NationalNames.csv:

https://raw.githubusercontent.com/dkbalachandar/spark-scala-examples/master/src/main/resources/NationalNames.csv

comments.json:


{
    "comments": [
        {
            "text": "test1",
            "time": "1486051170277",
            "userName": "test1"
        },
        {
            "text": "test1",
            "time": "1486051170277",
            "userName": "test1"
        }
    ]
}

We can load CSV data with or without a schema, but when loading JSON you have to specify the schema, otherwise it will throw an error.

Every statement must end with a semicolon.

Pig has lots of operators and functions, so I am not going to show all of them in this example.

To exit the Grunt shell, use the ‘quit’ command.

To load the CSV data


data = load '/usr/local/pig/asthma_adults_stats.csv' using PigStorage(','); 
b = foreach data generate $0;
dump b;

To load the data with an explicit schema:


data = load '/usr/local/pig/asthma_adults_stats.csv' using PigStorage(',') as (state:chararray, percentage:double); 
b = foreach data generate state, percentage;
dump b;

If you change a field name in the schema but still project the old name, you will end up with an error like this:


data = load '/usr/local/pig/asthma_adults_stats.csv' using PigStorage(',') as (state1:chararray, percentage:double); 
b = foreach data generate state, percentage;
dump b;

 Invalid field projection. Projected field [state] does not exist in schema: state1:chararray,percentage:double.
2017-02-01 15:36:23,942 [main] WARN  org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2017-02-01 15:36:23,942 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.PlanValidationException: ERROR 1025: 
 Invalid field projection. Projected field [state] does not exist in schema: state1:chararray,percentage:double.
	at org.apache.pig.newplan.logical.expression.ProjectExpression.findColNum(ProjectExpression.java:191)


To load the JSON data


data = load '/usr/local/pig/comments.json' using JsonLoader('comments:{(userName:chararray,text:chararray, time:chararray)}');

Load the CSV data and perform Filter operation


data = load '/usr/local/pig/NationalNames.csv' using PigStorage(',') as (Id: int,
Name:chararray, Year:int, Gender: chararray, Count:int);
filteredData = filter data by Year > 2010;
dump filteredData;

To order the data


data = load '/usr/local/pig/NationalNames.csv' using PigStorage(',') as (Id: int,
Name:chararray, Year:int, Gender: chararray, Count:int);
filteredData = filter data by Year > 2010;
orderByData =  order filteredData by Year;
dump orderByData;

To group the data


data = load '/usr/local/pig/NationalNames.csv' using PigStorage(',') as (Id: int, Name:chararray, Year:int, Gender: chararray, Count:int);
groupByData =  group data by Name;
dump groupByData;

To extract the Name and Gender


b = foreach data generate Name, Gender;
dump b;

Filter all the female kids’ names, group the data by Year, and then count them


data = load '/usr/local/pig/NationalNames.csv' using PigStorage(',') as (Id: int, Name:chararray, Year:int, Gender: chararray, Count:int);
filterData = filter data by Gender =='F';
groupData = group filterData by Year;
countData = foreach groupData generate group, COUNT($1);
dump countData;

To store the data


store filterData INTO '/tmp/output' USING PigStorage(',');

To describe the relation


 describe filterData;

 Output:
 filterData: {Id: int,Name: chararray,Year: int,Gender: chararray,Count: int}

Motivation Tool: Don’t Break the chain calendar

If you want to change a habit or set a goal, experts advise following the ‘Don’t break the chain’ calendar technique.

  • Get a ‘Don’t Break the chain’ calendar. You can get it online.
  • Identify the task to be done.
  • Then, every day, put a mark (X) on the calendar if you perform the identified task successfully.
  • As time goes by, looking at the calendar will motivate you; you won’t want to break the chain, so you will keep doing the task.

There are lots of iOS and Android applications available that we can use to achieve our goals, such as:

Habit Chain

Don’t Break The Chain