Apache HBase – Java Client API with Docker HBase

HBase is the Hadoop database, a distributed, scalable, big data store. We can use HBase when we need random, realtime read/write access to our Big Data.

I have used standalone HBase running inside Docker for this exercise.

The first step is to install Docker if you don't already have it, and then follow the steps below to set up Docker HBase.

  1. Refer to the repository https://github.com/sel-fish/hbase.docker and follow its instructions to install Docker HBase.
  2. I have an Ubuntu VM, so I used my machine's hostname instead of 'myhbase'. If you use your own hostname, you don't need to update the /etc/hosts file, but do check /etc/hosts and verify it contains an entry like the one below.

    
    <<MACHINE_IP_ADDRESS>> <<HOSTNAME>>
    
    
  3. My docker run command looks like this:
    
    docker run -d -h $(hostname) -p 2181:2181 -p 60000:60000 -p 60010:60010 -p 60020:60020 -p 60030:60030 --name hbase debian-hbase
    
    
  4. Once you are done, check http://localhost:60010 (Master) and http://localhost:60030 (Region Server).

pom.xml


<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.3.0</version>
</dependency>

To access the HBase shell, follow the steps below:


1. Run 'docker exec -it hbase bash' to enter the container
2. Go to the '/opt/hbase/bin/' folder
3. Run './hbase shell' to open the HBase shell

You can use the HBase shell inside the Docker container to perform all the basic operations (create table, list, put, scan, get):


root@HOST-NAME:/opt/hbase/bin# ./hbase shell
2017-02-15 14:55:26,117 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2017-02-15 14:55:27,095 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 1.2.0-cdh5.7.0, r49168a0b3987d5d8b1f1b359417666f477a0618e, Wed Jul 20 23:13:03 EDT 2016

hbase(main):001:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 3.0000 average load

hbase(main):002:0> list
TABLE                                                                                                                                                                                         
customer                                                                                                                                                                                      
1 row(s) in 0.0330 seconds

=> ["customer"]
hbase(main):003:0> create 'user','personal'
0 row(s) in 1.2540 seconds

=> Hbase::Table - user
hbase(main):004:0> list
TABLE                                                                                                                                                                                         
customer                                                                                                                                                                                      
user                                                                                                                                                                                          
2 row(s) in 0.0080 seconds

=> ["customer", "user"]
hbase(main):005:0> list 'user'
TABLE                                                                                                                                                                                         
user                                                                                                                                                                                          
1 row(s) in 0.0090 seconds

=> ["user"]
hbase(main):006:0> put 'user','row1','personal:name','bala'
0 row(s) in 0.1500 seconds

hbase(main):007:0> put 'user','row2','personal:name','chandar'
0 row(s) in 0.0110 seconds

hbase(main):008:0> scan 'user'
ROW                                              COLUMN+CELL                                                                                                                                  
 row1                                            column=personal:name, timestamp=1487170597246, value=bala                                                                                    
 row2                                            column=personal:name, timestamp=1487170608622, value=chandar                                                                                 
2 row(s) in 0.0700 seconds

hbase(main):009:0> get 'user' , 'row2'
COLUMN                                           CELL                                                                                                                                         
 personal:name                                   timestamp=1487170608622, value=chandar                                                                                                       
1 row(s) in 0.0110 seconds



The hbase-site.xml looks like this; it is available inside the Docker container under /opt/hbase/conf.

hbase-site.xml


<configuration>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>60020</value>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>60030</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.localcluster.port.ephemeral</name>
    <value>false</value>
  </property>
</configuration>
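
A Java client running outside the container talks to HBase through the ZooKeeper instance exposed on port 2181; that is what the commented-out config.set(...) lines in the examples below are for. Here is a minimal connectivity check, a sketch that assumes the client machine can resolve the hostname passed to docker run (HOSTNAME is a placeholder):


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseConnectionCheck {

    public static void main(String... args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        // Point the client at the Docker container's ZooKeeper.
        // Replace HOSTNAME with the -h value used in the docker run command.
        config.set("hbase.zookeeper.quorum", "HOSTNAME");
        config.set("hbase.zookeeper.property.clientPort", "2181");
        try (Connection connection = ConnectionFactory.createConnection(config)) {
            System.out.println("Connected to HBase: " + !connection.isClosed());
        }
    }
}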

Create Table



import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTable {

    public static void main(String... args) throws Exception {
        System.out.println("Creating Htable starts");
        Configuration config = HBaseConfiguration.create();
        //config.set("hbase.zookeeper.quorum", "HOSTNAME");
        //config.set("hbase.zookeeper.property.clientPort","2181");
        Connection connection = ConnectionFactory.createConnection(config);
        Admin admin = connection.getAdmin();
        TableName tableName = TableName.valueOf("customer");
        if (!admin.tableExists(tableName)) {
            HTableDescriptor htable = new HTableDescriptor(tableName);
            htable.addFamily(new HColumnDescriptor("personal"));
            htable.addFamily(new HColumnDescriptor("address"));
            admin.createTable(htable);
        } else {
            System.out.println("customer Htable is exists");
        }
        admin.close();
        connection.close();
        System.out.println("Creating Htable Done");
    }
}

List Tables



import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ListTable {

    public static void main(String... args) throws Exception {
        Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
        Admin admin = connection.getAdmin();
        HTableDescriptor[] tableDescriptors = admin.listTables();
        for (HTableDescriptor tableDescriptor : tableDescriptors) {
            System.out.println("Table Name:"+ tableDescriptor.getNameAsString());
        }
        admin.close();
        connection.close();
    }
}


Delete Table



import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

import java.io.IOException;

public class DeleteTable {

    public static void main(String... args) {

        System.out.println("DeleteTable Starts");
        Connection connection = null;
        Admin admin = null;

        try {
            connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
            TableName tableName = TableName.valueOf("customer");
            admin = connection.getAdmin();
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
            if(!admin.tableExists(tableName)){
                System.out.println("Table is deleted");
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (admin != null) admin.close();
                if (connection != null) connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        System.out.println("DeleteTable Done");
    }
}

Delete Data



import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteData {

    public static void main(String... args) throws Exception {
        System.out.println("DeleteData starts");
        Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
        TableName tableName = TableName.valueOf("customer");
        Table table = connection.getTable(tableName);
        Delete delete = new Delete(Bytes.toBytes("row1"));
        table.delete(delete);
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        System.out.println("result:"+result);
        if (result.value() == null) {
            System.out.println("Delete Data is successful");
        }
        table.close();
        connection.close();
    }

}
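
A Delete can also target a single cell instead of a whole row. Below is a short sketch under the same setup, assuming the 'customer' table still holds a 'personal:name' value in row2:


import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteColumn {

    public static void main(String... args) throws Exception {
        Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
        Table table = connection.getTable(TableName.valueOf("customer"));
        // Delete only the latest version of the 'personal:name' cell in row2;
        // the rest of the row stays intact.
        Delete delete = new Delete(Bytes.toBytes("row2"));
        delete.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));
        table.delete(delete);
        table.close();
        connection.close();
    }
}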

To populate the HBase table:


import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PopulateData {

    public static void main(String... args) throws Exception {

        Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());

        TableName tableName = TableName.valueOf("customer");
        Table table = connection.getTable(tableName);

        Put p = new Put(Bytes.toBytes("row1"));
        // The customer table has 'personal' and 'address' column families, so insert
        // the 'name' column into 'personal' and the 'city' column into 'address'
        p.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("bala"));
        p.addColumn(Bytes.toBytes("address"), Bytes.toBytes("city"), Bytes.toBytes("new york"));
        table.put(p);
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        byte[] name = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
        byte[] city = result.getValue(Bytes.toBytes("address"), Bytes.toBytes("city"));
        System.out.println("Name: " + Bytes.toString(name) + " City: " + Bytes.toString(city));
        table.close();
        connection.close();
    }
}
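
Table.put also accepts a list of Puts, which is handy when populating many rows at once. A minimal sketch against the same 'customer' table; the row keys and values here are only illustrative:


import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PopulateDataBatch {

    public static void main(String... args) throws Exception {
        Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
        Table table = connection.getTable(TableName.valueOf("customer"));
        List<Put> puts = new ArrayList<>();
        for (int i = 2; i <= 4; i++) {
            Put p = new Put(Bytes.toBytes("row" + i));
            p.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("user" + i));
            puts.add(p);
        }
        // A single put(List<Put>) call batches the mutations into fewer RPCs
        table.put(puts);
        table.close();
        connection.close();
    }
}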

To scan the table:


import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class ScanTable {

    public static void main(String... args) {
        Connection connection = null;
        ResultScanner scanner = null;
        try {
            connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
            TableName tableName = TableName.valueOf("customer");
            Table table = connection.getTable(tableName);
            Scan scan = new Scan();
            // Scanning the required columns
            scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));

            scanner = table.getScanner(scan);

            // Read each row from the scan result
            for (Result result = scanner.next(); result != null; result = scanner.next()) {
                System.out.println("Found row : " + result);
            }
            // The scanner and connection are closed in the finally block
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (scanner != null) scanner.close();
            if (connection != null) try {
                connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

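A scan can also be limited to a row-key range, which avoids reading the whole table. A short sketch against the same 'customer' table; the start row is inclusive and the stop row is exclusive:


import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanRange {

    public static void main(String... args) throws Exception {
        Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
        Table table = connection.getTable(TableName.valueOf("customer"));
        // Scan rows from 'row1' (inclusive) up to 'row3' (exclusive)
        Scan scan = new Scan(Bytes.toBytes("row1"), Bytes.toBytes("row3"));
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            System.out.println("Found row : " + result);
        }
        scanner.close();
        table.close();
        connection.close();
    }
}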

REST API to produce messages to Kafka using the Docker Maven Plugin

I have developed a simple REST API that sends incoming messages to Apache Kafka.

I have used Docker Kafka (https://github.com/spotify/docker-kafka) and the Docker Maven Plugin(https://github.com/fabric8io/docker-maven-plugin) to do this.

Before going through this post, familiarize yourself with Docker and Docker Compose.

The Docker Maven Plugin provides a nice way to specify multiple images in pom.xml and link them as necessary. We could also use Docker Compose for this, but I have used the plugin here.

    1. Clone the project (https://github.com/dkbalachandar/kafka-message-sender)
    2. Go into the kafka-message-sender folder
    3. Run 'mvn clean install'
    4. Run 'mvn docker:start', then 'docker ps', and make sure two containers are running; their names are kafka and kafka-rest
    5. Access http://localhost:8080/api/kafka/send/test (POST) and confirm that the browser shows the message has been sent
    6. Run the command below and make sure the message you sent is available in Kafka via the command-line consumer (you can also consume it via a Flume agent)

docker exec -it kafka /opt/kafka_2.11-0.8.2.1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
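
The endpoint code itself lives in the kafka-message-sender repository linked above. The sketch below only illustrates the producer side of such an API, assuming the broker is reachable at localhost:9092 (the topic name and message are placeholders):


import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaMessageProducer {

    public static void main(String... args) {
        Properties props = new Properties();
        // Assumes the Kafka container maps the broker to localhost:9092
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<>(props);
        // Publish the incoming message to the 'test' topic
        producer.send(new ProducerRecord<>("test", "test message"));
        producer.close();
    }
}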

Thread & Heap dumps From a Docker container

Follow the steps below to take thread and heap dumps from a Docker container.

1. Run the below command to bash into the container. Change CONTAINER_NAME appropriately.

      docker exec -it CONTAINER_NAME bash
    


2. Then run jps to list all the running Java processes and note the PID of your application.

    jps
  

3. Then run the below command to take a thread dump. Change the PID (process ID) appropriately.

     jstack PID > threadDump.tdump 
     

4. Then run the below command to take a heap dump. Change the PID appropriately.

        jmap -dump:live,format=b,file=heapDump.hprof PID 
     

5. Then exit the container and download threadDump.tdump and heapDump.hprof from it by running the below commands. Change CONTAINER_NAME appropriately.

      sudo docker cp CONTAINER_NAME:threadDump.tdump .
      sudo docker cp CONTAINER_NAME:heapDump.hprof .
    

Copy files from host to Docker container and vice versa

Docker command to copy/update files from the host to a container:

   Command: 
   docker cp LOCALFILE_WITH_PATH CONTAINER_NAME:DEST_PATH
   Example: 
   docker cp config.xml CONTAINER_NAME:/opt/app/config.xml

Docker command to copy files from a container to the host:

   Command:
   docker cp CONTAINER_NAME:SOURCE_PATH LOCALDIRPATH
   Example:
   docker cp CONTAINER_NAME:/opt/app/config.xml .

Replace the CONTAINER_NAME appropriately

To copy an entire folder, run the below command.

Command:
docker cp SRC_PATH CONTAINER_NAME:DEST_PATH

Example: to copy the contents of the source directory into the destination directory (SRC_PATH should end with /.):

docker cp dist/. CONTAINER_NAME:/var/www/html/

Example: to copy the entire source directory into the destination directory (SRC_PATH should not end with /.):

docker cp dist CONTAINER_NAME:/var/www/html/
 

How to use the Docker Maven plugin

Recently, I explored various Docker Maven plugins and settled on https://github.com/fabric8io/docker-maven-plugin, as it is fairly easy to use. Note that, as of now, it does not integrate with a Docker Compose file, but it provides enough configuration tags to express everything a Compose file can.

Assume we have two entirely separate RESTful services running on ports 8080 and 8081. The first application (on 8080) calls the second (on 8081). Below is a snippet of the pom.xml that uses the Docker Maven plugin.


<build>
    <plugins>
        <plugin>
            <groupId>io.fabric8</groupId>
            <artifactId>docker-maven-plugin</artifactId>
            <version>0.15.1</version>
            <configuration>
                <images>
                    <!-- Application 1 -->
                    <image>
                        <name>middleware-rest</name>
                        <alias>middleware-rest</alias>
                        <build>
                            <cleanup>true</cleanup>
                            <tags>
                                <tag>latest</tag>
                            </tags>
                            <ports>
                                <port>8080</port>
                            </ports>
                            <dockerFileDir>middleware-rest</dockerFileDir>
                            <assembly>
                                <mode>dir</mode>
                                <inline xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
                                    <id>middleware-rest</id>
                                    <files>
                                        <file>
                                            <source>${project.build.directory}/mwRest-${project.version}-appjar.jar</source>
                                            <outputDirectory>./</outputDirectory>
                                            <destName>mwRest.jar</destName>
                                        </file>
                                    </files>
                                </inline>
                            </assembly>
                        </build>
                        <run>
                            <namingStrategy>none</namingStrategy>
                            <ports>
                                <port>8080:8080</port>
                            </ports>
                            <wait>
                                <tcp>
                                    <ports>
                                        <port>8080</port>
                                    </ports>
                                </tcp>
                                <time>60000</time>
                            </wait>
                            <env>
                                <CLIENT_MW_URL>http://client-middleware-rest:8080</CLIENT_MW_URL>
                            </env>
                            <links>
                                <link>client-middleware-rest:client-middleware-rest</link>
                            </links>
                            <log>
                                <enabled>true</enabled>
                                <color>red</color>
                                <driver>
                                    <name>json-file</name>
                                    <opts>
                                        <max-size>10m</max-size>
                                        <max-file>5</max-file>
                                    </opts>
                                </driver>
                            </log>
                        </run>
                    </image>
                    <!-- Application 2  -->
                    <image>
                        <name>client-middleware-rest</name>
                        <alias>client-middleware-rest</alias>
                        <build>
                            <cleanup>true</cleanup>
                            <tags>
                                <tag>latest</tag>
                            </tags>
                            <ports>
                                <port>8081</port>
                            </ports>
                            <dockerFileDir>client-middleware-rest</dockerFileDir>
                            <assembly>
                                <mode>dir</mode>
                                <inline xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
                                    <id>client-middleware-rest</id>
                                    <files>
                                        <file>
                                            <source>${project.build.directory}/clientmwRest-${project.version}-appjar.jar</source>
                                            <outputDirectory>./</outputDirectory>
                                            <destName>clientmwRest.jar</destName>
                                        </file>
                                    </files>
                                </inline>
                            </assembly>
                        </build>
                        <run>
                            <namingStrategy>none</namingStrategy>
                            <ports>
                                <port>8081:8081</port>
                            </ports>
                            <wait>
                                <tcp>
                                    <ports>
                                        <port>8081</port>
                                    </ports>
                                </tcp>
                                <time>60000</time>
                            </wait>
                            <log>
                                <enabled>true</enabled>
                                <color>red</color>
                                <driver>
                                    <name>json-file</name>
                                    <opts>
                                        <max-size>10m</max-size>
                                        <max-file>5</max-file>
                                    </opts>
                                </driver>
                            </log>
                        </run>
                    </image>
                </images>
            </configuration>
            <executions>
                <execution>
                    <id>start</id>
                    <phase>pre-integration-test</phase>
                    <goals>
                        <goal>build</goal>
                        <goal>start</goal>
                    </goals>
                </execution>
                <execution>
                    <id>stop</id>
                    <phase>post-integration-test</phase>
                    <goals>
                        <goal>stop</goal>
                    </goals>
                </execution>
                <execution>
                    <id>remove</id>
                    <phase>post-integration-test</phase>
                    <goals>
                        <goal>stop</goal>
                        <goal>remove</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

Once you have this configuration, you can build the images by running mvn docker:build.

To start all the containers, run mvn docker:start.
To stop all the containers, run mvn docker:stop, and to remove them, run mvn docker:remove.

For further information, please see https://github.com/fabric8io/docker-maven-plugin.

Also refer to my other post, REST API to produce messages to Kafka using the Docker Maven Plugin, to see this plugin in action.

Dockerfile for Apache2


FROM ubuntu:16.04

RUN apt-get update && apt-get install -y apache2 && rm -rf /var/www/html/*

ENV APACHE_RUN_USER=www-data \
    APACHE_RUN_GROUP=www-data \
    APACHE_LOG_DIR=/var/log/apache2

EXPOSE 80

CMD ["/usr/sbin/apachectl", "-D", "FOREGROUND"]


Docker custom container name

We use Docker Compose for multi-container applications. After starting all the containers, running docker ps shows every container with its image and port details.

By default, the container name is something generated by Docker. If we want a custom name, we should specify it as container_name in the Docker Compose file. Refer to the example below:


rest:
   image: rest_image:1.0
   container_name: mywebApp-rest
   log_driver: "json-file"
   log_opt:
      max-size: "10m"
      max-file: "5"
   ports:
      - 8081:8081

web:
   image: web_image:1.0
   container_name: mywebApp-web
   environment:
      - REST_URL=http://rest:8081
   log_driver: "json-file"
   log_opt:
      max-size: "10m"
      max-file: "5"
   ports:
      - 80:80
   links:
      - rest:rest

If you use the docker-maven-plugin (https://github.com/fabric8io/docker-maven-plugin), set the naming strategy to alias inside the <run> tag, so the container is named after the image's alias. Specify it as below:


<run>
    <namingStrategy>alias</namingStrategy>
    ...
</run>