Git: Pushing remote branch to Gerrit

If you use Gerrit as your code review tool, follow the steps below to push a remote branch to Gerrit.

1. Create a local branch that tracks the remote branch, for example:

 git checkout -b LOCAL_BRANCH_NAME origin/REMOTE_BRANCH_NAME

2. Merge the latest master into your local branch so that conflicts surface early (run git fetch origin first so that origin/master is current). This makes the later merge between master and REMOTE_BRANCH_NAME easier.

 git merge origin/master 

3. Make the necessary changes in the code and run the commands below to stage and commit them.

 git add . 
 git commit -m "COMMIT_MESSAGE" 

4. Run the command below to push the commit to Gerrit for review against the remote branch.

git push gerrit LOCAL_BRANCH_NAME:refs/for/REMOTE_BRANCH_NAME 
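
The four steps above can be tried end to end in a throwaway sandbox. The sketch below is only an illustration under a few assumptions: a plain bare repository stands in for the Gerrit server (it stores refs/for/... as an ordinary ref and performs no review), and the branch names release-1.0 and my-fix are hypothetical placeholders for REMOTE_BRANCH_NAME and LOCAL_BRANCH_NAME.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox. Assumption: a plain bare repo stands in for Gerrit.
work=$(mktemp -d)
git init --quiet --bare "$work/server.git"
git --git-dir="$work/server.git" symbolic-ref HEAD refs/heads/master

# Seed the server with master and a hypothetical remote branch "release-1.0".
git init --quiet "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > app.txt
git add app.txt
git commit --quiet -m "Initial commit"
git branch release-1.0
git push --quiet "$work/server.git" master release-1.0

# Developer clone, with a remote named "gerrit" as in step 4.
git clone --quiet "$work/server.git" "$work/dev"
cd "$work/dev"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add gerrit "$work/server.git"

# Step 1: local branch (hypothetical name "my-fix") tracking the remote branch.
git checkout --quiet -b my-fix origin/release-1.0
# Step 2: merge the latest master into the local branch.
git fetch --quiet origin
git merge --quiet origin/master
# Step 3: make a change, stage it, and commit it.
echo "fix" >> app.txt
git add .
git commit --quiet -m "COMMIT_MESSAGE"
# Step 4: push the local branch to the review ref of the remote branch.
git push --quiet gerrit my-fix:refs/for/release-1.0

# Compare what the server received with the local branch head.
pushed=$(git --git-dir="$work/server.git" rev-parse refs/for/release-1.0)
local_head=$(git rev-parse my-fix)
echo "server refs/for/release-1.0: $pushed"
echo "local my-fix:                $local_head"
```

In the sandbox, release-1.0 on the server stays at its old commit while the new commit sits under refs/for/release-1.0. On a real Gerrit server, a push to refs/for/... creates a change for review instead of a plain ref.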

How to merge a local branch with master [Git]

In this post, I show step-by-step instructions for merging a local branch into master.

Please follow the steps below.

1. Go into your project repository (for example, a clone of your GitHub repository).
2. Create a local branch with the command below:

git branch local-branch

3. Then move into that branch

git checkout local-branch

4. Then make the relevant changes and run the commands below to commit them

  git add .
  git commit -m "commit message"

5. Then switch back to the master branch, merge the local branch into it, and push

  git checkout master
  git merge local-branch
  git push origin master

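If the push is rejected because the remote master has new commits, rebase your master onto the latest remote master (for example with git pull --rebase origin master), resolve any conflicts, and then push again.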
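
Here is a minimal sandbox sketch of the whole flow, including the case where the first push is rejected. It makes a few assumptions: a local bare repository plays the role of origin, a second clone plays a teammate who pushes first, and set_identity is a hypothetical helper that only sets a commit author for the sandbox. The two sides touch different files, so the rebase finishes without conflicts.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init --quiet --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/master

# Hypothetical helper: set a commit identity inside the current repository.
set_identity() {
  git config user.name "Example User"
  git config user.email "user@example.com"
}

# Our repository, seeded with one commit on master.
git init --quiet "$work/me"
cd "$work/me"
git symbolic-ref HEAD refs/heads/master
set_identity
git remote add origin "$work/origin.git"
echo "base" > base.txt
git add .
git commit --quiet -m "Initial commit"
git push --quiet origin master

# A teammate pushes to master while we are still working.
git clone --quiet "$work/origin.git" "$work/teammate"
cd "$work/teammate"
set_identity
echo "theirs" > theirs.txt
git add .
git commit --quiet -m "Teammate change"
git push --quiet origin master

# Steps 2 to 5 from the post.
cd "$work/me"
git branch local-branch
git checkout --quiet local-branch
echo "mine" > mine.txt
git add .
git commit --quiet -m "commit message"
git checkout --quiet master
git merge --quiet local-branch

# The first push is rejected because origin/master moved; rebase, then push again.
if ! git push --quiet origin master 2>/dev/null; then
  git pull --quiet --rebase origin master
  git push --quiet origin master
fi

commit_count=$(git rev-list --count master)
local_master=$(git rev-parse master)
remote_master=$(git --git-dir="$work/origin.git" rev-parse master)
echo "commits on master: $commit_count"
```

After the rebase, the local and remote master point to the same commit, and the history is linear.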

Spark Scala Unit Testing

In this post, I show an example of writing unit test cases for a Spark Scala job and running them with Maven.

Assume that we have a set of XML files that hold user information such as first name, last name, and so on. The middle name and county name are optional fields, but the XML files still contain empty nodes for them. Our job is to read those files, remove the empty nodes, and write the updated content to text files, either in a local environment or on Hadoop.
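
To make the transformation concrete, here is a small shell sketch that applies the same replacement with sed to a single record. This is not the Spark job, only a quick way to see the expected input and output. The record itself is hypothetical: only the middleName and countyName tags come from this post, and the other element names are assumptions.

```shell
#!/bin/sh
set -eu

# Hypothetical record; only <middleName/> and <countyName/> are taken from the post.
xml='<user><firstName>Jane</firstName><middleName/><lastName>Doe</lastName><countyName/></user>'

# Remove the two empty nodes, leaving everything else untouched.
cleaned=$(printf '%s\n' "$xml" | sed -e 's|<middleName/>||g' -e 's|<countyName/>||g')

printf '%s\n' "$cleaned"
# prints: <user><firstName>Jane</firstName><lastName>Doe</lastName></user>
```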

The sample XML content is given below,




The Spark Scala code for reading the XML files and removing the empty nodes is given below.

package com

import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.Map
import scala.collection.mutable.ListBuffer

object EmptyTagReplacer {

  def main(args: Array[String]): Unit = {

    if (args.length < 2) {
      println("Usage <inputDir> <outputDir>")
      System.exit(1)
    }
    val conf = new SparkConf().setAppName("EmptyTagReplacer")
    val sc = new SparkContext(conf)

    val inFile = args(0)
    val outFile = args(1)

    // Each entry maps a file path to the whole content of that file.
    val input: Map[String, String] = sc.wholeTextFiles(inFile).collectAsMap()
    searchAndReplaceEmptyTags(sc, input, outFile)
    sc.stop()
  }

  // Removes the empty tags from every file and returns the cleaned contents.
  def searchAndReplaceEmptyTags(sc: SparkContext, inputXml: Map[String, String], outFile: String):
  ListBuffer[String] = {

    val outputXml = new ListBuffer[String]()
    val htmlTags = List("<middleName/>", "<countyName/>")
    inputXml.foreach { case (fileName, content) =>
      var newContent = content
      for (tag <- htmlTags) {
        newContent = newContent.replace(tag, "")
      }
      // Assumption: an empty outFile means "do not write", which lets a test skip the output step.
      if (outFile.nonEmpty) {
        // wholeTextFiles keys are full paths, so keep only the file name.
        val data = sc.parallelize(Seq(newContent))
        data.saveAsTextFile(outFile + "/" + fileName.split("/").last)
      }
      outputXml += newContent
    }
    outputXml
  }

  // Counts the remaining empty tags: List(middleName count, countyName count).
  // sc.accumulator is the Spark 1.x API; newer versions provide sc.longAccumulator instead.
  def countTags(sc: SparkContext, xmlRecords: List[String]): List[Int] = {

    val middleNameTagCounter = sc.accumulator(0)
    val countyTagCounter = sc.accumulator(0)
    val middleNameRegex = "<middleName/>".r
    val countyRegEx = "<countyName/>".r
    xmlRecords.foreach { content =>
      middleNameTagCounter += middleNameRegex.findAllIn(content).length
      countyTagCounter += countyRegEx.findAllIn(content).length
    }
    List(middleNameTagCounter.value, countyTagCounter.value)
  }
}

The test case for the above Spark job is given below.

package com

import java.io.File

import com.holdenkarau.spark.testing.SharedSparkContext
import org.apache.commons.io.FileUtils
import org.scalatest.FunSuite

import scala.collection.mutable

class EmptyTagReplacerTest extends FunSuite with SharedSparkContext {

  test("Empty HTML tag replacer test") {

    // Read the test file (FileUtils comes from Apache Commons IO) and create a content Map.
    val content: String = FileUtils.readFileToString(new File("./src/test/resources/text-files/xml1"), "UTF-8")

    val contentMap = mutable.Map[String, String]()
    contentMap += ("fileName" -> content)
    // Call searchAndReplaceEmptyTags to remove the empty nodes; an empty string is passed as the output path.
    val outputContent: mutable.ListBuffer[String] = EmptyTagReplacer.searchAndReplaceEmptyTags(sc, contentMap, "")
    // No empty tags should remain in the cleaned content.
    val counts: List[Int] = EmptyTagReplacer.countTags(sc, outputContent.toList)
    val expected = List(0, 0)
    assert(counts == expected)
  }
}

You have to include the scala-maven-plugin and scalatest-maven-plugin in pom.xml to make this work.

Please refer to my GitHub repo for more details.

Remove Git Repository History

Please follow the steps below to completely remove the commit history.

1. Clone the Git repository
2. Run rm -rf .git inside the clone to delete the .git directory, which holds the entire history
3. Then run the commands below to initialize the repo again and commit the current files

git init
git add .
git commit -m "Commit message"

4. Add the remote again and force-push, which overwrites the history on the remote

git remote add origin REPO_URL
git push -u --force origin master
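
The effect of these steps can be checked in a sandbox before touching a real repository. The sketch below assumes a local bare repository in place of REPO_URL and a hypothetical file notes.txt with three revisions; it counts the commits on the remote master before and after the procedure.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# Build a remote with three commits of history (hypothetical content).
git init --quiet "$work/old"
cd "$work/old"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "revision $n" > notes.txt
  git add .
  git commit --quiet -m "Revision $n"
done
git push --quiet "$work/remote.git" master
before=$(git --git-dir="$work/remote.git" rev-list --count master)

# Step 1: clone the repository.
git clone --quiet "$work/remote.git" "$work/fresh"
cd "$work/fresh"
# Step 2: delete the .git directory; the working files stay in place.
rm -rf .git
# Step 3: initialize again and commit the current files as a single commit.
git init --quiet
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git add .
git commit --quiet -m "Commit message"
# Step 4: add the remote again and overwrite its history.
git remote add origin "$work/remote.git"
git push --quiet -u --force origin master

after=$(git --git-dir="$work/remote.git" rev-list --count master)
latest=$(cat notes.txt)
echo "commits before: $before, after: $after"
```

The latest file content survives, but the remote master now holds a single commit. Anyone with an older clone has to clone again or reset to the new master.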