Apache Solr 6.3.0 – How to Load a CSV File and Search It

In this post, I am going to show three things:

  • How to install Apache Solr 6.3.0 on Ubuntu
  • How to load a CSV file that contains US baby name data
  • How to query the data with the Solr API

To install Apache Solr, follow the steps below:

sudo wget http://apache.claz.org/lucene/solr/6.3.0/solr-6.3.0.tgz
sudo gunzip solr-6.3.0.tgz
sudo tar -xvf  solr-6.3.0.tar

Open a terminal, change into the solr-6.3.0 folder, and run the following command to start the Solr server:

 bin/solr start

Check that the server is running by opening the Solr Admin console at http://localhost:8983/solr

The next step is to create a collection and load the CSV data into it. (When Solr runs in standalone mode, as it does here, bin/solr create actually creates a core.)

bin/solr create -c babynames

Once the collection is created, we have to add the field definitions to its schema. The schema file lives in the server/solr/babynames/conf/ folder (relative to solr-6.3.0) and is named managed-schema. You can rename it to schema.xml, but that also requires switching solrconfig.xml to the classic schema factory, so I just kept it as is and added the fields below to it:

  <field name="Count" type="int" indexed="true" stored="true"/>
  <field name="Gender" type="string" indexed="true" stored="true"/>
  <field name="Id" type="int" indexed="false" stored="false"/>
  <field name="Name" type="text_general" indexed="true" stored="true"/>
  <field name="Year" type="int" indexed="true" stored="true"/>
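As an alternative to hand-editing managed-schema, Solr's Schema API can add the same fields over HTTP while the server is running, and Solr applies the change itself. The sketch below only builds the JSON request body; it assumes the array form of the "add-field" command, and BABYNAMES_FIELDS and build_add_field_payload are hypothetical names used for illustration. Use either this or the hand edit, not both, since adding a field that already exists is rejected.

```python
import json

# Hypothetical table mirroring the <field> lines above:
# (name, type, indexed, stored)
BABYNAMES_FIELDS = [
    ("Count", "int", True, True),
    ("Gender", "string", True, True),
    ("Id", "int", False, False),
    ("Name", "text_general", True, True),
    ("Year", "int", True, True),
]


def build_add_field_payload(fields):
    """Build a Schema API body that adds every field in a single request.

    Assumption: the Schema API accepts a list of field definitions
    under one "add-field" command.
    """
    return json.dumps(
        {
            "add-field": [
                {"name": name, "type": ftype, "indexed": indexed, "stored": stored}
                for name, ftype, indexed, stored in fields
            ]
        },
        indent=2,
    )


if __name__ == "__main__":
    # POST the printed body to http://localhost:8983/solr/babynames/schema
    # with the header "Content-Type: application/json".
    print(build_add_field_payload(BABYNAMES_FIELDS))
```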

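If you edited managed-schema by hand, reload the core from the Admin console (or restart Solr with bin/solr restart -p 8983) so that Solr picks up the new fields. Then load the CSV file with the command below. For this exercise I used a local copy of this file: https://github.com/dkbalachandar/spark-scala-examples/blob/master/src/main/resources/NationalNames.csv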

bin/post -c babynames NationalNames.csv
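The post tool sends the file to Solr's update handler over HTTP, so the same upload can be done from code. A minimal sketch, assuming the collection's /update handler accepts a text/csv body (as in Solr's stock configuration) and that commit=true is wanted so the documents become searchable right away; build_csv_update_request is a hypothetical helper that returns the request without sending it.

```python
import urllib.request
from urllib.parse import urlencode


def build_csv_update_request(base_url, collection, csv_bytes, commit=True):
    """Return (but do not send) a POST request that uploads CSV data to Solr.

    Assumption: the /update handler parses bodies sent as text/csv and maps
    CSV header names to schema field names (so "Name" must match exactly).
    """
    params = urlencode({"commit": "true" if commit else "false"})
    url = f"{base_url.rstrip('/')}/{collection}/update?{params}"
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


# Usage (needs a running Solr and a local NationalNames.csv; this reads the
# whole file into memory, so a streaming upload suits very large files better):
# with open("NationalNames.csv", "rb") as f:
#     req = build_csv_update_request("http://localhost:8983/solr", "babynames", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```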

Finally, I queried the data through the Solr REST API using the URLs below.

To search by name: http://localhost:8983/solr/babynames/select?q=Name:%22Mary%22
To search by gender: http://localhost:8983/solr/babynames/select?q=Gender:%22M%22
To search by year range: http://localhost:8983/solr/babynames/select?q=*:*&fq=Year:%5B1880%20TO%201890%5D
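Hand-encoding quotes, brackets and spaces is error-prone, so it can help to let the standard library build these URLs. A minimal sketch: build_select_url is a hypothetical helper, spaces are encoded as %20 and ':' and '*' are left unescaped to keep the URLs readable, and the year-range example uses the canonical match-all query q=*:* together with a filter query.

```python
from urllib.parse import quote, urlencode


def build_select_url(base_url, collection, params):
    """Build a /select URL; spaces become %20, ':' and '*' stay readable."""
    query = urlencode(params, quote_via=quote, safe=":*")
    return f"{base_url.rstrip('/')}/{collection}/select?{query}"


BASE = "http://localhost:8983/solr"

# Phrase query on the text_general Name field
by_name = build_select_url(BASE, "babynames", {"q": 'Name:"Mary"'})
# Exact match on the string Gender field
by_gender = build_select_url(BASE, "babynames", {"q": 'Gender:"M"'})
# Match everything, then restrict with an inclusive range filter on Year
by_years = build_select_url(
    BASE, "babynames", {"q": "*:*", "fq": "Year:[1880 TO 1890]"}
)

print(by_name)   # http://localhost:8983/solr/babynames/select?q=Name:%22Mary%22
print(by_years)  # http://localhost:8983/solr/babynames/select?q=*:*&fq=Year:%5B1880%20TO%201890%5D
```

Extra request parameters such as rows or wt=json can be added to the dictionary in the same way.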

Refer to the screenshots below.