How to get data from SQL Server to Elasticsearch using LogStash

As a developer working with SQL Server, I needed to import data from the database into Elasticsearch and analyze it in Kibana.

Elasticsearch is an open-source project built with Java that works mostly with other open-source projects, so documentation on importing data from SQL Server to ES using LogStash is hard to find.

I’d like to share how to import SQL Server data into Elasticsearch (version 6.2) using LogStash (LS) and verify the result in Kibana.

Assumption

I will skip installing the ELK (Elasticsearch, LogStash, and Kibana) stack, as it’s outside the scope of this article.
Please refer to the installation steps on the Elastic download pages.

Overview

Here are the steps required to import SQL Server data to Elasticsearch.

  1. Install Java Development Kit (JDK)
  2. Install JDBC Driver for SQL Server
  3. Set CLASSPATH for the driver
  4. Create an Elasticsearch Index to Import Data to
  5. Configure LogStash configuration file
  6. Run LogStash
  7. Verify in Kibana

Step 1 – Install Java SE Development Kit 8

One of the gotchas is that you might install the latest version of the JDK, which is version 9, but the Elasticsearch documentation requires you to install JDK 8.

At the time of writing, the latest JDK 8 version is 8u162, which can be downloaded from Oracle’s Java SE download page.

Download and install “JDK8 8u162” on your machine, then make sure that “java” is in the PATH environment variable so that it can be called from any directory on the command line.
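Running java -version on the command line is the quickest check. If you want to confirm programmatically which Java version a JVM reports, here is a minimal Scala sketch (Scala runs on the same JVM). The object and helper names are hypothetical, and the parsing assumes the usual java.version formats: “1.8.0_162” for Java 8 and “9.0.4”-style strings from Java 9 on.

```scala
object JavaVersionCheck {
  // Hypothetical helper: extracts the major version from a java.version string.
  // Java 8 and earlier report "1.x.y_z" (e.g. "1.8.0_162"); Java 9+ report "x.y.z" (e.g. "9.0.4").
  def javaMajorVersion(version: String): Int = {
    val parts = version.split("[._-]")
    if (parts(0) == "1" && parts.length > 1) parts(1).toInt else parts(0).toInt
  }

  def main(args: Array[String]): Unit = {
    // Version of the JVM that is running this snippet
    val version = System.getProperty("java.version")
    if (javaMajorVersion(version) == 8) println(s"OK: running on Java $version")
    else println(s"Warning: running on Java $version, but this setup expects JDK 8")
  }
}
```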

Step 2 – Install JDBC Driver for SQL Server

You need to download and install Microsoft JDBC Driver 4.2 for SQL Server, not the latest version.

As Elasticsearch is built with JDK 8, you can’t use the latest version of JDBC Driver (version 6.2) for SQL Server as it does not support JDK 8.

Step 3 – Set CLASSPATH for the JDBC Driver

We need to set the path so that Java can find the JDBC driver.

📝 Note: I am working on a Windows 10 machine.

1. Go to the directory where you installed the SQL Server JDBC driver.

2. Find the JAR file named sqljdbc42.jar, which is located under <<JDBC installation folder>>\sqljdbc_4.2\enu\jre8.

3. Copy the full path to the JAR file.

A cool trick on Windows 7/8/10: Shift+right-clicking a file gives you a “Copy as path” option.

4. Click the Windows Start button, type “environment”, and click “Edit the system environment variables”.

5. Add a CLASSPATH environment variable (if you don’t already have one) with the following values, separated by a semicolon:

  1. “.” – for the current directory to search.
  2. The JAR file path copied previously (e.g. “C:\talih\Java\MicrosoftJDBCDriversSQLServer\sqljdbc_4.2\enu\jre8\sqljdbc42.jar”).

Gotcha: If the path to the JDBC JAR file contains a space, make sure to put double quotes around it.

Not doing so will result in either of the following error messages when you start LogStash in a later step.

c:\talih\elasticco\logstash-6.2.2>bin\logstash -f sql.conf

Error: Could not find or load main class JDBC

 - Or -

c:\talih\elasticco\logstash-6.2.2>bin\logstash -f sql.conf

Error: Could not find or load main class File\Microsoft
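You can also check for this gotcha before starting LogStash. The sketch below is a hypothetical Scala helper (not part of LogStash) that splits a Windows CLASSPATH value on semicolons, flags entries that contain a space but are not quoted, and tries to load the SQL Server driver class. The driver check only reflects the classpath of the JVM running the snippet, so run it with the same CLASSPATH you set above.

```scala
object ClasspathCheck {
  // Splits a Windows CLASSPATH value (entries separated by ';') and returns the entries
  // that contain a space but are not wrapped in double quotes.
  def unquotedEntriesWithSpaces(classpath: String): Seq[String] =
    classpath.split(";").toSeq
      .map(_.trim)
      .filter(e => e.contains(" ") && !(e.startsWith("\"") && e.endsWith("\"")))

  // Returns true if the class can be loaded from the current JVM's classpath.
  def classAvailable(className: String): Boolean =
    try { Class.forName(className); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    val cp = Option(System.getenv("CLASSPATH")).getOrElse("")
    unquotedEntriesWithSpaces(cp).foreach(e => println(s"Put double quotes around: $e"))
    val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    println(s"$driver loadable: ${classAvailable(driver)}")
  }
}
```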

Let’s now move on to creating an Elasticsearch index to import data into.

Step 4 – Create an Elasticsearch Index to Import Data to

You can use cURL or Postman to create an index, but I will use the Kibana console to create an index named “cs_users”, which is equivalent to a database in relational database terminology.

Before we start the Kibana service, we need to start Elasticsearch so that Kibana won’t whine about Elasticsearch not being present.

[Screenshot: Kibana warnings on lines 12~21 due to Elasticsearch being unavailable]

Go to the Elasticsearch installation directory and start the service.

talih@CC c:\talih\elasticco\elasticsearch-6.2.2
> bin\elasticsearch.bat

Then go to the Kibana installation directory and start the Kibana service.

talih@CC c:\talih\elasticco\kibana-6.2.2-windows-x86_64 
> bin\kibana.bat

If Kibana starts without issues, you will see output similar to the following.

[Screenshot: Kibana started successfully]

On line 9, Kibana reports that it is running on http://localhost:5601.
Open the URL in a browser of your choice.

Now click the “Dev Tools” link at the bottom left of the page.

[Screenshot: Click on Kibana Dev Tools link]

Once you see the Console, enter the following command on the left panel of the Kibana Dev Tools Console to create a new index.

PUT cs_users
{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 0
    }
  }
}

[Screenshot: Create a new Elasticsearch index named “cs_users”]

I won’t go into detail on “shards” and “replicas”, since they’re outside the scope of this article. For more information on the syntax, refer to the official Elasticsearch documentation.

You will then see the response from Elasticsearch confirming the index creation on the right panel.

[Screenshot: A new index “cs_users” is created on Elasticsearch successfully]
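If you prefer cURL or Postman over the Kibana console, the same request is an HTTP PUT with a JSON body. Here is a minimal Scala sketch of that call using only the JDK’s HttpURLConnection. The object and method names are hypothetical, and it assumes Elasticsearch is listening on localhost:9200 as in this article. Elasticsearch 6.x rejects request bodies without a Content-Type header, which is why the header is set explicitly.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object CreateIndex {
  // JSON body with the same settings as the Kibana console request above.
  def settingsJson(shards: Int, replicas: Int): String =
    s"""{"settings":{"index":{"number_of_shards":$shards,"number_of_replicas":$replicas}}}"""

  // Sends PUT http://<host>/<index> with a JSON body; returns (HTTP status, response body).
  def putIndex(host: String, index: String, json: String): (Int, String) = {
    val conn = new URL(s"http://$host/$index").openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("PUT")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/json")
    val out = conn.getOutputStream
    try out.write(json.getBytes(StandardCharsets.UTF_8)) finally out.close()
    val status = conn.getResponseCode
    // Error responses (for example, when the index already exists) come from the error stream.
    val in = if (status >= 400) conn.getErrorStream else conn.getInputStream
    val body =
      if (in == null) ""
      else {
        val src = Source.fromInputStream(in, "UTF-8")
        try src.mkString finally src.close()
      }
    conn.disconnect()
    (status, body)
  }

  def main(args: Array[String]): Unit = {
    val (status, body) = putIndex("localhost:9200", "cs_users", settingsJson(3, 0))
    println(s"HTTP $status: $body")
  }
}
```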

OK, now we are finally ready to move on to creating a configuration file for LogStash to actually import data.

Step 5 – Configure LogStash configuration file

Go to the LogStash installation folder and create a file named “sql.conf” (the name doesn’t really matter).
Here is the LogStash configuration I will be using.

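input {
  jdbc {
    # Server "cc", port 1433, database StackExchangeCS, Windows (integrated) authentication
    jdbc_connection_string => "jdbc:sqlserver://cc:1433;databaseName=StackExchangeCS;integratedSecurity=true;"
    # Driver class inside sqljdbc42.jar, found through the CLASSPATH set in Step 3
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    # Required by the plugin; any string works with integrated security
    jdbc_user => "xxx"

    # Every row returned by this query becomes one Elasticsearch document
    statement => "SELECT * FROM Users"
  }
}

output {
  elasticsearch {
    # Elasticsearch running on the local machine
    hosts => ["localhost:9200"]
    # Index created in Step 4
    index => "cs_users"
  }
}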

Let me break down the “input” and “output” configurations.

Input

There are three required fields you need to specify for the “jdbc” input plugin.

jdbc_connection_string – This field tells LogStash how to connect to SQL Server.

"jdbc:sqlserver://cc:1433;databaseName=StackExchangeCS;integratedSecurity=true;"

LogStash will connect to the server named “cc” on port 1433 and open the database named “StackExchangeCS” using integrated security (Windows authentication).
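To make the parts of the connection string explicit, here is a small hypothetical Scala helper that assembles the same URL from the server name, the port, and a list of key=value properties. It only builds the string; it does not check property names against what the driver supports.

```scala
object ConnectionString {
  // Hypothetical helper: builds jdbc:sqlserver://<server>:<port>;key=value;key=value;
  def sqlServerUrl(server: String, port: Int, props: Seq[(String, String)]): String =
    s"jdbc:sqlserver://$server:$port;" + props.map { case (k, v) => s"$k=$v;" }.mkString

  def main(args: Array[String]): Unit =
    println(sqlServerUrl("cc", 1433, Seq("databaseName" -> "StackExchangeCS", "integratedSecurity" -> "true")))
}
```

With these arguments it reproduces the jdbc_connection_string used in sql.conf.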

jdbc_driver_class – This is the driver class contained within the JDBC JAR file.
According to the documentation, the driver class in the JDBC JAR file is “com.microsoft.sqlserver.jdbc.SQLServerDriver”.

If you have an inquisitive mind, you can confirm this by opening the JAR file with the ZIP program of your choice, since a JAR is simply a ZIP file.

[Screenshot: Unzip JAR to verify JDBC driver name]
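The same check can be done in code, because a JAR can be read like any ZIP archive, and a class named a.b.C is stored in it as the entry a/b/C.class. This Scala sketch uses java.util.zip.ZipFile from the JDK. The object name and the default JAR path are placeholders, so pass the path you copied in Step 3 as the first argument.

```scala
import java.util.zip.ZipFile

object JarCheck {
  // A class a.b.C is stored inside a JAR as the entry "a/b/C.class".
  def entryName(className: String): String = className.replace('.', '/') + ".class"

  // Opens the JAR as a plain ZIP archive and checks whether it contains the class file.
  def jarContainsClass(jarPath: String, className: String): Boolean = {
    val zip = new ZipFile(jarPath)
    try zip.getEntry(entryName(className)) != null
    finally zip.close()
  }

  def main(args: Array[String]): Unit = {
    // Placeholder default; pass the full path to sqljdbc42.jar instead.
    val jar = args.headOption.getOrElse("sqljdbc42.jar")
    println(jarContainsClass(jar, "com.microsoft.sqlserver.jdbc.SQLServerDriver"))
  }
}
```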

jdbc_user – If you are using “Integrated Security” as an authentication option, this can be any string (I just entered “xxx” since that’s the easiest thing I can type 😉).

Output

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "cs_users"
  }
}

SQL Server data (all cs.stackexchange.com users) will be sent to Elasticsearch running on port 9200 of the local machine and indexed under the “cs_users” index created in “Step 4 – Create an Elasticsearch Index to Import Data to”.
There are quite a few configuration options, so please refer to the official LogStash documentation for more “elasticsearch” output plugin options.

Step 6 – Import Data with LogStash

With prerequisites out of the way, we are now ready to import data to Elasticsearch from SQL Server.
Go to the LogStash installation folder, where you created “sql.conf”, and run LogStash.

bin\logstash -f sql.conf

The -f flag specifies the configuration file to use; in our case, the “sql.conf” we created in the previous step.

The output of a successful LogStash run will look similar to the following.

Step 7 – Verify in Kibana

Wow, we have finally imported data. Now let’s do a quick check of whether the number of records in the database matches the number of documents in Elasticsearch.

[Screenshot: Verifying the result of the data import]

The “Users” table in SQL Server has 59394 records, and Elasticsearch returns the same number.
📝 Note: You can use the following command to get the number of documents in the “cs_users” index.

GET cs_users/_count

For more information on how “_count” works, refer to the Count API documentation.
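To automate this comparison, the Scala sketch below calls the same _count endpoint over HTTP and compares the result with the SQL Server row count. It is a minimal sketch under stated assumptions: the expected value 59394 is hard-coded from the count above, Elasticsearch is assumed to be on localhost:9200, and the “count” field is extracted with a regular expression rather than a JSON library to keep the example dependency-free.

```scala
import scala.io.Source

object CountCheck {
  // Matches the top-level "count" field of a _count response, e.g. {"count":59394,"_shards":{...}}.
  // A regex keeps the sketch dependency-free; a JSON library would be more robust.
  private val CountField = "\"count\"\\s*:\\s*(\\d+)".r

  def parseCount(json: String): Option[Long] =
    CountField.findFirstMatchIn(json).map(_.group(1).toLong)

  // GET http://<host>/<index>/_count, then parse the number of documents.
  def fetchCount(host: String, index: String): Option[Long] = {
    val src = Source.fromURL(s"http://$host/$index/_count", "UTF-8")
    try parseCount(src.mkString) finally src.close()
  }

  def main(args: Array[String]): Unit = {
    val expected = 59394L // rows in the Users table, as reported above
    fetchCount("localhost:9200", "cs_users") match {
      case Some(n) if n == expected => println(s"Counts match: $n")
      case other => println(s"Mismatch: expected $expected, got $other")
    }
  }
}
```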

Conclusion

Congratulations on getting this far 👏👏👏.

How to install Scala and create a class on Windows & Linux

1 – Verify the JDK installation on your machine. Open the shell/terminal and type java -version and javac -version.

2 – Download the Scala binaries from http://www.scala-lang.org/download/. As of writing this post, the Scala version is 2.11.6, so the downloaded file should be scala-2.11.6.tgz. Unzip it using the following command.

tar -xvzf scala-2.11.6.tgz

3 – After unzipping, change into the unzipped directory using the cd command. For instance, my Scala binaries are unzipped in the Downloads directory, so from there go into scala-2.11.6 and then its bin directory.

cd scala-2.11.6
cd bin

4 – Start the Scala REPL shell by running ./scala (scala.bat on Windows). In the REPL we can type programs and see the outcome right in the shell.

Scala Class Example

class Student() {
  var id: Int = 0
  var age: Int = 0

  // Stores the given id and age, then prints them
  def studentDetails(i: Int, a: Int): Unit = {
    id = i
    age = a
    println("Student Id is :" + id)
    println("Student Age is :" + age)
  }
}

Output: defined class Student

Here we create a Student class whose studentDetails method stores the student id and age passed as parameters and prints them. If there are no errors in the code, the message “defined class Student” is displayed.

Create a Student object and invoke the studentDetails method, passing the student id and age.

object Stud {
  // Entry point: creates a Student and prints its details
  def main(args: Array[String]): Unit = {
    val stu = new Student()
    stu.studentDetails(10, 8)
  }
}

Output: defined object Stud
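Defining object Stud in the REPL does not run it; to see the two lines printed, call Stud.main(Array.empty[String]). As an aside, more idiomatic Scala avoids the mutable vars by making id and age constructor parameters of a case class. The sketch below is an alternative style, not a replacement for the code above, and the names ImmutableStudent and ImmutableStud are hypothetical.

```scala
// Hypothetical immutable variant: id and age are constructor parameters,
// so there are no vars for a method to mutate.
case class ImmutableStudent(id: Int, age: Int) {
  // Builds the same two lines that studentDetails prints
  def details: String = s"Student Id is :$id\nStudent Age is :$age"
}

object ImmutableStud {
  def main(args: Array[String]): Unit = println(ImmutableStudent(10, 8).details)
}
```

Case classes also give you equality by value, so ImmutableStudent(10, 8) == ImmutableStudent(10, 8) is true.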