Dec 21st, 2018: [EN][Elasticsearch] Using SQL to query data in Elasticsearch

What is SQL?

SQL (Structured Query Language) is a standardized programming language that's used to manage relational databases and perform various operations on the data. SQL is regularly used not only by database administrators, but also by developers writing data integration scripts and data analysts looking to set up and run analytical queries.

When SQL is Introduced in Elasticsearch?

Querying data using Structured Querying Language (SQL) has been introduced in Elasticsearch 6.3 as a plugin. One can write SQL queries on the data stored in any Elasticsearch Index.

How can I use/run queries?

There are three ways to do that.

1. Command Line Interface:

Elasticsearch comes with a CLI through which you can write an SQL command. To invoke Elasticsearch SQL CLI interface one should go to /elasticsearch/bin and instantiate elasticsearch-sql-cli to start SQL environment.

2. REST API

Elasticsearch also provides an API endpoint _xpack/sql through which you can pass an SQL command.

3. JDBC & ODBC Clients

These are SQL Driver Clients which you can embed in your application and pass the command.

But a thing to remember, SQL commands can only read the data in Elasticsearch indices.

We will look at executing commands from Kibana “Dev Tools” tab.

Ingesting some data:

Let us download sample data from here. And run below command from downloaded location.

$> curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

It creates an index named “bank”.

Note:

  1. If you are just starting with Elastic Stack and need some guidance, I recommend Getting Started with Elastic Stack.
  2. The above command needs curl to call the REST API, alternatively you can use tools like Postman.

Querying Data in Index:

In Elasticsearch SQL, Index is considered as a Table, Fields are Columns and Documents are rows.

1. Select Query:

 POST /_xpack/sql
 {
     "query": "SELECT * FROM bank";
 }

You can add the format option to the above query and view data in txt format as shown below.

POST /_xpack/sql?format=txt
{
     "query": "SELECT * FROM bank"
}

2. Group by Query

POST /_xpack/sql?format=txt
{
     "query": "SELECT state, avg(balance) FROM bank GROUP BY state"
}

Similarly you can use other SQL commands to query data like DESCRIBE <index_name> etc.

Note: JOINS and Transactions are not supported.

Translating SQL query to Query DSL:

Elasticsearch SQL also has a translate REST endpoint which will translate your SQL query to an equivalent Query DSL that is usually used to query data in Elasticsearch. This is because the underlying SQL Plugin will convert the SQL query to Query DSL equivalent and then passes it to the Elasticsearch Node.

So, SQL feature could also be used as a good tool to learn Query DSL.

POST /_xpack/sql/translate
{
   "query": "SELECT * FROM bank"
}

You can also write complex queries in SQL and translate it to see the result.

GET _xpack/sql/translate
{
   "query": "SELECT avg(balance), state FROM bank WHERE match(gender, 'F') GROUP BY state LIMIT 100"
}

You can refer more documentation about Elasticsearch SQL here.