Elasticsearch - Comprehensive Brief


(Teju Nc) #1

Elasticsearch is an open source, distributed, real-time full-text search and analytics engine. It is used in Single Page Application (SPA) projects, among others. Elasticsearch is developed in Java and used by many big organizations around the world. It was originally released under the Apache License, version 2.0; later releases changed the licensing terms, so check the license of the version you use.

Firstly, with Elasticsearch we can achieve the speed we would like, as it lets us index millions of documents. But what’s the use of indexing our documents if we can’t find the one we are looking for just as quickly? Well, as we will see, Elasticsearch can perform queries across all those millions of documents and return accurate results in a fraction of a second.

Secondly, there is search relevance and scoring. In a typical relational SQL database you might write a query such as:

  SELECT POST_CONTENT FROM BLOG WHERE POST LIKE '%something%';

This gives us roughly what we want, but only as an unranked substring match. Elasticsearch, by contrast, has sophisticated query techniques that let us apply scoring and relevance with a simple REST call.

Example of a query in Elasticsearch:

GET blog/_search
{
  "query": {
    "match": { 
      "post": "something"
    }
  }
}
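For anyone who would rather issue that query from code, here is a minimal Python sketch that builds the same request using only the standard library. The node address `http://localhost:9200` and the helper name `build_match_request` are assumptions made for illustration; they are not part of the original post. The request is only built here, not sent.

```python
import json
import urllib.request


def build_match_request(base_url, index, field, text):
    """Build (but do not send) a _search request carrying a match query."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts a JSON body sent via POST
    )


# Assumed local node address; adjust for your own cluster.
request = build_match_request("http://localhost:9200", "blog", "post", "something")

# To run it against a live node, something along these lines should work:
#   with urllib.request.urlopen(request) as response:
#       for hit in json.load(response)["hits"]["hits"]:
#           print(hit["_score"], hit["_source"])
```

Each hit in the response carries a `_score`, which is the relevance score that the SQL `LIKE` query has no equivalent for.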

Finally, Elasticsearch offers statistical analysis tools (aggregations), which allow us to see trends in our data.
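As a rough illustration of those analysis tools, the sketch below builds the body of a search request that asks for an average and basic statistics over a numeric field. The helper name `build_stats_body` and the choice of the `age` field (borrowed from the sample document in this post) are assumptions for illustration only.

```python
import json


def build_stats_body(numeric_field):
    """Build a search body that returns only aggregations, no hits."""
    return {
        "size": 0,  # skip the matching documents, return aggregations only
        "aggs": {
            f"avg_{numeric_field}": {"avg": {"field": numeric_field}},
            f"{numeric_field}_stats": {"stats": {"field": numeric_field}},
        },
    }


body = build_stats_body("age")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against an index that contains a numeric `age` field.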

How does Elasticsearch save data?

Elasticsearch does not have tables, and a schema is not required. Elasticsearch stores data as JSON documents inside an index.

{
    "id": 1,
    "firstName": "Alexandra",
    "lastName": "Hamilton",
    "isActive": false,
    "balance": "2,815.91",
    "age": 35,
    "eyeColor": "green"
}

A field is like a column in a SQL database, and its value represents the data in a row cell.

When you save a document in Elasticsearch, you save it in an index. An index is like a database in a relational database system. An index is split across multiple shards, and the shards are stored on one or more servers, which are called nodes. Multiple nodes form a cluster.
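The index, shard and node relationship can be sketched in a few lines of Python. This is a toy model only: Elasticsearch picks a shard by hashing a routing value (the document id by default) modulo the number of primary shards, but it uses its own hash function and its own allocation logic. The `zlib.crc32` hash, the round-robin placement and the node names below are stand-ins chosen for illustration.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Toy routing: hash the document id, then take it modulo the shard count."""
    # crc32 is a stand-in; Elasticsearch uses a different hash internally.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def place_shards(number_of_primary_shards, nodes):
    """Hypothetical round-robin placement of shards onto nodes."""
    return {shard: nodes[shard % len(nodes)] for shard in range(number_of_primary_shards)}


nodes = ["node-a", "node-b"]          # hypothetical node names in one cluster
placement = place_shards(5, nodes)    # 5 primary shards spread over 2 nodes
shard = pick_shard("1", 5)            # which shard holds the document with id "1"
print(f"document 1 -> shard {shard} on {placement[shard]}")
```

The point of the sketch is that the same id always lands on the same shard, which is why the number of primary shards is normally fixed when the index is created.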

How Elasticsearch represents data

In Elasticsearch, a Document is the unit of search and index.

An index consists of one or more Documents, and a Document consists of one or more Fields.

In database terminology, a Document corresponds to a table row, and a Field corresponds to a table column.

Schema
Unlike Solr, Elasticsearch is schema-free. Well, kinda.

Whilst you are not required to specify a schema before indexing documents, it is necessary to add mapping declarations if you require anything but the most basic fields and operations.

This is no different from specifying a schema!

The schema declares:

  • what fields there are
  • which field should be used as the unique/primary key
  • which fields are required
  • how to index and search each field

In older versions of Elasticsearch, an index may store documents of different "mapping types", each with its own mapping definition. A mapping type is a way of separating the documents in an index into logical groups. Note that mapping types were deprecated and then removed in later major versions, where an index holds a single mapping.

To create a mapping, you will need the Put Mapping API, or you can add multiple mappings when you create an index.
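To make the mapping idea concrete, here is a minimal Python sketch that builds a create-index body with an explicit mapping for a person-like document such as the sample in this post. It assumes the typeless mapping format accepted by recent versions (7.x and later); older versions nest the properties under a type name. The index name `people`, the helper name and the field type choices are illustrative assumptions.

```python
import json


def build_person_index_body():
    """Build a create-index body with explicit field mappings (typeless format)."""
    return {
        "mappings": {
            "properties": {
                "id": {"type": "integer"},
                "firstName": {"type": "text"},
                "lastName": {"type": "text"},
                "isActive": {"type": "boolean"},
                # "2,815.91" is a formatted string, so it is kept as a keyword here
                "balance": {"type": "keyword"},
                "age": {"type": "integer"},
                "eyeColor": {"type": "keyword"},
            }
        }
    }


index_body = build_person_index_body()
# This JSON would be the body of a "PUT people" request (hypothetical index name).
print(json.dumps(index_body, indent=2))
```

Using `text` for names and `keyword` for `eyeColor` reflects one possible choice: `text` fields are analyzed for full-text search, while `keyword` fields are matched exactly and suit filtering and aggregations.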

Why would I want to use Elasticsearch?

Elasticsearch can be used in various ways. For example, it can serve as a blog storage engine if you would like your blog to be searchable. Traditional SQL doesn’t readily give you the means to do that, whereas Elasticsearch gives you a near real-time search experience.

How about Analytics tools?

Most software generates tons of data that is worth analyzing. Elasticsearch works together with Logstash and Kibana to give you a full analytics system.
Finally, I like to see Elasticsearch as a data warehouse, where you have documents with many different attributes and non-predictable schemas. Since Elasticsearch is schemaless, it won’t matter that you store various documents there; you will still be able to search them easily and quickly.

Key features:

  • Fast, Incisive Search against Large Volumes of Data
  • Indexing Documents to the Repository
  • Denormalized Document Storage: Fast, Direct access to your Data
  • Broadly Distributable and Highly Scalable

Elasticsearch – Advantages

  • Developed in Java, which makes it compatible with almost every platform.
  • Near real-time: by default, an added document becomes searchable after about one second.
  • Distributed, which makes it easy to scale and integrate in any big organization.
  • Creating full backups is easy using the concept of the gateway, which is present in Elasticsearch.
  • Handling multi-tenancy is very easy when compared to Apache Solr.
  • Uses JSON objects as responses, which makes it possible to invoke the Elasticsearch server from a large number of different programming languages.
  • Supports almost every document type except those that do not support text rendering.

For additional information on Elasticsearch, please check MindMajix/Tekslate to get more in-depth knowledge on this topic.


(Russ Cam) #2

The Definitive Guide online is a great resource to start with, as well as the reference and other guides.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.