What is Elasticsearch
May 18, 2021
Bogdan Lemnaru

Elasticsearch is a highly scalable search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. In other words, Elasticsearch is an open-source, standalone database server developed in Java. Following an open-core business model, parts of the software are licensed under various open-source licenses (mostly the Apache License), while other parts fall under the proprietary Elastic License. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.

The Elastic Stack

But Elasticsearch wouldn’t be complete without two, or in some cases three, more components: Logstash, Kibana and, optionally, Filebeat. This combination fits almost everyone’s needs. Filebeat is a very lightweight data shipper that sends data to Logstash or directly to Elasticsearch (many other outputs are available). Once Filebeat has sent data to Logstash, Logstash splits it into fields. Let’s assume that Filebeat sends a person’s details in JSON format. Logstash can take that data, split its components into fields, and store them in Elasticsearch, where they are neatly organized and easier to view. To view the data inside Elasticsearch we need Kibana, an easy-to-set-up component that connects to Elasticsearch and retrieves data through its APIs.
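
To make the "split into fields" step more concrete, here is a minimal sketch in plain Python. It is not how Logstash is implemented or configured; the sample event, the function name split_into_fields and the dotted field names are assumptions made up for illustration.

```python
import json

# Hypothetical raw event, roughly what a shipper could forward as one line.
raw_event = '{"name": "Jane Doe", "age": 34, "address": {"city": "Cluj", "country": "RO"}}'


def split_into_fields(raw: str) -> dict:
    """Parse a JSON string and flatten nested objects into dotted field names."""

    def flatten(obj: dict, prefix: str = "") -> dict:
        fields = {}
        for key, value in obj.items():
            name = f"{prefix}{key}"
            if isinstance(value, dict):
                # Nested object: recurse and prefix the child field names.
                fields.update(flatten(value, prefix=f"{name}."))
            else:
                fields[name] = value
        return fields

    return flatten(json.loads(raw))


fields = split_into_fields(raw_event)
for field_name, field_value in fields.items():
    print(f"{field_name}: {field_value}")
```

Once every detail lives in its own field, it can be searched, filtered and charted on its own instead of being buried in one long string.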

Backend Components

In the beginning I talked about Elasticsearch having full-text search, which is fast because of how the data is organized. When you add an event/document to Elasticsearch, all of its distinct words are added to a map (an inverted index) in which the key is the word and the value is the list of numbers of the documents that contain it. As documents pile up, the most used words will already be in the map, so Elasticsearch only has to add the newest document number to their entries.
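
The word-to-document map described above can be sketched in a few lines of Python. This is a toy illustration of the idea, not Lucene's actual data structure; the class name InvertedIndex, the simple tokenizer and the sample documents are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: word -> set of document numbers containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        # Each distinct word is recorded once per document.
        for word in set(tokenize(text)):
            self.postings[word].add(doc_id)

    def search(self, query: str) -> set:
        """Return the documents that contain every word of the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


index = InvertedIndex()
index.add(1, "Disk full error on node A")
index.add(2, "Login error for user bob")
index.add(3, "User bob logged out")

print(index.search("error"))      # documents 1 and 2
print(index.search("bob error"))  # document 2 only
```

A search never scans the documents themselves: it looks up each query word in the map and intersects the document numbers, which is why lookups stay fast as the number of documents grows.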

Elasticsearch can form a cluster out of different types of nodes, each type having specific tasks, or out of nodes that each hold all the roles (although this is not recommended for bigger clusters). The most basic types of nodes are client, master and data. The client node handles the request, forwards the action to the master or data nodes depending on the request, and then sends the response back to the user. The master node is responsible for lightweight cluster-wide actions such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. The data nodes hold the shards that contain the documents you have indexed and handle data-related operations like CRUD, search, and aggregations, which are I/O-, memory-, and CPU-intensive.

Now that you know the basics of how each component works and what a cluster is composed of, let’s talk about how Elasticsearch stores data. An Elasticsearch cluster is composed of nodes, and inside the nodes your data is stored in indices. Each index is composed of shards, and a shard works well with up to about 50 GB of data. Ergo, if you have 100 GB of data inside an index, that index will need at least 2 shards (the number of shards is set by the user, but that’s a discussion for the future). The shards are spread evenly across the nodes, so an index with two shards can have them on different nodes.
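
The arithmetic above can be written down as a small Python sketch. The 50 GB figure is the rule of thumb from this article, not a hard limit, and the round-robin placement is a deliberate simplification: in a real cluster the master node decides shard allocation and takes more factors into account. The function and node names are made up for illustration.

```python
import math

TARGET_SHARD_SIZE_GB = 50  # rule of thumb used in this article, not a hard limit


def shards_needed(index_size_gb: float, target_gb: float = TARGET_SHARD_SIZE_GB) -> int:
    """Smallest number of shards that keeps each shard at or below the target size."""
    return max(1, math.ceil(index_size_gb / target_gb))


def spread_shards(shard_count: int, nodes: list) -> dict:
    """Simplified round-robin placement of shard numbers onto nodes."""
    placement = {node: [] for node in nodes}
    for shard in range(shard_count):
        placement[nodes[shard % len(nodes)]].append(shard)
    return placement


print(shards_needed(100))                        # 100 GB / 50 GB per shard
print(spread_shards(2, ["node-1", "node-2"]))    # one shard on each node
```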

ELK use cases

The next question you might ask is “OK, but what can we do with all the data that we have stored in Elasticsearch?”, and that’s where Kibana comes in handy. If your logs are sent in real time by Logstash or Filebeat, your support team can search all of them by error code or username for the problem they are facing at the moment and get to its root fast. Another use case comes after software development is finished, when you want some dashboards to see how everything is going - Kibana is fully capable of producing charts, heatmaps, statistics and tables based on time buckets.
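
To illustrate what "search by error code or username" and "time buckets" mean, here is a small sketch in plain Python over an in-memory list. It does not call Elasticsearch or Kibana; the log entries, field names and helper functions are invented for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log events, already split into fields.
logs = [
    {"timestamp": "2021-05-18T10:05:00", "error_code": 500, "username": "alice"},
    {"timestamp": "2021-05-18T10:40:00", "error_code": 404, "username": "bob"},
    {"timestamp": "2021-05-18T11:15:00", "error_code": 500, "username": "bob"},
]


def filter_logs(events: list, **criteria) -> list:
    """Keep the events whose fields match every given criterion."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


def hourly_buckets(events: list) -> dict:
    """Count events per hour, keyed by the start of each hour."""
    counts = Counter()
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        counts[bucket.isoformat()] += 1
    return dict(sorted(counts.items()))


print(filter_logs(logs, error_code=500, username="bob"))
print(hourly_buckets(logs))
```

A dashboard chart is essentially the second function drawn as bars: one bar per bucket, with the height given by the count.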

This was a brief explanation of ELK. The subject is much broader, with many more features, each one well thought out and with a good use. In this article we have only scratched the surface of Elasticsearch’s power and use cases, and of the variety of business challenges ELK is able to solve. I hope this blog post clearly explained what Elasticsearch is and what its basic components are.
