Elasticsearch 7.13 is here: Software Engineer, Get to Know the Giant of Search and Analytics.

Akintola L. F. ADJIBAO
6 min read · Jun 18, 2021

6 minutes to learn all the theory you need, as a software or data engineer, about Elasticsearch, the future of search and analytics.

Elasticsearch, the Giant — Photo by Mick Haupt on Unsplash

Have you just heard about Elasticsearch and want to learn more about it? Or are you looking for a new position that requires Elasticsearch skills? Or maybe your boss or your teacher came up with Elasticsearch as the new technology to learn for your next project. You’re in the right place!

It’s a fact: not many engineers like pure theory. But as a saying often attributed to Karl Marx goes: “Practice without theory is blind; theory without practice is sterile.”

In this article, we’re going to learn, in a short and practical way, what Elasticsearch is and which key concepts every Elasticsearch practitioner should know. Let’s dive right in.

What is Elasticsearch?

Have you ever asked yourself how Wikipedia finds the right articles when you type a word or an expression in its search bar?

Have you ever asked yourself how Amazon Web Services or Netflix know in which region of the world to create a certain service?

Obviously, they use advanced search features and analytics tools. That’s what Elasticsearch offers: advanced search features and analytics tools.

Elasticsearch is a piece of software (we call it an engine) used to store data and to perform advanced searches and analytics on it. It handles data of all types, including textual, numerical, geospatial, structured, and unstructured.

It’s free to use, its source code is publicly available, and its distributed architecture makes it easy to scale. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic).

Here is the best way I found to summarize Elasticsearch: Elasticsearch is a real-time full-text search engine, a NoSQL database, and an analytics tool written in Java. It’s very scalable and has a very comprehensive RESTful interface that makes it easy to consume (thanks to Thijs Feryn for this summary).

Let’s go through each part quickly.

1- Real-time and full-text search engine

I’ve run some queries in Elasticsearch and got the response back in a fraction of a second: that’s real-time. Elasticsearch runs advanced queries remarkably fast, helped by its caching features. Full-text search is its main feature: it can look deeply at each word, and even each letter, to give you the most relevant results.
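To get a feel for why full-text search can be this fast, here is a toy inverted index in plain Python. It is only a sketch of the general idea (map each word to the documents that contain it), not how Elasticsearch or Lucene actually implement it. The sample documents and the function names are made up for this illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, keyed by id.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Kibana visualizes data stored in Elasticsearch",
    3: "PostgreSQL is a relational database",
}
index = build_inverted_index(docs)
print(search(index, "elasticsearch search"))
```

A query never scans the documents themselves: it only looks up a few keys and intersects small sets, which is why the lookup stays fast as the number of documents grows.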

2- NoSQL database

In Elasticsearch, we store data as documents, without having to follow a rigid, predefined structure.

Some people tend to use Elasticsearch as a relational database (like PostgreSQL or MySQL) with entities and relationships. In my opinion, that’s a mistake, and I’ll tell you why in one of my upcoming articles.

By the way, feel free to share your Medium profile in a comment so we can follow each other and share our experiences.

3- Analytics tool

Elasticsearch is the core of the ELK Stack (Elasticsearch, Logstash, Kibana, and related tools), a set of amazing tools that includes Kibana. Together, Elasticsearch and Kibana let you visualize and process your raw data without writing a single line of code. You can use them to discover relationships in your data using graphs and, most importantly, to extract insights from the data stored in Elasticsearch. And the mind-blowing part is its machine learning features.

4- It’s got a RESTful interface

Imagine that you can query your data from your datastore without any additional third-party API.

Elasticsearch exposes REST APIs that are used by clients and can be called directly to configure and access Elasticsearch features.
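As an illustration, here is a small Python sketch that builds a search request using only the standard library. It assumes a local node listening on the default HTTP port 9200 with no authentication, and a hypothetical index called blogs with a title field. The request is built but not sent, so you can run the snippet without a cluster.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port, without authentication.
BASE_URL = "http://localhost:9200"


def build_search_request(index_name, field, text):
    """Build (without sending) a POST request to the index's _search endpoint."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{BASE_URL}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "blogs" and "title" are hypothetical names used for illustration.
request = build_search_request("blogs", "title", "president")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it to a running node:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Any HTTP client works the same way: the query is just a JSON body sent to a URL.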

You now know what Elasticsearch is. Let’s move on to its core concepts.

Key concepts of Elasticsearch you’ve got to know

Elasticsearch is not only a tool; it’s a technology with its own concepts, its own architecture, and its own terminology, which you need to know before getting started.

1- Index

The first concept we’re going to examine is the index. When describing how Elasticsearch organizes data, people often compare it with relational databases and say that an index is the equivalent of a database. In practice, an index is closer to a table than to a database.

An index:

  • is a collection of documents that have similar characteristics. It’s the first data structure you create before you start storing data. On a news website, for example, you could have one index for blogs, another for events, another for ads, and another for users. You then run your searches against an index.

2- Mapping

Before getting to mapping, let’s talk about schema. A schema defines the structure of a document by describing the type of each field of a document that’s going to be stored in an index.

Here is an example of a schema for a blogs index:

{
  "blogs" : {
    "mappings" : {
      "properties" : {
        "title" : {
          "type" : "text"
        },
        "author" : {
          "type" : "keyword"
        },
        "views" : {
          "type" : "integer"
        },
        "created_at" : {
          "type" : "date"
        }
      }
    }
  }
}

Defining the schema of an index in Elasticsearch is called mapping.

3- Document

In the context of our blogs index, let’s have a look at an example of a document:

{
  "title" : "Who is the courageous guy that slapped President Macron?",
  "author" : "Anonymous",
  "views" : 50000,
  "created_at" : "2021-06-17"
}

A document is a JSON object that is stored within an Elasticsearch index.

Tip: To store data, first create an index, then define its mapping before you start saving documents. Once a field is mapped, you can’t change its type in place. You can still add new fields, but changing an existing one means creating a new index and reindexing your data.
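The order described in the tip can be sketched in Python as two REST requests: one that creates the blogs index together with its mapping, and one that stores a document in it. This is a minimal sketch that assumes a local node on the default port 9200 without authentication. The build_request helper is a made-up name, and nothing is sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port, without authentication.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body):
    """Build (without sending) a JSON request for the given REST path."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Step 1: create the index together with its mapping.
index_settings = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "author": {"type": "keyword"},
            "views": {"type": "integer"},
            "created_at": {"type": "date"},
        }
    }
}
create_index = build_request("PUT", "/blogs", index_settings)

# Step 2: store a document; Elasticsearch generates the document id.
document = {
    "title": "Who is the courageous guy that slapped President Macron?",
    "author": "Anonymous",
    "views": 50000,
    "created_at": "2021-06-17",
}
index_document = build_request("POST", "/blogs/_doc", document)

for request in (create_index, index_document):
    print(request.get_method(), request.full_url)

# To send a request to a running node:
# with urllib.request.urlopen(create_index) as response:
#     print(json.load(response))
```

Creating the mapping first means every field gets the type you chose, instead of a type guessed from the first document you store.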

4- Elasticsearch as infrastructure

We’ve talked about the key concepts you work with on top of Elasticsearch. What about the underlying layer: the Elasticsearch system infrastructure?

Here are the three main concepts you’ll be dealing with daily as an Elasticsearch fan 😎: cluster, node, and shard (primary shard and replica).

Let’s assume you’ve just installed Elasticsearch and you’re about to start using it.

  • When your instance of Elasticsearch starts up, it runs as a node in the default Elasticsearch cluster.

Here, a cluster is a group of nodes. It orchestrates and distributes tasks, such as searching and indexing, among its node(s). Think of a node as a server that joins or leaves a cluster and handles HTTP and transport traffic.

  • Then you will create an index, for example a blogs index. Your index will be created on this node, and its data will be split across one or more shards.

A shard is a part of the main index which in reality is a separate Apache Lucene index. Shards can be either primary, containing parts of the main information of the index, or replicas, containing a copy of the information of a primary shard.

It means that all the blogs data you insert will be spread across small units called shards. Why? For the sake of high availability and speed.
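How does a document end up on a given shard? Roughly speaking, Elasticsearch hashes a routing value (by default the document id) and takes the result modulo the number of primary shards. The sketch below is a simplified version of that idea: zlib.crc32 is only a stand-in for the hash function Elasticsearch really uses, and the shard count and document ids are hypothetical.

```python
import zlib
from collections import Counter


def pick_shard(routing_value, number_of_primary_shards):
    """Simplified routing: hash the routing value, then take it modulo the shard count.

    zlib.crc32 is a stand-in hash, not the one Elasticsearch uses.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


number_of_primary_shards = 3  # hypothetical setting for the blogs index
doc_ids = [f"blog-{n}" for n in range(1, 1001)]  # hypothetical document ids

# Count how many documents land on each shard.
distribution = Counter(
    pick_shard(doc_id, number_of_primary_shards) for doc_id in doc_ids
)
print(sorted(distribution.items()))
```

The same id always maps to the same shard, so Elasticsearch knows where to find a document again without asking every shard.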

Note: A replica is never kept on the same node as its primary shard, so if a cluster has only one node, no replicas will be allocated.
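The rule in the note can be illustrated with a toy allocator in Python. This is not Elasticsearch’s real allocation algorithm, just a sketch of the single constraint described above. The node names and the allocate function are invented for the example.

```python
def allocate(nodes, primaries, replicas_per_primary):
    """Toy allocator: a replica never shares a node with its primary
    or with another copy of the same shard."""
    assignment = {node: [] for node in nodes}
    unassigned = []
    for shard in range(primaries):
        # Spread primaries over the nodes in round-robin order.
        primary_node = nodes[shard % len(nodes)]
        assignment[primary_node].append(f"P{shard}")
        used = {primary_node}
        for _ in range(replicas_per_primary):
            free = [node for node in nodes if node not in used]
            if not free:
                # No node left without a copy of this shard.
                unassigned.append(f"R{shard}")
                continue
            # Pick the free node that currently holds the fewest shards.
            target = min(free, key=lambda node: len(assignment[node]))
            assignment[target].append(f"R{shard}")
            used.add(target)
    return assignment, unassigned


# One node: the replica has nowhere to go.
single_node, pending = allocate(["node-1"], primaries=1, replicas_per_primary=1)
print(single_node, pending)

# Two nodes: every replica sits on a different node than its primary.
two_nodes, pending_two = allocate(
    ["node-1", "node-2"], primaries=2, replicas_per_primary=1
)
print(two_nodes, pending_two)
```

With a single node, the replica stays in the unassigned list. Adding a second node gives it a place to live, which is what makes the index survive the loss of one node.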

Congratulations ladies and gentlemen! Your journey with Elasticsearch has just started and I’m excited to know that you started.

Question: In your opinion, is Elasticsearch going to be the main search and analytics tool over the next couple of years? Let me know your thoughts in a comment, please.

I’m looking forward to reading about your experience with Elasticsearch and the knowledge you’d like to share with us.

This article is part of a series, so please follow me to get notified about my next articles.
