Elasticsearch: 5 Min to Master Sorting from Scratch as a Software Engineer
Are you stuck trying to sort your Elasticsearch search results? Or are you looking for an easier way to generate a report from your Elasticsearch data that requires sorting?
Welcome to this Elasticsearch sorting lesson.
Sorting is almost always applied to extracted data. In fact, Elasticsearch provides some powerful sorting features, especially for date and numeric fields (a publication year, for example), that you will need on a daily basis in your job as a data engineer, database administrator, or even software engineer.
In this tutorial, we’ll walk through how to perform straightforward sorting on the data you request.
Prerequisites
- Elasticsearch installed. If you don’t have it yet, check this: Installing Elasticsearch on Ubuntu, Mac, or Windows
- curl available from your terminal.
Let’s move forward folks!
Let’s assume that your analytics department asks you for some data from Elasticsearch about customers. The department wants the results to be sorted by insertion date.
Our task is to tell Elasticsearch which field the data should be sorted on.
For the sake of this lesson, we are going to create a mangas index and insert a few mangas into it from the terminal.
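One way to seed such an index is sketched below. The three sample documents, the BULK_BODY variable, the seed_mangas function name and the choice of the bulk API are assumptions made for illustration, as is a local Elasticsearch 7 or later listening on 127.0.0.1:9200. Only the title and year fields come from the rest of this lesson.

```shell
# Illustrative sample documents (an assumption, not an official dataset).
# The bulk API expects one action line followed by one document line.
BULK_BODY='{"index":{"_id":"1"}}
{"title":"Dragon Ball","year":1984}
{"index":{"_id":"2"}}
{"title":"One Piece","year":1997}
{"index":{"_id":"3"}}
{"title":"Naruto","year":1999}'

# Creates the mangas index on the fly (dynamic mapping) and inserts the documents.
seed_mangas() {
  # printf appends the trailing newline that the bulk API requires.
  printf '%s\n' "$BULK_BODY" |
    curl -s -XPOST -H "Content-Type: application/x-ndjson" \
      "127.0.0.1:9200/mangas/_bulk?refresh=true&pretty" --data-binary @-
}
```

Calling seed_mangas once in your terminal session should leave you with a mangas index whose documents each carry a title and a year field.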
List of sortable field types
Like any database management system, Elasticsearch offers a wide range of field data types so that you can apply transformations or analytics processing to your data. Here are some common sortable field data types you’ll be dealing with when working with Elasticsearch:
- Numeric and date values: “double”, “long”, “date”, “date_nanos” (dates are stored internally as numbers)
- Arrays of numeric values.
For more insight into Elasticsearch field data types, take a look at one of my articles about mapping that features them: Software Engineer, Before Inserting One Iota of Data in an Elasticsearch Index, You Must Do This.
Now let’s get started.
Execute the command below to sort the list of mangas by year.
curl -XGET -H "Content-Type: application/json" "127.0.0.1:9200/mangas/_search?sort=year&pretty"
Pretty cool. I hope you got the point!
Now let’s dig deeper into some Elasticsearch sorting features.
Sorting order — ascending and descending
Elasticsearch supports two sort orders:
- asc: sort in ascending order, from the smallest value to the largest,
- desc: sort in descending order, from the largest value to the smallest.
⚠️ By default, Elasticsearch sorts results in ascending order on the given field(s). The one exception is _score, which defaults to descending.
Let’s sort our manga records by publication year again, but this time in descending order.
curl -XGET -H "Content-Type: application/json" "127.0.0.1:9200/mangas/_search?sort=year:desc&pretty"
Look carefully at the results: you’ll notice they run from the most recently published mangas to the earliest ones.
Sorting with body query
Until now, we’ve been sorting through URL query parameters. Now we’re going to see how to do it with a body query.
Let’s get into it.
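A body query for the same descending sort on year could look like the sketch below. SORT_BODY and search_sorted_by_year are names chosen for this example only; the index, host and port are the ones used throughout the lesson.

```shell
# Request body: sort on the year field, newest first.
SORT_BODY='{
  "sort": [
    { "year": { "order": "desc" } }
  ]
}'

# Sends the body to the _search endpoint of the mangas index.
search_sorted_by_year() {
  curl -s -XGET -H "Content-Type: application/json" \
    "127.0.0.1:9200/mangas/_search?pretty" -d "$SORT_BODY"
}
```

Run search_sorted_by_year in the same terminal session to send the request.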
Yes, I do agree that this needs a pause to break it down:
- curl is a command-line tool, used here to interact with the Elasticsearch API,
- 127.0.0.1:9200 is the combination of the IP address (127.0.0.1) and the port (9200) we use to reach the Elasticsearch API,
- mangas is the name of our index,
- /_search is the Elasticsearch API endpoint we send our request to,
- pretty is only there for formatting, so that the results come back as readable, indented JSON,
- sort is the attribute that tells Elasticsearch which field(s) to sort on and in which order (asc or desc).
After running this command, you should get the same results as with our descending URL sort.
⛔ One more thing! Text fields sorting
If you try to sort our manga records based on the “title” field, you’ll come across an Elasticsearch exception. WHY?
A text field that is analyzed for full-text search can’t be used to sort documents: Elasticsearch breaks it into individual terms stored in an inverted index, so there is no single value per document to sort on.
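If you want to see the failure for yourself, a request along these lines should reproduce it, assuming the title field of the mangas index is mapped as text. TEXT_SORT_URL and sort_on_text_field_status are hypothetical names used only in this sketch; the function prints just the HTTP status code, which I would expect to be an error status rather than 200.

```shell
# Same URL sort as before, but on the analyzed title field.
TEXT_SORT_URL="127.0.0.1:9200/mangas/_search?sort=title"

# Discards the response body and prints only the HTTP status code.
sort_on_text_field_status() {
  curl -s -o /dev/null -w '%{http_code}\n' -XGET \
    -H "Content-Type: application/json" "$TEXT_SORT_URL"
}
```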
✨ But it is definitely possible to sort on our “title” field. HOW?
We have to re-index our mangas data. Let’s go through it without further ado.
Here we’re going to create a brand-new index and seed it with our existing data from the former index. But now, you may ask me: what’s the difference?
Before seeding our new index, we’ll explicitly define its mapping.
Want to know more about mapping? Check Software Engineer, Before Inserting One Iota of Data in an Elasticsearch Index, You Must Do This.
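The two steps, defining the mapping and then seeding the new index, can be sketched as follows. The index name mangas_sortable, the variable names and the two function names are hypothetical, and the typeless mapping syntax assumes Elasticsearch 7 or later. Only the title field is mapped explicitly here; every other field is left to dynamic mapping.

```shell
# Hypothetical name for the new index; pick any name you like.
NEW_INDEX="mangas_sortable"

# title stays a full-text field and gains a keyword sub-field called raw.
MAPPING_BODY='{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}'

# Copies every document from the former index into the new one.
REINDEX_BODY='{
  "source": { "index": "mangas" },
  "dest": { "index": "'"$NEW_INDEX"'" }
}'

# Step 1: create the new index with its explicit mapping.
create_index() {
  curl -s -XPUT -H "Content-Type: application/json" \
    "127.0.0.1:9200/$NEW_INDEX?pretty" -d "$MAPPING_BODY"
}

# Step 2: seed it from the former index with the _reindex API.
reindex_mangas() {
  curl -s -XPOST -H "Content-Type: application/json" \
    "127.0.0.1:9200/_reindex?pretty" -d "$REINDEX_BODY"
}
```

Run create_index first, then reindex_mangas. Any sort on title.raw then has to target the index that carries this mapping, which is mangas_sortable in this sketch.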
Well, what did we just do? We created a new index and defined its mapping before seeding it. If you look carefully at the request, you may notice that we added a raw sub-field to the title field. And instead of setting “title.raw” as text, we defined it as a keyword field. Elasticsearch is then able to sort on it, since “title.raw” is not analyzed (split into terms) but stored as a single exact value.
Let’s run the query again:
curl -XGET -H "Content-Type: application/json" "127.0.0.1:9200/mangas/_search?sort=title.raw:desc&pretty"
Try it out and yes it works 😊. Amazing dude!
Now let us sing 🎵! What a wonderful work, Neil Armstrong.