Elasticsearch: Software Engineer, You’ve Got to Master Analysis.

Photo by cottonbro from Pexels

Introduction

1- What the hell is an analyzer?

  • the data stored in the index and
  • the search input (the text the user is looking for in the index, carried in the request).

Text analysis is the process of converting unstructured text, like the body of an email or a product description, into a structured format that’s optimized for search. — Elasticsearch documentation

2- How does an Elasticsearch analyzer work?

Figure 1: Elasticsearch Analyzer Schema

Character filter

  • HTML Strip Character Filter: this filter strips out HTML elements like <b> and decodes HTML entities like &amp;.
  • Mapping Character Filter: If you want to replace any occurrences of a particular string with a specified one, the mapping character filter is your man.
  • Pattern Replace Character Filter: As its name implies, the pattern replace character filter replaces any characters matching a regular expression with the specified replacement.
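To try the three character filters above, here is a minimal sketch that builds one `_analyze` request body per filter in Python. The sample texts, the mapping rule, and the regular expression are made up for illustration. The `keyword` tokenizer is used only because it emits the whole input as a single token, which keeps the effect of the character filter easy to see.

```python
import json

# One _analyze request body per character filter.
# Sample texts, the mapping rule, and the regex are illustrative assumptions.
CHAR_FILTER_REQUESTS = {
    "html_strip": {
        "tokenizer": "keyword",
        "char_filter": ["html_strip"],
        "text": "<b>Fish</b> &amp; chips",
    },
    "mapping": {
        "tokenizer": "keyword",
        # Each rule has the form "string to find => replacement".
        "char_filter": [{"type": "mapping", "mappings": ["& => and"]}],
        "text": "Fish & chips",
    },
    "pattern_replace": {
        "tokenizer": "keyword",
        # Replace every digit with "#".
        "char_filter": [
            {"type": "pattern_replace", "pattern": "\\d", "replacement": "#"}
        ],
        "text": "Call 555-0100",
    },
}

for name, body in CHAR_FILTER_REQUESTS.items():
    print(f"# {name}")
    print(json.dumps(body, indent=2))
```

Each printed body can be sent to a running cluster as the JSON payload of a request to the `_analyze` endpoint.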

Tokenizer

Token filter

  • Synonym token filter: The synonym token filter lets you easily handle synonyms during the analysis process. All you've got to do is provide a configuration file containing the synonyms.
  • Trim token filter: Does it ring a bell? I can see you nodding your head. As you might imagine, it removes leading and trailing whitespace from each token in a stream. The snag here is that the trim filter does not change a token’s offsets, even though it can change the token’s length.
  • Conditional token filter: This token filter is just great. It lets you apply a set of token filters to the tokens that match the conditions in a provided predicate script.
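The three token filters above could be configured along these lines. This is a sketch under assumptions: the filter name `my_synonyms`, the file path `analysis/synonyms.txt`, the sample texts, and the length-based predicate are all hypothetical choices made for illustration.

```python
import json

# _analyze body for the trim filter. The keyword tokenizer keeps the
# surrounding spaces in the token, so trim has something to remove.
TRIM_REQUEST = {
    "tokenizer": "keyword",
    "filter": ["trim"],
    "text": " fox ",
}

# _analyze body for the conditional filter: lowercase is applied only to
# tokens for which the predicate script returns true (here, short tokens).
CONDITION_REQUEST = {
    "tokenizer": "standard",
    "filter": [
        {
            "type": "condition",
            "filter": ["lowercase"],
            "script": {"source": "token.getTerm().length() < 5"},
        }
    ],
    "text": "THE QUICK BROWN FOX",
}

# Index settings fragment declaring a synonym filter backed by a file.
# The path is hypothetical and is resolved relative to the config directory.
SYNONYM_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym",
                    "synonyms_path": "analysis/synonyms.txt",
                }
            }
        }
    }
}

for label, body in [
    ("trim", TRIM_REQUEST),
    ("condition", CONDITION_REQUEST),
    ("synonym", SYNONYM_SETTINGS),
]:
    print(f"# {label}")
    print(json.dumps(body, indent=2))
```

The first two bodies are meant for the `_analyze` endpoint, while the third would be part of the body used when creating an index.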

Bonus

  • Standard Tokenizer
  • Lower Case Token Filter
  • Stop Token Filter (disabled by default)
curl -XGET -H "Content-Type: application/json" "127.0.0.1:9200/_analyze" -d '
{
  "analyzer": "standard",
  "text": "Google, Can You Give Me The Phone Number Of the Girl I Met Yesterday ?"
}'
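To get a feel for what the three building blocks listed above do to that sentence without a cluster at hand, here is a rough pure-Python approximation. It is not Elasticsearch's implementation: the real Standard Tokenizer follows the Unicode Text Segmentation rules, whereas this toy version (the function name `toy_standard_analyzer` is made up) simply keeps runs of word characters, which is close enough for plain English text.

```python
import re


def toy_standard_analyzer(text, stopwords=frozenset()):
    """Rough stand-in for the standard analyzer, for plain English text only."""
    # Tokenizer step: keep runs of word characters, drop punctuation and spaces.
    tokens = re.findall(r"\w+", text)
    # Lower Case Token Filter step.
    tokens = [token.lower() for token in tokens]
    # Stop Token Filter step: does nothing unless stop words are supplied,
    # mirroring the "disabled by default" behaviour.
    return [token for token in tokens if token not in stopwords]


text = "Google, Can You Give Me The Phone Number Of the Girl I Met Yesterday ?"

# No stop words: punctuation is gone and every token is lowercased.
print(toy_standard_analyzer(text))

# With stop words enabled, "the" and "of" are dropped as well.
print(toy_standard_analyzer(text, stopwords={"the", "of"}))
```

The first call prints 14 lowercase tokens, from `google` to `yesterday`, and the second drops both occurrences of `the` as well as `of`.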

Conclusion

  • Analysis happens at two points in the search process: at index time and at search time.
  • An analyzer can contain three types of components, of which only the tokenizer is mandatory: character filter(s), a tokenizer, and token filter(s).
  • Elasticsearch lets you build your own custom analyzers.
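To tie these three points together, here is a minimal sketch of an index definition that declares a custom analyzer and applies analysis at both index time and search time. The index name `articles`, the field name `body`, and the analyzer name `my_custom_analyzer` are hypothetical. The choice of components is only one possible combination.

```python
import json

INDEX_NAME = "articles"  # hypothetical index name

CREATE_INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip"],    # optional
                    "tokenizer": "standard",          # mandatory
                    "filter": ["lowercase", "stop"],  # optional
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                # Used when documents are indexed.
                "analyzer": "my_custom_analyzer",
                # Used on the query text at search time.
                "search_analyzer": "standard",
            }
        }
    },
}

print(f"PUT /{INDEX_NAME}")
print(json.dumps(CREATE_INDEX_BODY, indent=2))
```

If `search_analyzer` is left out, the field's `analyzer` is used for search as well, which is usually what you want unless you have a specific reason to analyze queries differently.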

Certified AWS Solution Architect, Fullstack Software Engineer & DevOps. I like Solving Challenging Software Engineering Problems & Building Amazing Solutions.

Akintola L. F. ADJIBAO
