Logstash: Software Engineers, Extracting Data Has Never Been So Easy.
I was chatting with a software engineer friend who wanted to periodically extract data from an XML file generated by another system, in order to seed a centralized database for analysis.
Guess what! His first idea was to build a Laravel project from scratch, implementing an ETL (Extract, Transform, Load: a well-known pattern) to get the job done.
I wonder why so many software engineers like reinventing the wheel. But at the end of the day, it's a matter of passion for us (I am one of them).
The quickest solution I found for his painful extraction problem was nothing other than … Logstash (and the ELK stack for his broader needs).
In this article, I’ll walk you through the possibilities offered by Logstash and make you realize all the wonderful stuff you can do with it.
But first, let's answer the question I'm sure you're asking yourself:
What is Logstash?
Logstash is a free tool from Elastic that extracts data from various sources (XML/TXT/JSON files; databases such as MySQL, PostgreSQL, and MongoDB; the Internet and Twitter, you get me right; syslog files; and so on), then transforms, modifies, and formats it before sending it to a given destination (Elasticsearch, MongoDB, …). It is lightweight, open source, and easy to set up. Trust me!
1- What can you do with Logstash?
Well, the truth is that the possibilities of Logstash are almost unlimited when it comes to extracting data and using it. Let me share some of my favorite uses.
- Software connector: feeding logs from a software A into a software B for processing.
- Call detail record data extraction for the telecommunications industry.
- Systems monitoring.
- Analytics system setup.
- Database synchronization systems.
- IoT data gathering for analysis.
2- How does it work?
Logstash acts as a black box that receives some input and produces output after applying the desired transformations, called filters, to that input.
Every data processing task you want Logstash to perform must be described to it as a pipeline. Here is an example of a pipeline:
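Here is a minimal sketch of a pipeline configuration, in the spirit of my friend's use case. The file path and index name are illustrative, not taken from a real system:

```
# Read an XML file and index its contents into a local Elasticsearch.
input {
  file {
    path => "/var/data/export.xml"     # hypothetical source file
    start_position => "beginning"
  }
}

filter {
  xml {
    source => "message"   # raw line read by the file input
    target => "doc"       # parsed XML lands under this field
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "extracted-data"          # hypothetical index name
  }
}
```

The three sections mirror the black-box model above: one or more inputs, optional filters, and one or more outputs.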
Input can come from CSV files, XML files, JSON files, TXT files, MySQL, PostgreSQL, Beats, RESTful APIs, system log files, Redis, Amazon S3, Amazon SQS, Twitter, and more.
Output can be sent to Elasticsearch, MongoDB, CSV files, RESTful APIs, Nagios, Redis, Kafka, Google Cloud Storage, Amazon S3, email, and more.
Some interesting filters (transformations) include grok (parsing unstructured log text into fields), mutate (renaming, converting, or removing fields), date (parsing timestamps), and geoip (enriching events with location data from IP addresses).
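As a sketch, here is a filter section combining several of these plugins on a hypothetical web access log (the field names are illustrative):

```
filter {
  # Pull structured fields out of the raw log line.
  grok {
    match => { "message" => "%{IPORHOST:client} %{WORD:verb} %{URIPATHPARAM:request}" }
  }

  # Parse the event's timestamp field into the standard @timestamp.
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }

  # Clean up: rename one field, drop another.
  mutate {
    rename       => { "verb" => "http_method" }
    remove_field => ["timestamp"]
  }

  # Add geographical information based on the client IP.
  geoip {
    source => "client"
  }
}
```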
3- How to get started with Logstash?
As I said, it's straightforward to get started with Logstash. Here are the steps I propose so you can start extracting data right now.
- Install Elasticsearch
- Install Logstash
- Configure your first extraction pipeline (like the example above) and run it with Logstash. Instead of executing it manually, you can also automate its execution. In a future article, I'll walk you through all the steps to create an advanced, production-ready pipeline for your needs.
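Running a pipeline boils down to pointing the `logstash` binary at your configuration file. A sketch, assuming Logstash is installed under `/usr/share/logstash` and your pipeline is saved as `my-pipeline.conf` (both names illustrative):

```shell
# Check the configuration for syntax errors without starting the pipeline.
/usr/share/logstash/bin/logstash -f my-pipeline.conf --config.test_and_exit

# Run the pipeline; --config.reload.automatic picks up edits to the
# file without restarting Logstash.
/usr/share/logstash/bin/logstash -f my-pipeline.conf --config.reload.automatic
```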
- And voilà! Start playing and getting insight from your extracted data.
4- What is the purpose of plugins for Logstash?
Remember, Logstash is an open-source solution. To improve its flexibility, the community lets anybody add the behavior they need at any stage of the Logstash pipeline (inputs, filters, outputs, and codecs).
There are thousands of plugins that let you customize, enhance, and set up your extraction pipeline with ease 🧘.
Go to RubyGems.org to browse, find, and install them. They are available as self-contained packages called gems.
5- If you’re a nerd, create your own plugins 🤓
Yes… you know. Software engineers like writing additional code, or sometimes entire modules, to customize software or add features to it. You can develop your own input, filter, codec, or output plugin and use it in your pipeline. It's just amazing.
For that, you need to know a little bit of Ruby development: a new challenge is coming your way!
Enjoy your adventure! Till next time, take care.