Logstash: 5 Easy Steps To Consume Data From RabbitMQ To Elasticsearch
A few days ago, I received an interesting request from one of my best followers. I’ve got to say that I’m thrilled to help out with Logstash.
So let’s get to the point. Nidhi was looking for a way to process logs from a RabbitMQ queue with Logstash and seed an Elasticsearch index with those data.
Well, welcome, folks, to this awesome journey. Without further ado, let’s jump right in.
What’s the problem? Imagine that you have some logs that are published continuously to a RabbitMQ queue (hold on, we’ll go through what RabbitMQ is very soon). You want to process those logs and seed an Elasticsearch index with them so you can analyze them with Kibana or any other BI tool. How can Logstash help you set up that pipeline? That’s the whole purpose of this article.
To build that architecture, we’re going to set up 4 components in our system. Each one of them has its own role. Here they are:
- A Logs Publisher
- A RabbitMQ Server With A Queue To Publish Data To And Receive Data From.
- A Logstash Pipeline To Process Data From The RabbitMQ Queue.
- An Elasticsearch Index To Store The Processed Logs.
1- Logs Publisher
Before digging into the logs publisher we’re going to build, let’s clarify that logs can come from any software: a web server (Apache, Nginx), a monitoring system, an operating system, a web or mobile application, and so on. Logs record the working history of that software.
So our logs publisher will simulate an application that publishes its logs to a queue so that other applications or connectors can consume them for analytics purposes.
Our logs publisher is merely a Node.js application that sends static Apache access logs to a RabbitMQ queue so that subscribers can consume and process those data. Ready to dive into it? Let’s go!
- Install Node.js on your machine.
- Set up the publisher on your machine.
GitHub - Akintola/rabbit-mq-simple-publisher-consumer
Clone this repo and run this command from the root of the folder:
- Give the send.js file the necessary permissions so it is executable. For Ubuntu users, execute:
sudo chmod +x send.js
You may ask me how to run the logs publisher so it can publish its first log. Well, there is one piece we still need to make our publisher work. Guess what? The queue to publish to. Exactly.
2- RabbitMQ Server With A Queue
As said from the beginning, our logs publisher will be publishing the logs to a RabbitMQ queue. But hang on a second. What the hell is this RabbitMQ?
RabbitMQ is the most widely deployed open source message broker. — https://www.rabbitmq.com/
Again, geeks suffer too much from over-complicated concepts (forgive me, my mind is speaking out loud 😀). Simply said, RabbitMQ is a tool that makes data published by one piece of software available to many others. An application S1 sends data to the RabbitMQ server, and RabbitMQ lets a bunch of other applications consume those data.
Instead of going through a very long RabbitMQ installation, we’re going to use a RabbitMQ Docker instance to keep things simple. So follow along, dude.
- First, install Docker
- Set up a RabbitMQ server Docker container
Are you willing to see the magic happen? Just run:
docker run -p 5672:5672 rabbitmq
And we’re all set. Our RabbitMQ is up and running on port 5672 on your machine.
We’re about to start the most exciting part of our journey. Is your belt tight?
3- Logstash With A Log Processing Queue
Here is one of the core parts of our system. I hope you’re used to working with Logstash. If not, I wrote a pretty nice article about it: Logstash: Software Engineer, Extracting data Has Never Been So Easy.
If you haven’t got Logstash set up already, again, don’t worry. Have a look at this article and you’re done: Logstash: The Easiest Way To Install It On Ubuntu 20.04.
Once Logstash is installed on your machine, let’s create the pipeline to process the data.
Yes, I know. I can see some of my friends yelling at me: Akintola, you need to break those lines of config down asap. I won’t delay this any longer, chief.
There are three main parts in a Logstash pipeline: the input, which describes where the data are coming from; the filters, which transform, format, add, or delete incoming data; and the output, which indicates where the data will be pushed to.
- id: it’s the unique identifier of this RabbitMQ input, this is particularly useful when you have two or more plugins of the same type, for example, if you have 2 RabbitMQ inputs;
- port: the port on which RabbitMQ is running, 5672 by default;
- vhost: Elastic recommends leaving the default “/”;
- queue: obviously, this is the queue to which data are published;
- ack: it enables message acknowledgments so that messages fetched by Logstash but not yet sent into the Logstash pipeline will be requeued by the server if Logstash shuts down;
- grok: grok is a Logstash filter plugin used for parsing unstructured log data into something structured and queryable;
- date: it’s also a Logstash filter plugin, it is used for parsing dates from fields, and then using that date or timestamp as the Logstash timestamp for the event;
- elasticsearch: here, it refers to the Logstash output plugin for Elasticsearch;
- stdout: stdout is also a Logstash output plugin. It prints to your terminal the data Logstash is processing.
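The configuration file these parameters belong to is not reproduced in this text, so here is a minimal sketch of what such a pipeline could look like. It is not necessarily the exact file used in this tutorial. The index name matches the search query used later in this article; the queue name `hello`, the `id` value, the `plain` codec, the `COMBINEDAPACHELOG` pattern, and the date format are assumptions you should adapt to your own publisher. The snippet writes the file into the current directory:

```shell
# Write a sketch of the pipeline to ./logstash-rabbitmq.conf.
# Values marked "assumed" are illustrative, not taken from the publisher repo.
cat > logstash-rabbitmq.conf <<'EOF'
input {
  rabbitmq {
    id    => "rabbitmq_logs_input"   # assumed identifier
    host  => "localhost"             # assumed: RabbitMQ container on this machine
    port  => 5672
    vhost => "/"
    queue => "hello"                 # assumed queue name
    ack   => true
    codec => "plain"                 # assumed: messages are raw text log lines
  }
}

filter {
  grok {
    # Parse an Apache combined access log line into named fields.
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # Use the timestamp found in the log line as the event timestamp.
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "logstash_rabbit_mq_hello"
  }
  stdout { codec => rubydebug }
}
EOF
```

Before running it for real, Logstash can check the syntax for you with `bin/logstash -f logstash-rabbitmq.conf --config.test_and_exit`.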
Now that you’ve got a handle on those configuration parameters, we’ll learn how to run that pipeline. On Ubuntu, execute:
sudo nano logstash-rabbitmq.conf
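Then copy and paste the content of the above logstash-rabbitmq.conf file. Hit Ctrl+X, then Y and Enter, to save and close the editor. The run command further down expects this file at /etc/logstash/conf.d/logstash-rabbitmq.conf, so create it there.
Now that we’ve set up our Logstash pipeline, we’re ready to run it. To do so, let’s move to the Logstash installation root directory. On Ubuntu: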
Let’s run the pipeline:
sudo bin/logstash -f /etc/logstash/conf.d/logstash-rabbitmq.conf
Here is a screenshot of what you should get if your RabbitMQ Docker Instance is running well and everything works pretty well on your Logstash pipeline side:
4- Elasticsearch Index To Store The Processed Logs
This is the last — but not the least — part of our wonderful run. So what do we have to do over here? Just launch an Elasticsearch instance and make it reachable by the other components of our architecture.
Setting up a brand new Elasticsearch instance is very straightforward... if you agree to follow me 😄: Installing Elasticsearch on Ubuntu, Mac, or Windows.
5- Now the whole magic can happen
I can see people eagerly waiting for this stage to see how it’ll work. Let’s check everything one more time. You must have:
- a RabbitMQ Docker instance running;
- a Logstash pipeline running;
- and an Elasticsearch instance running.
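The checklist above can be turned into a quick pre-flight check. This is only a sketch: it tests whether something accepts TCP connections on the expected ports, not whether the services are healthy. Port 5672 comes from the Docker command earlier, 9200 is the Elasticsearch port used in this article, and 9600 is Logstash’s default monitoring API port; adjust them if you changed the defaults. The helper name `check_port` is made up for illustration, and it relies on bash’s `/dev/tcp` feature.

```shell
#!/usr/bin/env bash
# check_port HOST PORT LABEL: report whether a TCP connection can be opened.
# Hypothetical helper; relies on bash's /dev/tcp pseudo-device.
check_port() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "$3: reachable on $1:$2"
  else
    echo "$3: NOT reachable on $1:$2"
  fi
}

check_port 127.0.0.1 5672 "RabbitMQ"
check_port 127.0.0.1 9600 "Logstash API"   # assumed default API port
check_port 127.0.0.1 9200 "Elasticsearch"
```

If any line says NOT reachable, go back to the matching step before running the publisher.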
Now go to the logs publisher root folder and run the send.js script.
and hit Enter.
From your logs publisher terminal, here’s what you should get:
From your running Logstash pipeline:
So far so good. But remember, our goal is to get the data pushed into an Elasticsearch index. Let’s check if data are present over there.
curl -XGET "127.0.0.1:9200/logstash_rabbit_mq_hello/_search?pretty"
And tada !!!
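If you only want to know how many log lines made it into the index, a sketch along these lines could help. It assumes Elasticsearch listens on 127.0.0.1:9200 without authentication, as in the query above; recent Elasticsearch versions enable security by default, in which case you would need HTTPS and credentials. `_count` and `_search` with `size=1` are standard Elasticsearch endpoints, while the variable names are just for illustration.

```shell
ES_URL="127.0.0.1:9200"              # assumed: local, unsecured Elasticsearch
INDEX="logstash_rabbit_mq_hello"     # index name used in this article

# Number of documents indexed so far.
curl -s --max-time 5 "$ES_URL/$INDEX/_count?pretty" \
  || echo "Elasticsearch is not reachable at $ES_URL"

# Peek at a single indexed document.
curl -s --max-time 5 "$ES_URL/$INDEX/_search?pretty&size=1" \
  || echo "Elasticsearch is not reachable at $ES_URL"
```

Run the publisher a few more times and the count should grow accordingly.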
Congratulations to y’all, folks, for following along.
In this tutorial, we set up an entire architecture to consume Apache access logs from a RabbitMQ queue and push the processed result to an Elasticsearch index for analytics purposes.
You should now be more familiar with how Logstash consumes data from a RabbitMQ Queue and how it works with Elasticsearch.
If you found it interesting, please share and follow me so you’ll get notified of every new article I publish. Keep practicing with Logstash and you will be able to automate all the boring log processing you now have to perform manually from a queue!
Till next time, take care! 👏