Syslog-ng: Kafka source — In-depth analysis

The syslog-ng application can read messages from the sources. It processes them with filters, rewrite rules, parsers and finally sends messages to their destinations. The syslog-ng application already has a Kafka destination that is implemented in Java. The Kafka source will allow syslog-ng to read messages from Kafka. This can be used for example as a queue between several syslog-ng instances.

Syslog-ng : Kafka source

Kafka source was my Google summer of code project for the year 2016 [ 1] with the Syslog-ng organization. Mr.Viktor Juhász mentored me throughout the past five months to keep me on the right track. It would be nearly impossible without him to complete this project 🙂

Project GitHub link: https://github.com/VIthulan/kafka-source

Syslog-ng has Java destinations, where we can write our own destinations for the log messages, such as a file, elastic search, Kafka, etc. During the first few months of my GSoC period, I learned about Java destinations with Syslog-ng. Before you dig deep into Java source, it would be better if you read my post on syslog-ng Java destinations [2]. It will help you to understand the functionality of Java source. An example configuration file and a worked example are also available here [ 3]. Same as that, Java sources can read messages from log sources such as a file, Kafka and etc. I’ve implemented a Java Kafka source for Syslog-ng in this project [ 4]. Kafka source can be used for queue Syslog-ng instances with Kafka. Kafka source feature will be released in the Syslog-ng GSoC version.

The syslog-ng project was started in 1998 and ported existing nsyslogd code to Linux. It is a log daemon that supports a wide range of inputs, such as Syslog, unstructured text, message queues, databases (SQL and NoSQL alike) and more. Before you start work into Kafka source you need to install syslog-ng in your Linux.

You can install syslog-ng by following their gitbook [ 5]. They have clearly explained all the dependencies and installation methods and FAQs that could occur. Otherwise, you can even install syslog-ng in a docker. You can follow my blog post for that [ 6].

NOTE: You need to install syslog-ng version that supports Java source. Since currently it is not included in Syslog-ng master, clone it from here [ 7]. Once you’re completed with the installation of the Syslog-ng now we can explore more into Kafka source 🙂

Functional features

  • Read messages from Kafka
  • Send those read messages to syslog-ng
  • Single topic consumption
  • Can consume multiple topics with multiple syslog-ng sources
  • Can consume messages from last read offset.

UML Class diagram

The following class diagram describes the structure of Kafka source’s classes, attributes, operations and the relationships among its objects.

  • KafkaSourceHandler is the syslog-ng connector that connects Kafka consumer and Syslog-ng.
  • KafkaConsumer gives access to KafkaMessageListner for KafkaSourceHandler. It works as a mediator.
  • KafkaMessageListner gives access to Kafka. It connects with Kafka and reads messages from the Kafka cloud.
  • KafkaProperties stores the configurations of Kafka consumer property
  • LogSource is an abstract class that unlocks the feature Java source.

Sequence diagram

The following sequence diagram shows how objects operate with one another and in what order.

After successful initialization and connection open with Kafka source, Syslog-ng checks whether is there any messages to consume (isReadable). If there are any, it consumes messages until there are no messages left (ReadMessage). If there are no messages to consume, it will again check until isReadable method to be true.

This project is completely coded in Java and I have used Gradle for automated building. I also have used High-level Kafka consumers to consume messages.

Dependencies

Kafka source has three dependencies that needed to be installed or included in your build.gradle

First and Second dependencies needed to be installed from your local directory. Those Jars are available in $syslog-ng-home/install/lib/syslog-ng/java-modules.

You can add them as following in your build.gradle file.

repositories { flatDir { dirs '/home/vithulan/gsoc/syslog-2/syslog-ng/install/lib/syslog-ng/java-modules' } }dependencies { compile name: 'syslog-ng-core' compile name: 'syslog-ng-common' }

This will compile syslog-ng-core and syslog-ng-common jars during the build time.

Third dependency Kafka can be installed from Mavencentral as following. You require an Internet connection to download this dependency.

repositories { mavenCentral() }dependencies{ compile ( [group : 'org.apache.kafka' ,name :'kafka_2.10' ,version : '0.9.0.1'] ) }

Algorithms

KafkaSourceHandler methods are extended from Logsource whose functions are similar to StructuredLogDestination. You can read my blog post [2], where I have explained the methods and their functionalities.

getStatInstance() method should return a unique name for the source. So we’ve implemented it as follows,

This will return a unique name by adding group id and topic to the name.

readMessage() will return 0 : LogSource.SUCCESS if it consumes message. If it is not connected it will return 2 : LogSource.NOT_CONNECTED. If there are no messages to consume it will return 1 : LogSource.NOTHING_TO_READ. While its returning 0, syslog-ng will keep consuming messages without checking isReadable method. If readMessage() returns 1 or 2 , syslog-ng will check isOpened method after a time period and then it will check isReadable isReadable().

Kafka consumer iterator is thread blocking. It will busy-wait until there are any new message comes to consume. Therefore syslog-ng can’t shut down without exiting from that blocking thread. It is a bug in Kafka [ 8]. The only way that we could resolve this bug in syslog-ng is to set a consumer time out where it will throw an exception while there are no messages to consume for a given consumer time out time period. This exception will cause an exit from Kafka consumer and then we can shutdown syslog-ng.

This is a method from KafkaMessageListner, which handles ConsumerTimeout Exception as an exit way from thread stall.

All error handling and debugging and logging is handled using syslog-ng’s InternalMessageSender class.

Options

Kafka source requires 7 options to be set in the configuration file.All these options are validated as follows,Time outs and intervals should be integer values. So we’re checking Integer type in the init method and also assigning default values as well.

Update

Syslog-ng Kafka source will restart for every 10 seconds (Consumer timeout default value) after waiting for messages. Then it will wait 60 seconds to reopen connection, during the time you can shutdown syslog-ng. The waiting time can be set from time-reopen() in global options.

We made this change to overcome the bug in Kafka [8].

You can install Kafka source by following my blog post about how to install and run Kafka source [9]

That post [9] explains about writing syslog-ng configuration file, building and getting jar from kafka source and also about running zookeeper host, kafka and kafka producer. In the above blogpost I’ve set the configuration file destination as a file, so all Kafka messages that are consumed will be saved in a text file.

Sample text output from Kafka source

Kafka source is capable of consuming records from Kafka and its key to make a queue between two syslog-ng instances with Kafka. I have completed almost all the functional features that I proposed in the proposal. After completing reviews from syslog-ng community this will be released in syslog-ng gsoc release.

I’d like to appreciate and thank Mr.Viktor Juhász for all the mentoring and guidance I have received, and I know I’m not done learning yet. The list will continue to grow through the years. He responds to my messages immediately and helped me to clarify all my doubts I got in the midst of day and even in the midnight. I remember I spoiled your sleep many times 🙂 Thank you Viktor for believing in me and guiding me all the way from the beginning 🙂 I also like to thank Google and Summer of code team for giving us a platform to work in an open-source environment. This has increased my interest in Open source products and my self-confidence as well. And finally, I’d like to thank The Syslog-ng community of course :). Thanks for selecting me as an intern for this summer and all the experiences I received in the past 6 months 🙂 The value of experiences I gained from this year's Google Summer of Code is immeasurable. 🙂

Cheers 😀
Vithulan MV.

Happy Coding! 🙂

[1] https://summerofcode.withgoogle.com/projects/#5449529503514624
[2] https://vithulanmv.wordpress.com/2016/06/07/the-syslog-ng-java-destinations/
[3] https://github.com/VIthulan/syslog-ng-java-sample
[4] https://github.com/VIthulan/kafka-source
[5] https://syslog-ng.gitbooks.io/getting-started/content/
[6] https://vithulanmv.wordpress.com/2016/05/10/installing-syslog-ng-from-source-using-docker-image/
[7] https://github.com/juhaszviktor/syslog-ng/tree/f/JavaSource
[8] https://issues.apache.org/jira/browse/KAFKA-1343
[9] https://vithulanmv.wordpress.com/2016/08/14/the-syslog-ng-kafka-source-in-java-an-introduction/

Originally published at http://vithulanmv.wordpress.com on August 21, 2016.

Computer Science Engineer

Computer Science Engineer