Kafka Streams Application Patterns

Kafka Streams is an advanced stream-processing library with a high-level, intuitive DSL and a great set of features, including exactly-once delivery, reliable stateful event-time processing, and more.

Naturally, after completing a few basic tutorials and examples, a question arises: how should I structure an application for a real, production use-case? The answer depends heavily on your problem; however, I feel there are a few very useful patterns that can be applied to pretty much any application.
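For instance, the exactly-once guarantee mentioned above is a single configuration switch in Kafka Streams. A minimal sketch of the properties every such application starts from (the application name and broker address below are placeholders):

```java
import java.util.Properties;

public class StreamsConfigSketch {
    public static Properties baseConfig() {
        Properties props = new Properties();
        // application.id doubles as the consumer group id and the prefix
        // for internal topics; "my-streams-app" is a placeholder name
        props.put("application.id", "my-streams-app");
        props.put("bootstrap.servers", "localhost:9092");
        // opt into exactly-once processing; older Kafka versions use the
        // value "exactly_once" instead of "exactly_once_v2"
        props.put("processing.guarantee", "exactly_once_v2");
        return props;
    }
}
```

These `Properties` are what you'd pass to the `KafkaStreams` constructor together with a topology.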

Deploying Kafka Connect Connectors

Kafka Connect is a modern open-source Enterprise Integration Framework that leverages the Apache Kafka ecosystem. With Connect you get access to dozens of connectors that can move data between Kafka and various data stores (like S3, JDBC databases, Elasticsearch, etc.).

Kafka Connect provides a REST API for managing connectors. It supports operations like describing, adding, modifying, pausing, resuming, and deleting connectors.
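For illustration, the "add or modify" operation is a `PUT` to the `/connectors/<name>/config` endpoint. A minimal sketch of building that request with Java's built-in `HttpClient` (the host, connector name and config passed in are hypothetical; sending and error handling are omitted):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ConnectApiSketch {
    // Build a PUT request that creates or updates a connector's config.
    // PUT is idempotent here: the same call works for create and update.
    public static HttpRequest upsertConnector(String connectHost,
                                              String connectorName,
                                              String configJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(connectHost + "/connectors/" + connectorName + "/config"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(configJson))
                .build();
    }
}
```

The request would then be sent with `HttpClient.send`; the create-or-update semantics of this endpoint are exactly what makes it a good target for automation.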

Using the REST API to manage connectors can become a tedious task, especially when you have to deal with dozens of different connectors. Although it’s possible to use web UI tools like lensesio/kafka-connect-ui, it makes sense to follow basic deployment principles: config management, version control, CI/CD, etc. In other words, it’s perfectly fine to start with manual, ad-hoc REST API calls, but ultimately any large Kafka Connect cluster needs some kind of automation for deploying connectors.

I want to describe the approach that my team uses to make Connect management simple and reliable.

“Kafka: The Definitive Guide”: Notes

For the last two years I’ve been working with Apache Kafka a lot: building infrastructure (and running clusters on bare metal, in VMs and in containers), improving monitoring and alerting, developing consumers, producers and stream processors, tuning, maintenance, etc. So I consider myself a very proficient user.

Still, in all these years I didn’t have a chance to read the ultimate “Kafka: The Definitive Guide” book. I finally got a copy at Strata NYC earlier this year and finished it about a month ago. Surprisingly, while reading it, I left a lot of bookmarks and notes for myself that might be useful for beginners as well as experienced users. Obviously, they’re very subjective and specific.

Next-generation Data Pipelines

It’s Q4 of 2018 and it’s really interesting to observe the changes in the Big Data landscape, especially around open-source frameworks and tools. Yes, it’s still very fragmented, but the actual solutions and architectures are slowly starting to converge.

Right now I’m at the beginning of a huge platform redesign at work. We always talk about various frameworks and libraries (which are actually just implementation details), but I started to think: what qualities should modern data pipelines have going forward? The list I came up with is below.

API Flexibility ∝ 1 / Data Definition Strictness

An interesting observation based on recent conversations at work: the stricter the data format used in the API definition, the harder it is to change the API’s behaviour later. And vice versa: it’s easier to change APIs that use flexible data formats.
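A toy illustration of the trade-off (the formats and field names here are invented for the example): a reader of a strict positional format breaks as soon as the producer adds a field, while a flexible key-value reader simply ignores keys it doesn’t know.

```java
import java.util.HashMap;
import java.util.Map;

public class FormatsSketch {
    // Strict: a fixed positional record "id,amount" — a reader that
    // assumes exactly two fields rejects a newer three-field payload.
    public static int[] parseStrict(String line) {
        String[] parts = line.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected exactly 2 fields");
        }
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    // Flexible: key=value pairs — unknown keys are carried along but
    // ignored, so the producer can add fields without breaking readers.
    public static Map<String, String> parseFlexible(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : line.split(",")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0], kv[1]);
        }
        return fields;
    }
}
```

The flexibility is paid for elsewhere, of course: the flexible reader gets no compile-time guarantee that `amount` is present or numeric.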

The Level Below

Over the years I’ve realized a very simple but fundamental principle of being a better software engineer – understanding what’s happening one level below. By level I mean any level of abstraction you operate at, for example the HTTP API for a front-end engineer, or the JVM and its internals for an enterprise Java developer.

It sounds like an obvious suggestion, but it’s actually very useful to apply this principle to any new feature you’re going to work on.

QCon London 2018

Last week I had a chance to attend and speak at my first QCon conference: QCon London 2018. QCon has been an example of an extraordinary tech conference for me – great organization, an amazing lineup, and great tracks covering bleeding-edge tech as well as various best practices around distributed systems. Also, they managed to get decent vegetarian food right 😉

Keep reading if you’re curious about the best talks I attended and my speaking experience.

Message Enrichment With Kafka Streams

I’ve been working with Kafka Streams for a few months and I love it! Here’s a great intro if you’re not familiar with the framework. In the sections below I assume that you understand basic concepts like KStream, KTable, joins and windowing.

Message enrichment is a standard stream-processing task, and I want to show the different options Kafka Streams provides to implement it properly.
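As a preview of the idea (this is not the Kafka Streams API itself), a KStream–KTable enrichment join boils down to a per-event lookup against the latest state of a table. A plain-Java analogy, with an in-memory map standing in for the KTable and invented event/table shapes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class EnrichmentSketch {
    // For each event (a [key, payload] pair), look up the latest value
    // for that key in the "table" and merge it into the output record.
    public static List<String> enrich(List<String[]> events, Map<String, String> table) {
        List<String> enriched = new ArrayList<>();
        for (String[] event : events) {
            String key = event[0];
            String payload = event[1];
            // missing table entries get a default, like a left join would
            String extra = table.getOrDefault(key, "unknown");
            enriched.add(key + ":" + payload + ":" + extra);
        }
        return enriched;
    }
}
```

The real framework versions of this differ mainly in how the “table” is maintained (changelog topics, state stores) and in their time semantics, which is exactly what the options below explore.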