Kafka Elasticsearch Sink Connector and the Power of Single Message Transformations


I’ve been using Kafka Connect for a few years now, but until recently I had never paid much attention to Single Message Transformations (SMTs). SMTs are simple transformations applied to individual messages before they’re delivered to a sink connector. They can drop a field, rename a field, add a timestamp, etc.

I always thought that any kind of transformation should be done in a processing layer (for example, Kafka Streams) before hitting the integration layer (Kafka Connect). However, my recent experience with configuring an Elasticsearch Sink connector proved me wrong! Complex transformations should definitely be handled outside of Connect, but SMTs can be quite handy for simple enrichment and routing!
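For example, here is a sketch of a connector configuration along those lines (the topic, index, and field names are made up, and the exact SMT property names can vary with your Kafka version):

```json
{
  "name": "orders-elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "orders",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc",
    "key.ignore": "true",
    "schema.ignore": "true",

    "transforms": "dropInternal,renameFields,addTimestamp",
    "transforms.dropInternal.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.dropInternal.blacklist": "internal_debug_info",
    "transforms.renameFields.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.renameFields.renames": "ts:created_at",
    "transforms.addTimestamp.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addTimestamp.timestamp.field": "ingested_at"
  }
}
```

Here ReplaceField drops and renames fields and InsertField stamps each document with the record timestamp, all without touching the producing application.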

Kafka Streams Application Patterns


Kafka Streams is an advanced stream-processing library with a high-level, intuitive DSL and a great set of features, including exactly-once processing, reliable stateful event-time processing, and more.

Naturally, after completing a few basic tutorials and examples, a question arises: how should I structure an application for a real, production use-case? The answer can vary a lot depending on your problem; however, I feel there are a few very useful patterns that apply to pretty much any application.
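As a quick reminder of what the DSL looks like, here is a minimal topology sketch (the topic names, application id, and bootstrap address are made up for illustration):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class WordLengthApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-length-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from an input topic, transform each value, write to an output topic.
        KStream<String, String> input = builder.stream("sentences");
        input.mapValues(value -> String.valueOf(value.length()))
             .to("sentence-lengths");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

A real application is rarely this small, which is exactly where the question of structure and patterns comes in.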

Deploying Kafka Connect Connectors


Kafka Connect is a modern open-source Enterprise Integration Framework that leverages the Apache Kafka ecosystem. With Connect you get access to dozens of connectors that can move data between Kafka and various data stores (like S3, JDBC databases, Elasticsearch, etc.).

Kafka Connect provides a REST API for managing connectors. The REST API supports various operations like describing, adding, modifying, pausing, resuming, and deleting connectors.
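For reference, those operations map to REST endpoints roughly like this (the host, port, connector name, and config file below are placeholders):

```bash
# List all connectors running in the cluster
curl http://connect-host:8083/connectors

# Create or update a connector (the JSON file contains just the config map)
curl -X PUT -H "Content-Type: application/json" \
     --data @my-connector.json \
     http://connect-host:8083/connectors/my-connector/config

# Check the status of a connector and its tasks
curl http://connect-host:8083/connectors/my-connector/status

# Pause, resume, and delete a connector
curl -X PUT    http://connect-host:8083/connectors/my-connector/pause
curl -X PUT    http://connect-host:8083/connectors/my-connector/resume
curl -X DELETE http://connect-host:8083/connectors/my-connector
```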

Using the REST API to manage connectors can become a tedious task, especially when you have to deal with dozens of different connectors. Although it’s possible to use web UI tools like lensesio/kafka-connect-ui, it makes sense to follow basic deployment principles: config management, version control, CI/CD, etc. In other words, it’s perfectly fine to start with manual, ad-hoc REST API calls, but ultimately any large Kafka Connect cluster needs some kind of automation for deploying connectors.

I want to describe the approach that my team uses to make Connect management simple and reliable.

“Kafka: The Definitive Guide”: Notes


For the last two years I’ve been working with Apache Kafka a lot: building infrastructure (and running clusters on bare metal, in VMs, and in containers), improving monitoring and alerting, developing consumers, producers, and stream processors, tuning, maintenance, etc. So I consider myself a very proficient user.

Still, in all these years I never had a chance to read the ultimate “Kafka: The Definitive Guide” book. I finally got a copy at Strata NYC earlier this year and finished it about a month ago. Surprisingly, while reading it I left a lot of bookmarks and notes for myself that might be useful for beginners as well as experienced users. Obviously, they’re very subjective and specific.

Next-generation Data Pipelines


It’s Q4 of 2018 and it’s really interesting to observe the change in the Big Data landscape, especially around open-source frameworks and tools. Yes, it’s still very fragmented, but the actual solutions and architectures are slowly starting to converge.

Right now I’m at the beginning of a huge platform redesign at work. We always talk about various frameworks and libraries (which are really just implementation details), but I started to think: what qualities should modern data pipelines have going forward? The list I came up with is below.

API Flexibility ∝ 1 / Data Definition Strictness


An interesting observation based on recent conversations at work: the stricter the data format used in the API definition, the harder it is to change the API’s behaviour later. And vice versa: it’s easier to change APIs that use flexible data formats.
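A toy illustration of the trade-off (the payload and field names here are made up): a strictly defined payload is easy to validate and reason about, but every change to it has to be coordinated with all consumers, while a loosely defined payload can evolve freely at the cost of weaker guarantees.

```java
import java.util.Map;

// Strict definition: consumers know exactly what to expect,
// but adding, renaming, or removing a field is a coordinated, breaking change.
record CreateOrderRequest(String orderId, String customerId, long amountCents) {}

// Flexible definition: new keys can appear at any time without breaking callers,
// but every consumer has to defensively handle missing or unexpected attributes.
record CreateOrderRequestFlexible(Map<String, Object> attributes) {}
```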

The Level Below


Over the years I’ve realized a very simple but fundamental principle of being a better software engineer: understanding what’s happening one level below. By “level” I mean whatever level of abstraction you operate at, for example an HTTP API for a front-end engineer, or the JVM and its internals for an enterprise Java developer.

It sounds like an obvious suggestion, but it’s actually very useful to apply this principle to any new feature you’re going to work on.