It’s Q4 of 2018, and it’s really interesting to observe how the Big Data landscape is changing, especially around open-source frameworks and tools. Yes, it’s still very fragmented, but the actual solutions and architectures are slowly starting to converge.
Right now I’m at the beginning of a huge platform redesign at work. We always talk about various frameworks and libraries (which are really just implementation details), but I started to think: what qualities should modern data pipelines have going forward? The list I came up with is below.