Quite unexpectedly, this morning I received an email from O’Reilly offering me a download of their latest free e-book about streaming data. This was a great opportunity to read it (fewer than 30 pages) and give you a short review of it.
First of all, here is the link where you can download the e-book.
This book is a high-level overview of what streaming is and of the challenges it raises for businesses and projects. It is therefore quite accessible and easy to read. The authors did not go into the technical details, which would have been hard to do in fewer than 30 pages…
Unfortunately, I have to say that I didn’t really like this book. This is my very personal opinion, and I still think it deserves to be read, but I expected more streaming content from this “streaming data” book.
Regarding the content of the book, there’s something I don’t get. If you are going to write a book about streaming data, why not focus its content on streaming data? After reading the book, I really felt that the authors were beating around the bush. Was there so little to write about data streaming and processing?
The authors have written a lot of chapters about what lies around the data streaming ecosystem, but they are very careful never to address the subject directly. I wasn’t expecting an explanation of which problems the MapReduce paradigm can solve, or of how containers and orchestration have changed the way applications are deployed. Nor was I expecting to be told the history of SQL and NoSQL and the emergence of JSON and XML documents.
I must be naïve: I was just expecting to be told about the paradigms of stream processing, and all the challenges they raise in companies that know nothing but batch processing.
As expected, we find a chapter dedicated to Apache Spark. Great! What I don’t understand, though, is why the many streaming frameworks referenced in the book are so clearly not treated equally. Whereas a whole chapter of the book is dedicated to Apache Spark, other frameworks such as Apache Flink and Apache Kafka only got a passing mention.
I think about all the tremendous work the Flink engineers have done, and how Flink can do so much more than Spark Streaming. I think about the beautiful streaming platform Confluent has built around Kafka, and how it is crudely compared with RabbitMQ… RabbitMQ!!!
I’ve read this book, and on second thought, I must say the authors were right: data streaming adoption IS a real challenge for companies. The paradigms it brings are new and definitely not trivial, the technology is evolving very rapidly, and we must do our best to educate people and make all of this more approachable.
I just didn’t realize how much there was still to do…