New World for the Data Engineer

I read an article on how to become a data engineer. It is interesting to see all the tools you would need to know. Obviously you need to know SQL, but you should also know Python. Since everything is in the cloud these days, you should know a vendor or two (AWS or Azure). Of course, there is the whole NoSQL movement, along with the obligatory Big Data.
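The SQL-plus-Python combination is easy to picture with a toy sketch. This is nothing from the article, just an illustration using Python's built-in sqlite3 module; the table and column names here are made up:

```python
import sqlite3

# A throwaway in-memory database -- just to show SQL driven from Python,
# the two skills the article leads with (schema and data are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("ana", 3), ("bo", 5), ("ana", 2)])

# The kind of aggregate query a data engineer writes all day.
rows = conn.execute(
    "SELECT user, SUM(clicks) FROM events GROUP BY user ORDER BY user"
).fetchall()
print(rows)  # → [('ana', 5), ('bo', 5)]
conn.close()
```

The same idea scales up: swap sqlite3 for a cloud warehouse driver and the SQL stays mostly the same.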

It seems there are a lot of Apache products ripe for the Big Data universe. There is Apache Kafka for messaging. Hadoop lets you run jobs over data stored on disk (HDFS). Spark lets you run the jobs and keep intermediate results in memory. There is also Apache Druid, which I had not heard of previously, for real-time analytics.
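The disk-versus-memory distinction is the heart of the Hadoop/Spark difference. Here is a toy word count, in plain Python rather than the real Hadoop or Spark APIs, contrasting a MapReduce-style pipeline that persists each stage to disk with a Spark-style one that keeps intermediate results in memory:

```python
import json
import os
import tempfile
from collections import Counter

lines = ["big data", "big spark", "spark streams data"]

def disk_word_count(lines):
    """MapReduce-style: the map stage writes its output to disk
    (standing in for HDFS), and the reduce stage reads it back."""
    with tempfile.NamedTemporaryFile("w", delete=False, suffix=".json") as f:
        # "map" stage: emit (word, 1) pairs and persist them
        json.dump([(w, 1) for line in lines for w in line.split()], f)
        path = f.name
    with open(path) as f:        # "reduce" stage reads the mapped output back
        pairs = json.load(f)
    os.unlink(path)
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

def memory_word_count(lines):
    """Spark-style: intermediate results stay in RAM between stages."""
    mapped = [(w, 1) for line in lines for w in line.split()]
    counts = Counter()
    for word, n in mapped:
        counts[word] += n
    return counts

# Same answer either way; the memory version just skips the disk round trip.
assert disk_word_count(lines) == memory_word_count(lines)
```

Skipping that disk round trip between stages is, roughly, why Spark tends to beat classic Hadoop MapReduce on iterative jobs.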

In addition to the article I read, there is a follow-up discussion on Hacker News. There the cutting-edge crew says Hadoop and Spark are already legacy, replaced by Kafka, Redshift, and a host of other technologies. Man, even in the database world, things move fast.