It seems there are a lot of Apache projects suited to the Big Data universe. There is Apache Kafka for messaging. Hadoop lets you run jobs on data stored on disk (HDFS). Spark lets you run jobs and keep the results in memory. There is also Druid, which I had not heard of before, for real-time analytics.
In addition to the article I read, there is a follow-up discussion on Hacker News. There, the cutting-edge crew says Hadoop and Spark are already legacy, and that Kafka, Redshift, and a host of other technologies are replacing them. Man, even in the database world, things move fast.