A comparison of Storm with Spark Streaming

Comparison


| Feature | Spark Streaming | Storm |
|---|---|---|
| Latency | Few seconds | Sub-second |
| Data guarantee | Exactly once | At least once [1] |
| Processing model | Micro-batch | One record at a time |
| Failure recovery cost | Low [2] | High [3] |
| Resource manager integration | YARN, Mesos | YARN, Mesos |
| Consistency break condition | Output operation failure | Replay of an event |
| Popularity [4] | More [5] | Less |
| Development cost | Less [6] | More |
| Batch framework integration | Spark | N/A |
| Message passing layer | Netty, Akka | Netty or ZeroMQ |
| Implementation language | Scala | Clojure |
| Hadoop distribution support | Hortonworks, Cloudera, MapR | Hortonworks, MapR |
| Company support | Databricks | N/A |
| Origin | UC Berkeley | BackType, Twitter |
| Production use | 2013 | 2011 |


  1. Storm’s Trident library also provides exactly-once processing. However, it relies on transactions to update state, which is slower and often has to be implemented by the user.
  2. Because each Spark RDD records its dependency chain, a lost result can be recovered by replaying the data from the source; there is no need to track every intermediate state.
  3. Each individual record has to be tracked as it moves through the system.
  4. Judged by code commit velocity and issue velocity.
  5. Spark also has a better ecosystem.

    [Figure: Spark ecosystem stack]

  6. With Spark, the same code base can be used for batch processing and stream processing.
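
The difference between the two data guarantees in the table, and the transactional approach mentioned in note 1, can be sketched roughly as follows. This is a toy illustration in plain Python, not Storm, Trident, or Spark code: the function names, the `(txid, words)` batch shape, and the sample data are all assumptions made for the example. It assumes batches arrive in transaction-id order and that a replayed batch carries the same id and contents as the original.

```python
from collections import Counter

def at_least_once(batches, state):
    """Apply every delivered batch, replays included, so a replayed
    batch is counted again (hypothetical helper, not a Storm API)."""
    for _txid, words in batches:
        state.update(words)
    return state

def exactly_once(batches, state, last_txid=None):
    """Keep the id of the last applied batch next to the state and skip
    a batch whose id matches it. Assumes batches arrive in txid order
    and that a replay has the same id and contents as the original."""
    for txid, words in batches:
        if txid == last_txid:
            continue  # already applied: this delivery is a replay
        state.update(words)
        last_txid = txid
    return state, last_txid

# Made-up delivery log: batch 2 is delivered twice, as if its
# acknowledgement had been lost and the source replayed it.
delivered = [(1, ["a", "b"]), (2, ["a"]), (2, ["a"]), (3, ["b"])]

print(at_least_once(delivered, Counter()))
print(exactly_once(delivered, Counter())[0])
```

With this input, the first function counts "a" three times because the replay is applied again, while the second counts it twice. The price is the extra bookkeeping (storing and checking the transaction id on every state update), which is the kind of overhead note 1 refers to.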

References


  1. Apache storm vs. Spark Streaming
  2. Storm vs. Spark Streaming: Side-by-side comparison
  3. Apache Storm vs Spark Streaming
  4. Apache Storm vs. Apache Spark