secfree - Maybe I should Give up Using Gobblin

Gobblin is a great tool for ETL. It is built on good abstractions, and it has helped me a lot over the past two years.

However, the following problems have made me want to give it up:

Exceptions that are difficult to fix:
  1. OutOfMemory errors
  2. Task processes that hang
It is difficult to run Gobblin in cluster mode:
  1. MapReduce mode is hard to use.
  2. YARN mode requires Apache Helix, which is not as widely deployed as HDFS and YARN.
The components are built on good abstractions, but it is not easy to modify some of the basic classes.
Many unresolved questions have accumulated.

These problems delayed my work several times. In those cases I turned to Spark, which solved the problems elegantly. Spark's operators are lower-level than Gobblin's components, but they are still high-level enough and flexible. Combined with a workflow scheduling tool, Spark makes it possible to schedule many applications across the cluster. Most importantly, Spark is robust, and there is little doubt that a job built on it will actually work.

Maybe I should Give up Using Gobblin by secfree was published on 2018-07-01