What I want is simply an RDBMS that scales, or a Big Data database that supports SQL operations, so my comparison is based on that view.
I have experience with Phoenix, but not with Impala, so my comparison of these two components may be wrong in places. If you find any mistakes, please let me know.
- Abstraction
Phoenix is a layer over HBase, and using it is just like using an RDBMS. You do not even need to know about HDFS, Hive, or HBase.
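To make the "just like an RDBMS" point a bit more concrete, here is a minimal sketch of what a client deals with: a JDBC URL pointing at the ZooKeeper quorum, plus ordinary SQL. The host names, the `users` table, and its columns are hypothetical, and `phoenix_jdbc_url` is only an illustrative helper, not part of Phoenix.

```python
# Hypothetical names throughout: hosts, table, and columns are illustrative only.

def phoenix_jdbc_url(zk_hosts, port=2181, znode="/hbase"):
    """Build a Phoenix JDBC URL of the form jdbc:phoenix:<quorum>:<port>:<znode>."""
    quorum = ",".join(zk_hosts)
    return f"jdbc:phoenix:{quorum}:{port}:{znode}"

# Plain SQL DDL; Phoenix creates and manages the underlying HBase table itself.
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS users ("
    "id BIGINT NOT NULL PRIMARY KEY, "
    "name VARCHAR, "
    "age INTEGER)"
)

# Phoenix uses UPSERT for both inserts and updates.
UPSERT_ROW = "UPSERT INTO users (id, name, age) VALUES (?, ?, ?)"

# Queries look like queries against any relational database.
SELECT_ADULTS = "SELECT id, name FROM users WHERE age >= ? ORDER BY id"

if __name__ == "__main__":
    print(phoenix_jdbc_url(["zk1.example.com", "zk2.example.com"]))
    for statement in (CREATE_TABLE, UPSERT_ROW, SELECT_ADULTS):
        print(statement)
```

Nothing here mentions column families, HDFS paths, or Hive metadata; the one visible difference from a typical RDBMS is `UPSERT` in place of `INSERT`.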
Impala over HBase is a combination of Hive, HBase, and Impala. Before you start, you need some understanding of all three.
They both support JDBC and fast read/write.
- Ease of use
As described above, when you use Impala over HBase, you have to combine it with Hive and HBase. You must create the tables in Hive and HBase separately, and then map their columns to each other. You also need to be careful about some details when you define the table schema. In effect, it relies on the HBase-via-Hive approach.
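As a rough sketch of that mapping step, the helper below generates the Hive DDL for an external table backed by an existing HBase table. The table names, the `info` column family, and the columns are hypothetical, and `hive_hbase_ddl` is only an illustration, not an API of Hive or Impala. The storage handler class and the `hbase.columns.mapping` / `hbase.table.name` properties are the ones used by Hive's HBase integration.

```python
# Hypothetical names throughout: tables, column family, and columns are illustrative.

# Step 1 (HBase shell): the HBase table and its column family must already exist.
HBASE_SHELL_CREATE = "create 'users', 'info'"

def hive_hbase_ddl(hive_table, hbase_table, key_column, columns):
    """Return Hive DDL for an external table mapped onto an existing HBase table.

    key_column: (hive_name, hive_type) for the HBase row key.
    columns:    list of (hive_name, hive_type, "family:qualifier") tuples.
    """
    col_defs = [f"{key_column[0]} {key_column[1]}"]
    mapping = [":key"]  # ":key" maps the first Hive column to the HBase row key
    for name, hive_type, hbase_col in columns:
        col_defs.append(f"{name} {hive_type}")
        mapping.append(hbase_col)
    cols_str = ", ".join(col_defs)
    # The mapping is positional and written without spaces between entries.
    mapping_str = ",".join(mapping)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({cols_str})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping_str}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )

# Step 3 (impala-shell): make the table created through Hive visible to Impala.
IMPALA_REFRESH = "INVALIDATE METADATA users_hbase"

if __name__ == "__main__":
    # Step 2 (Hive): print the DDL that maps Hive columns onto HBase columns.
    print(hive_hbase_ddl(
        "users_hbase",
        "users",
        ("id", "STRING"),
        [("name", "STRING", "info:name"), ("age", "INT", "info:age")],
    ))
```

Because the mapping is positional, the order of entries in `hbase.columns.mapping` has to match the order of the Hive column definitions, which is one of the details that is easy to get wrong.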
- Deployment
Deploying Phoenix is very easy: you just copy some libraries to HBase’s region servers, add a few configuration items, and restart HBase.
If you want to deploy Impala, you must use the cdh distribution, not hdp or the original Apache release; otherwise the components will not be compatible. As far as I know, cdh is not as flexible as a DIY setup.
- Documentation
I think Impala’s documentation is not pleasant to read and not easy to understand. On this point, Phoenix is clearly better.
So, if I just want a simple Big Data RDBMS, I would choose Phoenix.
References: