SQL on Hadoop – A Common Tool Comparison

There are many methods and tools for interacting with and querying data in Hadoop. The most widely used tools support SQL-based querying of the data. This article summarises a great comparison by MapR of the most common SQL-on-Hadoop technologies available today.

SQL Mode Hive Drill Impala Presto Spark/Shark
Batch Interactive Interactive Interactive In-memory/streaming

SQL ANSI Completeness Hive Drill Impala Presto Spark/Shark
SELECT query Medium Medium Medium Medium Medium
DDL/DML Medium Low Low Medium
Packaged Analytic functions Low Low
UDFs/Custom functions High Low Low High

Client Access Hive Drill Impala Presto Spark/Shark
Shell Yes Yes Yes Yes Yes
JDBC Yes Yes Yes Yes Yes
ODBC Yes Yes Yes Yes

Common File Format Support Hive Drill Impala Presto Spark/Shark
Text Yes Yes Yes Yes
CSV Yes Yes Yes
Sequence Yes Yes Yes Yes
RC Yes Yes Yes Yes
ORC Yes
Parquet Yes Yes Yes
Avro Yes Yes Yes
JSON Yes Yes Yes
Compression Yes Yes Yes
Hive SerDe Yes Yes Yes

Data Sources Hive Drill Impala Presto Spark/Shark
Files Yes Yes Yes Yes Yes
HBase Yes Yes Yes Yes
Query non-Hadoop sources? Yes Yes

Data Types Hive Drill Impala Presto Spark/Shark
Relational Yes Yes Yes Yes Yes
Complex Yes Yes Yes Yes

Metadata Hive Drill Impala Presto Spark/Shark
Hive Metadata Store Yes Yes Yes Yes Yes

 

The information in the table above was summarised from MapR’s website.