Tuesday, 30 June 2015

Apache Phoenix, SQL is getting closer to Big Data

Here is a post about another project in the Big Data world, like Apache Hive from my previous post, enables you to do SQL on Big Data. It is called Apache Phoenix.

Phoenix is a bit different, a bit closer to my heart too, as I read the documentation on Apache Phoenix, the word 'algebra' and 'relational algebra' came across few times, and that mean only one thing, SQL! The use of the word algebra in the docs did give me a lot of confidence. SQL has closure, is based on a database systems model which has it's roots in logic and maths and especially a subset of algebra, The Set Theory.

Apache Phoenix is developed in Salesforce and is now one of the popular projects in Apache. Apache Phoenix is a SQL skin on top of HBase, the columnar (NoSQL) database of the Hadoop ecosystem, capable of storing very large tables and data and query them via 'scans'. HBase is part of the Hadoop ecosystem and the file system it uses is usually HDFS. Apache Phoenix is using JDBC on the client as a driver.

In the race to bring the easiest to use tools for Big Data, I think Apache Phoenix is very close. It is the SQL we know used since the 1970s. The Apache Phoenix team seems to be committed and willing to introduce all of the missing parts of SQL, including transaction processing with different isolation levels.  Making Phoenix a fully operational Relational Database layer on HBase. Have a look in their roadmap. The amount of current and suggested future SQL compatibility is remarkable, and this makes me take them really seriously.
  • Transactions
  • Cost-based Query Optimization! (Wow)
  • Joins
  • OLAP
  • Subqueries
  • Striving for full SQL-92 compliance
In addition to all this, it is also possible to turn an existing HBase table to an Apache Phoenix table using CREATE TABLE or even CREATE VIEW, the DDL statements that we know. How handy is that? Suddenly you can SQL enable your existing HBase database!

How to install and use Phoenix

The SQL skin can be installed to an existing Hadoop HBase installation very quickly. All you need to do is to download and extract the tarball. You can setup a standalone Hadoop environment, look at my previous blog post for that, and then install HBase and install Apache Phoenix

Once the Apache  Phoenix software is installed, then you can start it and query it with SQL like this.

From within the bin/ directory of Phoenix install directory run

$ ./sqlline.py  localhost

That will bring you to the phoenix prompt

0: jdbc:phoenix:localhost> select * from mytable;

No comments: