Big data specialist Actian has expanded its line-up with Actian Analytics Platform-Hadoop SQL Edition, a tool that enables organisations to use familiar SQL queries inside Hadoop to deliver high-performance analytics capabilities for business intelligence (BI) applications.
Available from 30 June, the new product is the first end-to-end analytics platform to run entirely natively in Hadoop, according to Actian. This pits the firm against Cloudera’s Impala, released last year, but Actian claims that its VectorWise database engine technology is better suited to the distributed processing model of Hadoop and can deliver results up to 30 times faster.
According to Actian, Hadoop has become a standard framework for the distributed processing of very large data sets, but it is still not easy to work with and is nowhere near as mature as conventional database tools.
“Hadoop enjoys the reputation of being a ‘data lake’ where you do pre-processing, but it’s not heretofore been a place where you can do world-class high functioning and extreme low latency SQL operations, which is a shame because there are millions of SQL-savvy business users out there who would love to bring their BI tools to bear on the problem,” Actian chief technology officer Mike Hoskins told V3.
Actian is enabling this by embedding its high-performance database engine, also referred to as X100 by the firm, in every node of a Hadoop deployment. It does this in such a way that companies can run standard SQL queries and analytics functions against large datasets stored in the Hadoop Distributed File System (HDFS).
According to Hoskins, the difficulty in making this work lay less in understanding how to distribute queries than in fitting the database engine to HDFS.
“HDFS is a challenge for databases because it is kind of a block store with a default block size of 64MB and is a little bit crude for low-latency database engines,” he said.
“A lot of the work ended up not being so much in distributed computation and optimisation and SQL planning, which we already know very well, it came in co-locating the X100 engine technology, so we had to work with Hadoop’s management framework a little bit, so we’re YARN-enabled in our product so we can issue the X100 engine out to the data on the nodes as efficiently as we can,” he added.
Another key part of Actian’s Analytics Platform is its parallel loading technology, which enables customers to load their data into the platform in the first place, according to Hoskins.
“Our rivals suffer from the pinhole problem, which is that they just assume that the data is born in their database, and it’s not. Having optimised high-speed parallel loaders already available in the product today really makes a difference,” he said.
Actian’s aim is that the Analytics Platform-Hadoop SQL Edition will open up Hadoop to the army of SQL professionals already employed in businesses worldwide.
“These people who have been shut out of the Hadoop ecosystem, wouldn’t it be nice if the millions of SQL users could get into the game? Whether it is a power user who wants to do incredibly complex SQL or whether it is just someone winging his Cognos and his Tableau and QlikView, these guys want to get in the game, because that’s where the data lives,” Hoskins said.
Actian Analytics Platform-Hadoop SQL Edition will be available first for on-premises deployments, but will be offered in the near future as a cloud-hosted service, Actian said.
3 June 2014 | 12:00 pm – Source: v3.co.uk