FAQs
Qubiqe (a.k.a. the Qubical query engine) is a columnar execution engine: it executes queries using columnar data structures and processing approaches.
While the Qubiqe core could in principle be plugged into any big-data technology, Apache Spark is the only technology it is fully integrated with.
Parquet is a popular data storage format because of its small storage size and efficient data loading. A columnar approach to query processing has similar advantages, such as a lower memory footprint, better handling of primitive types, and faster serialization. It also opens up new ways to process the data. Qubiqe technology takes full advantage of the columnar nature of the data to process it as efficiently as possible.
In a distributed TPC-DS run on 1 TB of Parquet data, using EMR with EBS storage, the median speedup is more than 2x and the maximum is around 10x. The speedup was found to grow with data size.
Since the technology is quite new, there is still a lot of room for improvement.
It is very easy. You can either build your cluster with the Qubical query engine or use it with a specific Spark job. The latter involves setting a few Spark parameters when you run the job so that the Qubiqe jar is enabled.
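As a rough sketch of the per-job route, the snippet below assembles a `spark-submit` command in Python. `--jars` and `--conf` are standard `spark-submit` options, but the jar path, the `spark.qubiqe.enabled` key, and the job script name are placeholders invented for illustration; the real parameter names are whatever the Qubiqe distribution documents.

```python
import shlex

# Hypothetical values: replace with the jar path and settings shipped with Qubiqe.
QUBIQE_JAR = "/opt/qubiqe/qubiqe.jar"            # assumed location
QUBIQE_CONF = {"spark.qubiqe.enabled": "true"}   # assumed key, not a documented setting


def build_submit_command(app, jar=QUBIQE_JAR, conf=QUBIQE_CONF):
    """Return a spark-submit argument list that adds the jar and extra settings."""
    cmd = ["spark-submit", "--jars", jar]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Print the command as a shell-quoted string for copy and paste.
    print(shlex.join(build_submit_command("my_job.py")))
```

Keeping the settings in a dictionary makes it easy to enable the engine for one job and leave every other job on the cluster untouched.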
The Qubical query engine supports Parquet natively but also works with other sources, including non-columnar data sources (CSV, JSON). That said, transforming non-columnar data into a columnar representation has a cost, so the performance gain on a specific query depends on whether the query processing is computationally heavier than the non-columnar to columnar transformation. Alternatively, support for non-columnar sources can be switched off, in which case either the Qubical query engine or the vanilla Apache Spark engine is picked automatically for each query depending on the data source.
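The trade-off above can be illustrated with a small Python sketch. It is a toy model, not Qubiqe's actual selection logic: the function names, the engine labels, and the timings in the usage note are all made up for illustration.

```python
# Formats treated as columnar in this sketch (the FAQ names Parquet).
COLUMNAR_FORMATS = {"parquet"}


def choose_engine(source_format, non_columnar_enabled=True):
    """Pick an engine label for a query based on its data source format."""
    fmt = source_format.lower()
    if fmt in COLUMNAR_FORMATS or non_columnar_enabled:
        return "qubiqe"
    # Non-columnar source with non-columnar support switched off.
    return "spark"


def net_speedup(baseline_s, columnar_s, conversion_s=0.0):
    """Speedup over the baseline after paying the row-to-columnar conversion cost."""
    return baseline_s / (columnar_s + conversion_s)
```

With these hypothetical timings, `net_speedup(100, 40, 10)` is 2.0, so the conversion is worth paying, while `net_speedup(100, 40, 70)` falls below 1, meaning the conversion cost outweighs the faster processing.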
The Qubical query engine passes all Apache Spark unit tests. It also completes all TPC-DS queries with results verified against Apache Spark's, so the functionality should be stable enough for production.
Currently, Qubiqe does not support a few things:
1. Window functions
2. Rollup
3. UDFs
However, when the query engine detects that it cannot support a query, it automatically falls back to the regular Spark query engine, so there is no loss of functionality.
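The fallback behavior can be pictured with the following Python sketch. It only models the decision described above; the feature labels and the `plan_engine` function are hypothetical and do not reflect how Qubiqe inspects a query plan internally.

```python
# Features this FAQ lists as unsupported; the labels are illustrative.
UNSUPPORTED = {"window", "rollup", "udf"}


def plan_engine(query_features):
    """Return (engine, blocking_features) for a query described by feature labels."""
    blocked = UNSUPPORTED & {feature.lower() for feature in query_features}
    if blocked:
        # Any unsupported feature sends the whole query to regular Spark.
        return "spark", sorted(blocked)
    return "qubiqe", []
```

For example, `plan_engine(["filter", "aggregate"])` selects `"qubiqe"`, while `plan_engine(["join", "window"])` selects `"spark"` and reports `["window"]` as the reason.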
In principle, yes, since Spark SQL and the DataFrame API both use the same mechanism for query planning. DataFrame support is currently turned off while we finish validating our tests against DataFrames, so support should come soon.
Qubiqe does not support RDDs or ML. The RDD API by nature works with arbitrary structures, so there is no general way to transform them into a columnar representation. We have not applied the approach to ML yet.
Assuming that ETL and queries are the slowest parts of a data architecture, applying the technology to queries would have the greater impact on the efficiency of the data infrastructure.
No, Qubiqe is not open source.
Please contact sales@qubical.io for our pricing.