Amazon Web Services (AWS) announced PartiQL on Thursday. The new query language is said to be compatible with all data types, structures, and storage systems.
“Each type and flavor of data storage may be suitable for a specific use case, but each comes with its own query languages. The result is a tight coupling between query language and data format. … This is a major obstacle to the agility, flexibility, and effectiveness required to effectively use data lakes,” the company stated in explaining why it created PartiQL.
It continued, “As long as your query engine supports PartiQL, you can process structured data from relational databases (both transactional and analytical), semi-structured or nested data (such as an Amazon S3 data lake), and even schemaless data in NoSQL and document databases that allow different attributes for different rows.” The company added, “We are open sourcing PartiQL’s tutorial, specification, and a reference implementation under the Apache 2.0 license. This allows everyone to contribute and help drive widespread adoption of this unifying query language.”
AWS stated that PartiQL is already used internally by the company’s S3 Select, Glacier Select, and Redshift Spectrum products. According to the company, Couchbase has also signed up to support the query language.
Data lakes are large storage repositories that are often used by enterprises. They store data in its “raw” or “natural” format in a flat structure, in contrast to data warehouses, which are usually hierarchical and store data in files or folders. Each item in a data lake is tagged with a unique identifier and/or metadata. Data can then be pulled for a variety of purposes, including data mining, machine learning, and analytics.
Because so much data arrives from so many sources, enterprises are increasingly attracted to the centralized but loosely structured design of a data lake. Data lake projects can still fail, however, for many reasons.
AWS explained in its blog post announcing PartiQL that it created the language because it needed to query and transform many kinds of data, including tabular, semi-structured, and nested data, stored in a variety of formats and storage systems.
It stated that it had created a language that is strictly SQL compatible, handles nested and semi-structured data with minimal extensions, treats nested data as a first-class citizen, and makes schema optional. PartiQL…provides an easy and consistent way to query data across many formats and services. This allows you to freely move your data between data sources without changing your queries. It is backward compatible with SQL and adds extensions for multi-valued, nested, and schemaless data. These extensions integrate seamlessly with standard SQL join, filtering, and aggregation capabilities.
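To make the idea of nested data as a "first-class citizen" more concrete, here is a minimal Python sketch of what a query over nested rows has to do: pair each parent row with each element of its nested collection, then filter. The data set, field names, and the `security_projects` helper are hypothetical and are not taken from the AWS announcement. The PartiQL query in the comment is an illustrative guess at the style of the language, not a tested statement, and the Python code only emulates its effect rather than calling any PartiQL engine.

```python
# Illustrative PartiQL-style query (an assumption, not verified against an engine):
#
#   SELECT e.name AS employeeName, p.name AS projectName
#   FROM employees_nest AS e, e.projects AS p
#   WHERE p.name LIKE '%security%'
#
# The second FROM item ranges over the nested "projects" list of each employee.

# Hypothetical sample data: each employee row may carry a nested list of projects.
employees_nest = [
    {"id": 3, "name": "Bob Smith", "projects": [
        {"name": "AWS Redshift Spectrum querying"},
        {"name": "AWS Redshift security"},
        {"name": "AWS Aurora security"},
    ]},
    {"id": 4, "name": "Susan Smith"},  # no "projects" attribute at all (schemaless row)
    {"id": 6, "name": "Jane Smith", "projects": [
        {"name": "AWS Redshift security"},
    ]},
]


def security_projects(rows):
    """Pair each employee with each nested project whose name contains 'security'."""
    return [
        {"employeeName": e["name"], "projectName": p["name"]}
        for e in rows
        for p in e.get("projects", [])  # a missing attribute is treated as an empty list
        if "security" in p["name"]
    ]


result = security_projects(employees_nest)
for row in result:
    print(row)
```

The row without a `projects` attribute simply contributes nothing to the result, which mirrors the article's point that rows in schemaless stores may have different attributes.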
You can find more information about PartiQL here. You can find a tutorial here.