As of the QRC 02/2021 release of SAP HANA Cloud, data lake, you can query files in your data lake file container that contain structured data without ever having to load the data into a database.
What is SQL on Files?
SQL on Files is a capability of the Data Lake Files service in SAP HANA Cloud, data lake that lets you query structured data directly in the files stored in your data lake file container.
At the time of writing, SQL on Files supports the following structured file formats:
- Optimized Row Columnar (ORC)
- Comma-Separated Values (CSV)
- Apache Parquet
SQL on Files is considered a bridge between the Data Lake IQ and Data Lake Files components of SAP HANA Cloud, data lake.
When and why would I use this feature?
Use SQL on Files to lower the cost of analyzing large amounts of data of unknown value that is sitting in files.
SQL on Files lets you explore and filter the data before moving aggregations of it, or all of it, into a database such as Data Lake IQ, NSE disk storage, or SAP HANA Cloud, HANA database. You can even create views on the data.
You can also use SQL on Files in cases where you just want to keep the data in files so that other tools such as Apache Spark can access it.
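To make the "filter first, load only what matters" idea concrete, here is a minimal local sketch in Python. It is an analogy only: it does not call SQL on Files or any SAP API. The inline CSV text, the `region_totals` table, and the helper names `summarize_by_region` and `load_summary` are all hypothetical, and an in-memory SQLite database stands in for the target database.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a CSV file sitting in a file container.
RAW_CSV = """region,product,amount
EMEA,widget,120.50
EMEA,gadget,80.00
APJ,widget,45.25
AMER,widget,300.00
AMER,gadget,19.75
"""


def summarize_by_region(csv_text, min_amount):
    """Scan the file rows, keep those at or above min_amount, and total them per region."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = float(row["amount"])
        if amount >= min_amount:
            totals[row["region"]] = totals.get(row["region"], 0.0) + amount
    return totals


def load_summary(conn, totals):
    """Store only the aggregated result in the database, not the raw rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS region_totals (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO region_totals VALUES (?, ?)", sorted(totals.items())
    )
    conn.commit()


# SQLite in memory plays the role of the target database (an assumption for illustration).
conn = sqlite3.connect(":memory:")
totals = summarize_by_region(RAW_CSV, min_amount=50.0)
load_summary(conn, totals)
rows = conn.execute("SELECT region, total FROM region_totals ORDER BY region").fetchall()
print(rows)
```

The raw rows never enter the database; only the two aggregated rows do. With SQL on Files, the filtering and aggregation would be expressed as SQL against the files themselves rather than in client code.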
How do I access this feature?
While SQL on Files is enabled by default in your SAP HANA Cloud, data lake instance, there are a few things you need to do before you can start using it.
- Follow the steps in this video, Data Lake Files and SQL on Files, to see how to configure your file container, set up authentication, and create a SQL on Files user in your SAP HANA Cloud, data lake instance.
- Add files to your file container using the steps found here: Adding Files to a File Container.
- Visit this topic to find the workflow that applies to your SAP HANA Cloud, data lake configuration (stand-alone vs. HANA DB-managed): Use SQL on Files.
- Query data in the files you added to the container using the steps found here: Queries Using SQL on Files.
Where can I find more information?
- Docs: SAP HANA Cloud, Data Lake Administration Guide for Data Lake IQ (SQL on Files)
- Docs: SAP HANA Cloud, Data Lake Administration for Data Lake Files
- Docs: SAP HANA Cloud, Data Lake Files REST API
- Docs: Manage a Data Lake File Container (SAP HANA Cloud Central)
- Docs: Using the File Container
- Docs: Adding Files to a File Container
- Docs: What is SAP HANA Cloud, Data Lake
- Blog: Setting Up Initial Access to HANA Cloud data lake Files
- YouTube: Data Lake Files and SQL on Files
- OpenSAP: Tools to access and use the data lake in SAP HANA Cloud
- SAP.com: SAP HANA Cloud, Data Lake
- Website: Apache Spark
~ Happy squealing on files! ~