Over the last 10 years, there’s been an explosion in the volume of both structured and unstructured data sets. Every year, more and more data is being generated, stored, analyzed, and mined for business insights. This growth has driven the adoption of alternative approaches to managing business data. Amazon Simple Storage Service (S3) has become the industry standard for the management of large volumes of data. In this post, I’ll share how to create an external Oracle table using the Oracle DBMS_CLOUD package, and then run a query against a comma-separated value (CSV) file located in a Pure Storage® FlashBlade® S3 Object Storage bucket.
For the purpose of this post, I’ve pre-created a FlashBlade Object Store bucket, uploaded a CSV file, set up the Oracle DBMS_CLOUD package, and configured the S3 credentials within my Oracle 21c database.
List Objects
Let’s start by confirming the size and number of rows within our CSV file called airports.csv.
Below, we can see that the file has 72,911 rows and is 4,965,317 bytes (4.7M).
1 2 3 4 5 6 7 8 |
% ls –l airports.csv –rw–r—r—@ 1 rekins staff 4965317 3 Aug 15:44 airports.csv % wc –l airports.csv 72911 airports.csv |
We can confirm the object is accessible with the aws s3api utility and list-objects option, for example.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
$ aws s3api list–objects —bucket ora–bucket —query ‘Contents[].{Key: Key, Size: Size}’ —endpoint–url https://s3-fbstaines02.uklab.purestorage.com –profile fbstaines02 –ca-bundle /home/oracle/dbc/commonstore/wallets/ssl/pureCA.pem [ { “Key”: “airports.csv”, “Size”: 4965317 } ] |
Create Table
Log on to an Oracle 19c/21c pluggable database (PDB) and create an external table using the dbms_cloud package.
The example below creates an external table called airports_fb using the airports.csv file stored in an S3 Object Storage bucket called ora-bucket.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
begin dbms_cloud.create_external_table( table_name => ‘airports_fb’, credential_name => ‘fb_cred’, file_uri_list => ‘<a href=”https://s3-fbstaines02.amazonaws.com/ora-bucket/airports.csv”>https://s3-fbstaines02.uklab.purestorage.com/ora-bucket/airports.csv</a>’, column_list >=> ‘ident varchar2(20), type varchar2(20), name varchar2(20), elevation_ft number(6), continent varchar2(20), iso_country varchar2(20), iso_region varchar2(20), municipality varchar2(20)’, format => json_object(‘type’ value ‘csv’, ‘skipheaders’ value ‘1’, ‘ignoremissingcolumns’ value ‘true’, ‘conversionerrors’ value ‘store_null’, ‘rejectlimit’ value ‘unlimited’) ); end; / |
Query Table
Now that we’ve created an external S3 table, we can confirm the table definition using the SQLPlus desc command. VARCHAR2(20)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 – Production Version 21.6.0.0.0 SQL> desc airports_fb; Name Null? Type ————————————————————— ———— ——————– IDENT VARCHAR2(20) TYPE VARCHAR2(20) NAME VARCHAR2(20) ELEVATION_FT NUMBER(6) CONTINENT VARCHAR2(20) ISO_COUNTRY VARCHAR2(20) ISO_REGION VARCHAR2(20) MUNICIPALITY VARCHAR2(20) SQL> |
We can check the number of rows by using count, for example.
1 2 3 4 5 6 7 8 9 |
SQL> select count(*) from airports_fb; COUNT(*) ————— 72911 SQL> |
And run a query against the external table, for example, to return all airports that have London in their name.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
SQL> select NAME, ISO_REGION, MUNICIPALITY from AIRPORTS_FB where NAME like ‘%London%’; NAME ISO_REGION MUNICIPALITY —————————— —————————— —————————— London Airport CA–ON London London Luton Airport GB–ENG London London City Airport GB–ENG London London Heliport GB–ENG London London Colney GB–ENG London Gliding Club GB–ENG Dunstable New London Airport US–VA. Forest 7 rows selected. SQL> |
Explain Plan
We can confirm our query is using an external table by creating an explain plan for the sql, for example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
SQL> EXPLAIN PLAN FOR 2 select NAME, ISO_REGION, MUNICIPALITY from AIRPORTS_FB where NAME like ‘%London%’ order by NAME; Explained. SQL> select plan_table_output from 2 table(dbms_xplan.display(‘plan_table’,null,‘basic’)); PLAN_TABLE_OUTPUT ———————————————————————————————————————— Plan hash value: 587737101 ———————————————————————————— | Id | Operation | Name | ———————————————————————————— | 0 | SELECT STATEMENT | | | 1 | PX COORDINATOR | | | 2 | PX SEND QC (ORDER) | :TQ10001 | | 3 | SORT ORDER BY | | | 4 | PX RECEIVE | | | 5 | PX SEND RANGE | :TQ10000 | PLAN_TABLE_OUTPUT ———————————————————————————————————————— | 6 | PX BLOCK ITERATOR | | | 7 | EXTERNAL TABLE ACCESS FULL| AIRPORTS_FB | ———————————————————————————— 14 rows selected. SQL> |
From the explain plan, we can see that Oracle has used an external table.
Uncomplicate Oracle Data Storage
Table Update
From the output above, we can see the record for London Colney is missing a value for municipality. What happens if we try to update a row?
1 2 3 4 5 6 7 8 9 |
SQL> update AIRPORTS_FB set MUNICIPALITY = ‘London’ where NAME=‘London Colney’; update AIRPORTS_FB set MUNICIPALITY = ‘London’ where NAME=‘London Colney’ * ERROR at line 1: ORA–30657: operation not supported on external organized table< |
Or, attempt to delete a row?
1 2 3 4 5 6 7 8 9 |
SQL> delete from AIRPORTS_FB where NAME=‘London Colney’; delete from AIRPORTS_FB where NAME=‘London Colney’ * ERROR at line 1: ORA–30657: operation not supported on external organized table |
As expected, they both fail as external tables don’t support data manipulation language (DML). They’re designed to be read-only, therefore any amendments need to be performed externally.
Summary on Oracle
In this post, I’ve demonstrated how you can access and query external data stored in FlashBlade S3 Object Storage buckets directly from an Oracle 19c/21c database using the Oracle DBMS_CLOUD package.