The most valuable asset nowadays is data. Having said that, every company must protect it at all costs. One way is the protection of a production environment. However, you should not forget about non-prod environments; that is where the anonymization comes into play.
If you are interested in production data protection, read these advanced articles:
As for non-prod environments, every developer knows they get messy over time. They work on several projects, some of them will not push through, some might be just testing cases, research, and so on and so forth. Therefore it’s not only wise but recommended to refresh non-prod environments every now and then to get clean setup. How to do that? Here we go 🙂
Backup and Restore
Oracle offers you two ways to backup and restore:
- via RMAN
- manually
If you feel confident enough to manually handle everything, be my guest 🙂 I strongly suggest using RMAN especially because you can do the backup on the fly.
Anonymization
Once you restored your production in a non-prod environment you have to anonymize (or obfuscate, if you will) the data, because you don’t want your developers to play around with real clients’ data.
It is actually simpler than one might think 🙂 You just need to do a few steps.
- Create a data dictionary for sensitive data (owner, table_name, column_name)
- Define how to obfuscate the data (data type, obfuscating method)
- Create a package/procedure which will run every time you restore the DB
- as a “post-deployment” script
And voila – you are safe 🙂 Let’s deep dive on those topics a little closer with some examples.
Create a data dictionary for sensitive data
This one is actually pretty simple, though a bit tedious. You need to identify all sensitive data in your database and store them in a master table.
A sample script to gather such details might look like:
CREATE TABLE sensitive_data_dictionary SELECT owner , table_name , column_name , data_type , 'Y' as flag_obfuscate FROM dba_tab_columns WHERE 1=1 AND column_name IN ('FIRST_NAME', 'LAST_NAME', 'EMAIL', 'CREDIT_CARD_NO', 'MOBILE_PHONE_NO', 'BANK_ACCOUNT_NO', 'DRIVERS_LICENSE','SSN');
Define how to obfuscate the data
Here you have to decide on how to manipulate the data while not destroying their value and/or consistency. For numbers and strings, you can either use a built-in hash function (such as ORA_HASH, DBMS_CRYPTO, …) or you can stick with a simple TRANSLATE 🙂
You can consider deleting completely all CLOB, LOB, and similar data types (in case you store sensitive data there as well) or replace them with some dummy data. Regarding dates, I would leave them as is (there is no issue with the date as long as it cannot be connected to a person). However, if you insist on masking dates as well, I would recommend randomly adding months and days. Just be cautious about not messing up the data consistency, integrity, and application logic.
A post-deployment script
Here is a sample script you can run after the restoration.
DECLARE v_alphaNum_list varchar(62) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'; v_num_list varchar(62) := '0123456789'; v_update varchar2(10000) := ''; v_counter number; v_credit_card_mask_len number := 12; BEGIN -- get the list of owners and their tables to be processed FOR i IN ( SELECT owner , table_name , count(*) as num_columns FROM sensitive_data_dictionary GROUP BY owner , table_name ) LOOP -- init v_counter v_counter := 1; -- set_table for update v_update := 'UPDATE ' || i.owner ||'.'|| i.table_name || ' SET '; -- gather all columns and how to mask them FOR j IN ( SELECT column_name , CASE -- special treatment WHEN column_name = 'CREDIT_CARD_NO' THEN 'lpad(''x'','||v_credit_card_mask_len||',''x'') || substr('||column_name||', '||v_credit_card_mask_len||', length('||column_name||'))' -- else WHEN data_type IN ('VARCHAR2', 'VARCHAR', 'CHAR') THEN 'translate('||column_name||','''||v_alphaNum_list||''',dbms_random.string(''A'', 52)||round(dbms_random.value(10000000000,1)))' WHEN data_type IN ('NUMBER') THEN 'rpad(substr('||column_name||', 0, 3)||translate('||column_name||', '''||v_num_list||''', round(dbms_random.value(1000000000, 1))),length('||column_name||'), round(dbms_random.value(9, 1)))' END as new_value FROM sensitive_data_dictionary WHERE 1=1 AND flag_obfuscate = 'Y' AND owner = i.owner AND table_name = i.table_name ) LOOP -- add columns to update v_update := v_update || j.column_name || ' = ' || j.new_value; -- do not add the comma to the last instance IF v_counter < i.num_columns THEN v_update := v_update || ', '; END IF; --increment v_counter v_counter := v_counter+1; END LOOP; -- print out --dbms_output.put_line(v_update); -- execute EXECUTE IMMEDIATE v_update; END LOOP; EXCEPTION WHEN OTHERS THEN dbms_output.put_line('SqlCDE: '||SQLCODE); dbms_output.put_line('SqlMSG: '||SQLERRM); END;
Remember, this is just an example of how you can do it. It can be more sophisticated and complex. I highly recommend creating this as a package with procedures and functions inside. So instead of those repetitive translates, you can create corresponding functions:
- obfuscate_credit_card(….)
- obfuscate_text(…)
- obfuscate_value(…)
- …
There are many ways and they will all differ based on the DB design, architecture, individual needs, and whatnot. Feel free to grab this one and enhance according to your needs 🙂