Anonymization After PROD To NONPROD Syncing

The most valuable asset nowadays is data. Having said that, every company must protect it at all costs. One way is the protection of a production environment. However, you should not forget about non-prod environments; that is where the anonymization comes into play.

If you are interested in production data protection, read these advanced articles:

As for non-prod environments, every developer knows they get messy over time. They work on several projects, some of them will not push through, some might be just testing cases, research, and so on and so forth. Therefore it’s not only wise but recommended to refresh non-prod environments every now and then to get clean setup. How to do that? Here we go 🙂

Backup and Restore

Oracle offers you two ways to backup and restore:

  • via RMAN
  • manually

If you feel confident enough to manually handle everything, be my guest 🙂 I strongly suggest using RMAN especially because you can do the backup on the fly.


Once you restored your production in a non-prod environment you have to anonymize (or obfuscate, if you will) the data, because you don’t want your developers to play around with real clients’ data.

It is actually simpler than one might think 🙂 You just need to do a few steps.

  • Create a data dictionary for sensitive data (owner, table_name, column_name)
  • Define how to obfuscate the data (data type, obfuscating method)
  • Create a package/procedure which will run every time you restore the DB
    • as a “post-deployment” script

And voila – you are safe 🙂 Let’s deep dive on those topics a little closer with some examples.

Create a data dictionary for sensitive data

This one is actually pretty simple, though a bit tedious. You need to identify all sensitive data in your database and store them in a master table.

A sample script to gather such details might look like:

CREATE TABLE sensitive_data_dictionary
SELECT owner
, table_name
, column_name
, data_type
, 'Y' as flag_obfuscate
FROM dba_tab_columns

Define how to obfuscate the data

Here you have to decide on how to manipulate the data while not destroying their value and/or consistency. For numbers and strings, you can either use a built-in hash function (such as ORA_HASH, DBMS_CRYPTO, …) or you can stick with a simple TRANSLATE 🙂

You can consider deleting completely all CLOB, LOB, and similar data types (in case you store sensitive data there as well) or replace them with some dummy data. Regarding dates, I would leave them as is (there is no issue with the date as long as it cannot be connected to a person). However, if you insist on masking dates as well, I would recommend randomly adding months and days. Just be cautious about not messing up the data consistency, integrity, and application logic.

A post-deployment script

Here is a sample script you can run after the restoration.


  v_alphaNum_list varchar(62) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  v_num_list varchar(62) := '0123456789';
  v_update varchar2(10000) := '';
  v_counter number;
  v_credit_card_mask_len number := 12;

   -- get the list of owners and their tables to be processed
   FOR i IN (
             SELECT owner
             , table_name  
             , count(*) as num_columns
             FROM sensitive_data_dictionary
             GROUP BY owner
             , table_name
       -- init v_counter
       v_counter := 1;  
       -- set_table for update
       v_update := 'UPDATE ' || i.owner ||'.'|| i.table_name || ' SET ';
           -- gather all columns and how to mask them
           FOR j IN (
                      SELECT column_name
                      , CASE
                          -- special treatment
                          WHEN column_name = 'CREDIT_CARD_NO' THEN 'lpad(''x'','||v_credit_card_mask_len||',''x'') || substr('||column_name||', '||v_credit_card_mask_len||', length('||column_name||'))'
                          -- else
                          WHEN data_type IN ('VARCHAR2', 'VARCHAR', 'CHAR') THEN 'translate('||column_name||','''||v_alphaNum_list||''',dbms_random.string(''A'', 52)||round(dbms_random.value(10000000000,1)))'
                          WHEN data_type IN ('NUMBER') THEN 'rpad(substr('||column_name||', 0, 3)||translate('||column_name||', '''||v_num_list||''', round(dbms_random.value(1000000000, 1))),length('||column_name||'), round(dbms_random.value(9, 1)))'
                        END as new_value
                      FROM sensitive_data_dictionary
                      WHERE 1=1
                      AND flag_obfuscate = 'Y'
                      AND owner = i.owner
                      AND table_name = i.table_name
              -- add columns to update
              v_update :=  v_update || j.column_name || ' = ' || j.new_value;
              -- do not add the comma to the last instance
              IF v_counter < i.num_columns THEN
                v_update := v_update || ', '; 
              END IF;
              --increment v_counter
              v_counter := v_counter+1;
           END LOOP;
        -- print out   
        -- execute 
        EXECUTE IMMEDIATE v_update;       
          dbms_output.put_line('SqlCDE: '||SQLCODE);
          dbms_output.put_line('SqlMSG: '||SQLERRM);

Remember, this is just an example of how you can do it. It can be more sophisticated and complex. I highly recommend creating this as a package with procedures and functions inside. So instead of those repetitive translates, you can create corresponding functions:

  • obfuscate_credit_card(….)
  • obfuscate_text(…)
  • obfuscate_value(…)

There are many ways and they will all differ based on the DB design, architecture, individual needs, and whatnot. Feel free to grab this one and enhance according to your needs 🙂