Thursday, 1 April 2010

Materialized View Refresh and the ATOMIC_REFRESH parameter

When refreshing big materialized views in large data warehouses it is always good to check the parameter options available in the DBMS_MVIEW.REFRESH procedure.

Oracle changes the default parameters of its DBMS packages from release to release. I remember back at the times of Oracle 9i a complete refresh would truncate the materialized view, thus the only work that the database was actually doing in a complete refresh, was just an INSERT after the TRUNCATE.

Now in Oracle 10g and Oracle 11g parameters have changed. When there is a COMPLETE materialized view refresh, for the purposes of data preservation, a DELETE is done instead of a TRUNCATE!

The reason for this is because Oracle "changed" the default parameter value of ATOMIC_REFRESH in the DBMS_MVIEW.REFRESH package. In earlier releases the parameter was set to FALSE by default but now it is set to TRUE, which forces a DELETE of the materialized view instead of  TRUNCATE, making the materialized view more "available" at refresh time.

But this DELETE is an expensive operation in terms of refresh time it takes. A DELETE is always expensive and sometimes even impossible when we are talking about the complete refresh of materialized views with millions of rows. At COMPLETE refresh a DELETE command takes long time, as it has to make the materialized view available to the users and comply with the ACID theory and its transactional reasons, but who cares? What if we really want to get the refresh done quickly and we don't care about the availability of the materialized view, say in a data warehouse maintenance window?

I have tested how the refresh time improves when you set the ATOMIC_REFRESH => FALSE in the procedure on a 1 CPU home computer with Oracle 11g. If you have more CPUs it will even be quicker because of the PARALLEL parameter.

I just wanted to show the improvement you get in refresh times when ATOMIC_REFRESH is FALSE and Oracle does a  TRUNCATE instead of a DELETE at complete refresh. A real time saver with big materialized views in data warehouses. Here is the test:

-- Enable parallel DML
ALTER SESSION FORCE PARALLEL DML

-- Cleanup
-- DROP TABLE BASE_TABLE;

-- Create the Test table
CREATE TABLE BASE_TABLE
  (
    id NUMBER(10) PRIMARY KEY NOT NULL,
    X  VARCHAR2(255),
    Y  VARCHAR2(255)
  )
  PARTITION BY RANGE
  (
    ID
  )
  INTERVAL
  (
    100000
  )
  (
    PARTITION P1 VALUES LESS THAN (100000),
    PARTITION P2 VALUES LESS THAN (200000)
  );

-- Make the test table paralllel
ALTER TABLE BASE_TABLE PARALLEL 5

Load the test table with data from a dummy cartesian join so that you get millions of rows. Do you like my loop? Loops only once. Well I had it there to add more millions if I wanted to :-) >

BEGIN
  FOR I IN 1 .. 1
  LOOP
    INSERT   /*+ PARALLEL(BASE_TABLE, 5) */    INTO BASE_TABLE SELECT ROWNUM, a.object_name, a.status  FROM all_objects a, emp b, dept c;
  END LOOP;
  COMMIT;
END;

3010392 rows inserted

Create a materialized view log for fast refresh

DROP MATERIALIZED VIEW LOG ON BASE_TABLE;
CREATE MATERIALIZED VIEW LOG ON BASE_TABLE;

Create the fast refresh materialized view

DROP MATERIALIZED VIEW MV_BASE_TABLE;
CREATE materialized VIEW mv_base_table parallel 5 refresh fast
AS
  SELECT * FROM BASE_TABLE;

Update the Test table to simulate changing data

UPDATE BASE_TABLE SET Y='INVALID';
COMMIT;

3010392 rows updated

Now you are ready to do the materialized view refresh with the ATOMIC_REFRESH values set to TRUE and then to FALSE. Observe the refresh times. Refresh the materialized view with the two different values in the 
ATOMIC_REFRESH parameter.

TRUE case with DELETE

EXEC DBMS_MVIEW.REFRESH(LIST => 'MV_BASE_TABLE', METHOD => 'C', ATOMIC_REFRESH => TRUE);

Elapsed 558.8 seconds

Now is time to do the test with the ATOMIC_REFRESH parameter set to FALSE.

FALSE case with TRUNCATE 
Must update again to simulate change in the logs

UPDATE BASE_TABLE SET Y='VALID';
COMMIT;

EXEC DBMS_MVIEW.REFRESH(LIST => 'MV_BASE_TABLE', METHOD => 'C', ATOMIC_REFRESH => FALSE);

Elapsed 215.3 seconds

Half the time! Well if you have real computers it will be much faster. So again we see DELETE is really not good when you work with millions of rows. Nice to have options!