Monday, April 29, 2013

Speed up deleting a huge number of rows using dbms_parallel_execute (performance)

The best method to delete millions of rows from a large table is to simply rebuild the table.
However, in some environments that is not possible and you really do need to delete the data as quickly as possible. An additional complication may be that you need to delete rows from master-detail tables. The solution is to have as many sessions as possible run the delete statements, with each session focusing on its own chunk of data so that they can work in parallel.

This process could be built manually or we could use a new 11g package dbms_parallel_execute.

Usually, the table you want to delete data from is a master table, with many detail tables holding foreign key (FK) constraints against it. In that case, you have to delete the detail table data and then the master table data, all in parallel chunks.
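If you are not sure which detail tables reference the master, the data dictionary can list them. This is a quick read-only check, not part of the original script; it assumes access to the DBA views (use ALL_CONSTRAINTS otherwise):

-- List the detail tables whose foreign keys reference the master table APS.PO_TXN
SELECT c.owner, c.table_name, c.constraint_name
FROM   dba_constraints c,
       dba_constraints p
WHERE  c.constraint_type = 'R'
AND    p.owner           = c.r_owner
AND    p.constraint_name = c.r_constraint_name
AND    p.owner           = 'APS'
AND    p.table_name      = 'PO_TXN';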

Below is a sample script where I start 50 concurrent sessions and delete about 40 million rows from the master table and its corresponding detail tables.


drop table po_parallel purge;

-- Create a driving table holding the key values to be deleted, based on the delete logic.

create table po_parallel as
select /*+ parallel(a,16) full(a) use_hash(bc) */ a.PO_TXN_KEY
from   APS.PO_TXN a,
       APS.CAL bc
where  bc.cal_key = a.batch_cal_key
and    bc.actv_flg = 'N';
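One optional step that is not in the original script: gather statistics on the driving table so the optimizer has fresh information for the IN subqueries in the deletes. A minimal sketch, assuming the table was created in the VIJAY schema used below:

BEGIN
  -- refresh optimizer statistics on the driving table
  DBMS_STATS.gather_table_stats(ownname => 'VIJAY', tabname => 'PO_PARALLEL');
END;
/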

/* If you need to delete rows from master-detail tables based on the master key column, you need to create a procedure. If it is a delete from a single table, then a simple delete statement is enough.
Note: one common rookie mistake is to write the delete statement without a WHERE ... IN clause restricting it to the pre-selected key column values. */


create or replace
PROCEDURE parallel_DML_PO (p_start_id IN NUMBER, p_end_id IN NUMBER) AS
BEGIN
  -- Delete from the detail tables before the master.
  DELETE FROM APS.INVC_TXN
  WHERE  PO_TXN_KEY IN (SELECT PO_TXN_KEY FROM PO_PARALLEL
                        WHERE PO_TXN_KEY BETWEEN p_start_id AND p_end_id);

  DELETE FROM APS.PO_TXN_ATTR
  WHERE  PO_TXN_KEY IN (SELECT PO_TXN_KEY FROM PO_PARALLEL
                        WHERE PO_TXN_KEY BETWEEN p_start_id AND p_end_id);

  DELETE FROM APS.PO_TXN
  WHERE  PO_TXN_KEY IN (SELECT PO_TXN_KEY FROM PO_PARALLEL
                        WHERE PO_TXN_KEY BETWEEN p_start_id AND p_end_id);

  COMMIT;
END parallel_DML_PO;
/
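Before launching the full parallel run, the procedure can be sanity-checked by hand on a small key range. The range below is just a placeholder; note that the procedure commits internally, so only pass key values you really intend to delete:

BEGIN
  -- manual smoke test over an arbitrary small key range
  parallel_DML_PO(p_start_id => 1, p_end_id => 1000);
END;
/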

-- Next, run the procedure in parallel using DBMS_PARALLEL_EXECUTE.

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
   -- create a task
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
   -- point to the key column and set the chunk size
  DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col
(task_name    => l_task,
 table_owner  => 'VIJAY',
 table_name   => 'PO_PARALLEL',
 table_column => 'PO_TXN_KEY',
 chunk_size   => 100000);

    -- specify the sql statement
  l_sql_stmt := 'BEGIN parallel_DML_PO(:start_id, :end_id); END;';

   -- run task in parallel
  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 50);

  -- If there are errors, resume the task at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;

-- DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

-- To monitor progress, query the chunk view:
SELECT chunk_id, status, start_id, end_id
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
ORDER BY chunk_id;
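The same view also records the error for any chunk that fails. A couple of follow-up queries to summarize progress and spot failed chunks (PROCESSED_WITH_ERROR is the status DBMS_PARALLEL_EXECUTE assigns to them):

-- overall progress by status
SELECT status, COUNT(*) AS chunks
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
GROUP BY status;

-- details of any failed chunks
SELECT chunk_id, start_id, end_id, error_code, error_message
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
AND    status = 'PROCESSED_WITH_ERROR'
ORDER BY chunk_id;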



Once the job is complete, you can drop the task, which also removes the associated chunk information.
BEGIN
  DBMS_PARALLEL_EXECUTE.drop_task('test_task');
END;
/


While this is an example specific to my database, a decade later I had to do some complex inserts. I sped those up using FORALL, but since I needed billions of rows generated, I also used this method to insert billions of rows in parallel. You can check that example here.
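For reference, FORALL on its own looks roughly like this. This is only a minimal sketch, not the billions-of-rows example linked above, and demo_target is a hypothetical table:

DECLARE
  TYPE t_keys IS TABLE OF NUMBER;
  l_keys t_keys;
BEGIN
  -- generate 10,000 sample key values
  SELECT level BULK COLLECT INTO l_keys
  FROM   dual
  CONNECT BY level <= 10000;

  -- one bulk-bound insert instead of 10,000 row-by-row inserts
  FORALL i IN 1 .. l_keys.COUNT
    INSERT INTO demo_target (id) VALUES (l_keys(i));

  COMMIT;
END;
/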

 
