However, in some environments, that is not possible and you want to really delete the data as quickly as possible. Additional complication could be that you need to delete rows from a master-detail table. The solution to this is to have as many sessions as possible run the delete statements. Each of this sessions can focus on a chunk of data to delete so that they can work in parallel.
This process could be built manually or we could use a new 11g package dbms_parallel_execute.
Usually, the table you want data to be deleted from, may be the master table with many Detail tables having Foreign Key Constraints (FK) . In this case, you will have to delete the Child table data and then the master table data all in parallel chunks.
below is a sample script where i start 50 concurrent sessions and delete about 40 million rows from the master table and its corresponding detail table ...
drop table po_parallel purge;
create table po_parallel as (select /*+ parallel(a,16) full(a) hash(a,bc) */ a.PO_TXN_KEY from "APS"."PO_TXN" a,
APS.cal bc where bc.cal_key = a.batch_cal_key and bc.actv_flg = 'N' ;
-- create table with key columns that needs to be deleted based on the delete logic.
APS.cal bc where bc.cal_key = a.batch_cal_key and bc.actv_flg = 'N' ;
/* if you need to delete rows from master-detail tables based on master key column, you need to create a procedure. If it is delete from a single table, then a simple delete statement will be enough.
Note: One common rookie mistake is to write a delete statement without the where IN clause to pick the pre-selected key-columns. */
Note: One common rookie mistake is to write a delete statement without the where IN clause to pick the pre-selected key-columns. */
create or replace
PROCEDURE parallel_DML_PO (p_start_id IN NUMBER, p_end_id IN NUMBER) AS
BEGIN
PROCEDURE parallel_DML_PO (p_start_id IN NUMBER, p_end_id IN NUMBER) AS
BEGIN
-- Delete from the detail table before the master.
Delete from APS.INVC_TXN where PO_TXN_KEY in ( select po_txn_key from PO_PARALLEL WHERE PO_TXN_KEY BETWEEN p_start_id AND p_end_id);
Delete from APS.PO_TXN_ATTR where PO_TXN_KEY in ( select po_txn_key from PO_PARALLEL WHERE PO_TXN_KEY BETWEEN p_start_id AND p_end_id);
Delete from APS.INVC_TXN where PO_TXN_KEY in ( select po_txn_key from PO_PARALLEL WHERE PO_TXN_KEY BETWEEN p_start_id AND p_end_id);
Delete from APS.PO_TXN_ATTR where PO_TXN_KEY in ( select po_txn_key from PO_PARALLEL WHERE PO_TXN_KEY BETWEEN p_start_id AND p_end_id);
Delete from APS.PO_TXN a where PO_TXN_KEY in ( select po_txn_key from PO_PARALLEL WHERE PO_TXN_KEY BETWEEN p_start_id AND p_end_id);
commit;
end parallel_DML_PO;
/
-- Next you can run the procedure to execute in parallel.
DECLARE
l_task VARCHAR2(30) := 'test_task';
l_sql_stmt VARCHAR2(32767);
l_try NUMBER;
l_status NUMBER;
BEGIN
l_task VARCHAR2(30) := 'test_task';
l_sql_stmt VARCHAR2(32767);
l_try NUMBER;
l_status NUMBER;
BEGIN
-- create a task
DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
-- point to key column and set batch size
DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col
DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
-- point to key column and set batch size
DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col
(task_name => l_task,
table_owner => 'VIJAY',
table_name => 'PO_PARALLEL',
table_column => 'PO_TXN_KEY',
chunk_size => 100000);
table_owner => 'VIJAY',
table_name => 'PO_PARALLEL',
table_column => 'PO_TXN_KEY',
chunk_size => 100000);
-- specify the sql statement
l_sql_stmt := 'BEGIN parallel_DML_PO(:start_id, :end_id); END;';
-- run task in parallel
DBMS_PARALLEL_EXECUTE.run_task(task_name => l_task,
sql_stmt => l_sql_stmt,
language_flag => DBMS_SQL.NATIVE,
parallel_level => 50);
-- If there is error, RESUME it for at most 2 times.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
Loop
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.resume_task(l_task);
l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
END LOOP;
-- DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
l_sql_stmt := 'BEGIN parallel_DML_PO(:start_id, :end_id); END;';
-- run task in parallel
DBMS_PARALLEL_EXECUTE.run_task(task_name => l_task,
sql_stmt => l_sql_stmt,
language_flag => DBMS_SQL.NATIVE,
parallel_level => 50);
-- If there is error, RESUME it for at most 2 times.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
Loop
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.resume_task(l_task);
l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
END LOOP;
-- DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
-- to monitor the progress see
SELECT chunk_id, status, start_id, end_id FROM user_parallel_execute_chunks WHERE task_name = 'test_task' ORDER BY chunk_id;
Once the job is complete you can drop the task, which will drop the associated chunk information also.
BEGIN DBMS_PARALLEL_EXECUTE.drop_task('test_task'); END; /
While this is a example specific to my database, A decade later, I had to do some complex inserts. I did speed it up using FORALL, however, since I needed billions of rows generated, I used this method to insert billions of rows in parallel. You can check that example here.
No comments:
Post a Comment
Feedback welcome