Wednesday, July 24, 2013

parallel_enable partition by hash cluster



Below I illustrate a simple example of using GROUP BY aggregation with PARALLEL_ENABLE table functions in Oracle.


-- Assume you create a sample dataset from DBA_OBJECTS and you want to run a simple sum(object_id) and count(object_id) grouped by owner.


/* A simple GROUP BY result will look like this. */
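For reference, the setup implied above can be sketched as follows. The table name data_tab comes from the function below; the exact column list is an assumption:

```sql
-- Sample dataset built from DBA_OBJECTS (a minimal sketch)
CREATE TABLE data_tab AS
  SELECT owner, object_id FROM dba_objects;

-- Baseline serial GROUP BY query
SELECT   owner,
         SUM(object_id)   AS sum_total_id,
         COUNT(object_id) AS count_total_id
FROM     data_tab
GROUP BY owner;
```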

However, if the data in the table has to be validated through a complex validation process, this could take a long time when the table is large. This is because validation done through a procedure or function is a serial process. To speed it up, we can run the validation in parallel through a parallel-enabled table function while still maintaining the ability to group by a column in the end output. This way, you will be able to run the data validation process in parallel and at the same time create table functions capable of doing GROUP BY aggregations.
-- In my first iteration, I try the parallel_enable table function using PARTITION BY ANY, which is the most common way to partition data.

The function created to do this is below.

FUNCTION test_group_any( p_cursor IN SYS_REFCURSOR)
  RETURN t_owner_summary_tab PIPELINED PARALLEL_ENABLE(
    PARTITION p_cursor BY ANY)
IS
  in_row data_tab%ROWTYPE;
  out_row t_owner_summary_row ;
  v_total NUMBER := NULL;
  v_owner data_tab.owner%TYPE;
  v_count NUMBER := 0;
BEGIN
  -- for every transaction
  LOOP
    FETCH p_cursor
    INTO in_row;
    EXIT
  WHEN p_cursor%NOTFOUND;
    -- if we pass the extensive validation check
    --   IF super_complex_validation(
    --        in_row.object_id, in_row.owner) THEN
    -- set initial total or add to current  total
    -- or return total as required
    IF v_total        IS NULL THEN
      v_total         := in_row.object_id;
      v_owner         := in_row.owner;
      v_count         := 1;
    ELSIF in_row.owner = v_owner THEN
      v_total         := v_total + in_row.object_id;
      v_count         := v_count + 1;
    ELSE
      out_row.owner          := v_owner;
      out_row.sum_total_id   := v_total;
      out_row.count_total_id := v_count ;
      PIPE ROW(out_row);
      v_total := in_row.object_id;
      v_owner := in_row.owner;
      v_count := 1;
    END IF;
    --  END IF;  -- end super complex validation 
  END LOOP; -- every transaction
  out_row.owner          := v_owner;
  out_row.sum_total_id   := v_total;
  out_row.count_total_id :=v_count;
  PIPE ROW(out_row);
  RETURN ;
END test_group_any;




-- The data processed by each parallel process is random, so you will get many more rows than expected. The result is not what we are looking for.
 So what if you partition by hash on owner?
 Note: in order to PARTITION BY HASH, the database needs to know the shape of the rows you are passing, so a weakly typed SYS_REFCURSOR will not work. I changed the SYS_REFCURSOR parameter of the above function to a strongly typed REF CURSOR returning data_tab%ROWTYPE, and changed PARTITION BY ANY to PARTITION BY HASH.
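The strongly typed cursor type could be declared in a package spec along these lines (a sketch; the package name is an assumption, t_data_tab_ref_cursor is the type used in the functions below):

```sql
CREATE OR REPLACE PACKAGE data_types_pkg AS
  -- Strongly typed REF CURSOR so the database knows the row shape
  TYPE t_data_tab_ref_cursor IS REF CURSOR RETURN data_tab%ROWTYPE;
END data_types_pkg;
/
```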

FUNCTION test_group_hash( p_cursor IN t_data_tab_ref_cursor)
  RETURN t_owner_summary_tab PIPELINED
  PARALLEL_ENABLE( PARTITION p_cursor BY HASH(owner))
IS



-- Still not what we want; the output is almost the same as with PARTITION BY ANY.
-- So I added one additional clause, CLUSTER BY owner. This tells the database to send all rows with the same value for owner to a single parallel slave.

FUNCTION test_group_hashcluster( p_cursor IN t_data_tab_ref_cursor)
  RETURN t_owner_summary_tab PIPELINED
  PARALLEL_ENABLE( PARTITION p_cursor BY HASH(owner))
  CLUSTER p_cursor BY (owner)
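To exercise the function in parallel, it can be invoked along these lines (a sketch; the PARALLEL hint with degree 5 follows the test described below):

```sql
SELECT *
FROM   TABLE(test_group_hashcluster(
         CURSOR(SELECT /*+ PARALLEL(d, 5) */ * FROM data_tab d)));
```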



Hurray, success!

Now the result is identical to that of our SQL GROUP BY query.
However, it took about 27 seconds even with parallel degree 5 on my virtual machine.
So I added code to BULK COLLECT in parallel, and now it runs in less than 2 seconds.
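A sketch of the BULK COLLECT variant of the fetch loop inside the function body (the collection type and the LIMIT of 100 are assumptions):

```sql
-- Declarations (in place of "in_row data_tab%ROWTYPE;"):
--   TYPE t_data_rows IS TABLE OF data_tab%ROWTYPE;
--   in_rows t_data_rows;
LOOP
  FETCH p_cursor BULK COLLECT INTO in_rows LIMIT 100;
  FOR i IN 1 .. in_rows.COUNT LOOP
    NULL;  -- same per-row aggregation logic as in test_group_any,
           -- applied to in_rows(i)
  END LOOP;
  EXIT WHEN p_cursor%NOTFOUND;
END LOOP;
```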



Acknowledgments.
  Darryl Hurley, who wrote Chapter 3


dbms_comparison quicker than SQL MINUS



Overview


The DBMS_COMPARISON package is an Oracle-supplied package that you can use to compare database objects at two databases. The package also enables you to converge the database objects so that they are consistent at different databases. Typically, it is used in environments that share a database object at multiple databases; when copies of the same database object exist at multiple databases, the object is called a shared database object.
DBMS_COMPARISON is considered faster than a traditional comparison and is used internally by the database for Streams replication. When the data to be compared is huge, this package can speed up the comparison considerably.


Limitation

The index columns in a comparison must uniquely identify every row involved in a comparison. The following constraints satisfy this requirement:
  • A primary key constraint
  • A unique constraint on one or more non-NULL columns

Performance

   Internally, the package does a fast full index scan on the primary key or unique key index to find the MIN and MAX values of the table.
     Then it does a FULL TABLE SCAN to populate the buckets. If there is a mismatch in the checksum values of the data in these buckets, additional scans are done to identify the divergent rows.

   The runtime of DBMS_COMPARISON is therefore roughly the time it takes to do 2 fast index scans and 2 full table scans (one of each per table).
    To speed this up further, you could
  • Alter the parallel degree of the underlying tables
  • Set PARALLEL_DEGREE_POLICY to AUTO and PARALLEL_DEGREE_LIMIT to 8, and after the comparison completes, set the parameters back to their original values
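The parameter change above can be sketched as follows (the values come from the text; restoring to the defaults shown assumes a default starting configuration):

```sql
ALTER SYSTEM SET parallel_degree_policy = AUTO;
ALTER SYSTEM SET parallel_degree_limit  = 8;

-- ... run the comparison ...

-- Restore the (default) original values
ALTER SYSTEM SET parallel_degree_policy = MANUAL;
ALTER SYSTEM SET parallel_degree_limit  = CPU;
```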

Advantage over SQL MINUS

Getting the full delta without DBMS_COMPARISON takes 4 full table scans, which could take double the time of DBMS_COMPARISON:  delta = (A - B) U (B - A)

  select * from table_A 
  minus
  select * from table_B
  union all
  (select * from table_B
  minus
  select * from table_A) ;
    
  • This takes 4 FULL TABLE SCANS compared to 2 in DBMS_COMPARISON.
  • It also uses a lot more temporary tablespace than DBMS_COMPARISON, as it has to sort the entire table with all the columns.
Also, the rowids of the divergent rows are stored for each comparison, so if we need to converge the data, it can be done very quickly.

   

Perform a Compare with DBMS_COMPARISON

To perform a comparison you will follow these steps:
  1. Create the comparison 
  2. Execute the comparison 
  3. Review the results of the comparison 
  4. Converge the data if desired
  5. Recheck the synced data (optional) 
  6. Purge and Drop

Create Comparison


BEGIN
  dbms_comparison.create_comparison(
    comparison_name    => 'VJ_COMPARE',
    schema_name        => 'SCOTT',
    object_name        => 'PO_TXN',
    dblink_name        => NULL,
    remote_schema_name => 'SCOTT',
    remote_object_name => 'VJ_PO_TXN');
END;
/

Execute Comparison

-- If a primary key or unique index is already defined on the table, it is picked up automatically for the comparison.

Set serveroutput on
declare
     compare_info dbms_comparison.comparison_type;
     compare_return boolean;
begin
     compare_return :=
     dbms_comparison.compare (
     comparison_name=>'VJ_COMPARE',
     scan_info=>compare_info,
     perform_row_dif=>TRUE);
     if compare_return=TRUE
     then
          dbms_output.put_line('the tables are equivalent.');
     else
          dbms_output.put_line('Bad news... there is data divergence.');
          dbms_output.put_line('Check the dba_comparison_row_dif and '
               ||'dba_comparison_scan_summary views to locate the '
               ||'differences for scan_id: '||compare_info.scan_id);
     end if;
end;
/


Data Converge (optional)

  If you want to converge the data, you can use DBMS_COMPARISON.CONVERGE to sync the tables. You need to specify which side wins, i.e. whose data is used as the master copy.
  
set serveroutput on
  declare
     compare_info dbms_comparison.comparison_type;
     scan_id_num number;
begin
      select scan_id into scan_id_num from DBA_COMPARISON_SCAN_SUMMARY where comparison_name='VJ_COMPARE' and status='ROW DIF';
     dbms_comparison.converge (
          comparison_name=>'VJ_COMPARE',
          scan_id=>scan_id_num,
          scan_info=>compare_info,
          converge_options=>dbms_comparison.CMP_CONVERGE_LOCAL_WINS);
          -- converge_options=>dbms_comparison.CMP_CONVERGE_REMOTE_WINS is the other option.
          dbms_output.put_line('--- Results ---');
          dbms_output.put_line('Local rows merged by process: '||compare_info.loc_rows_merged);
          dbms_output.put_line('Remote rows merged by process: '||compare_info.rmt_rows_merged);
          dbms_output.put_line('Local rows deleted by process: '||compare_info.loc_rows_deleted);
          dbms_output.put_line('Remote rows deleted by process: '||compare_info.rmt_rows_deleted);
end;
/

Purge and Drop Comparison


 BEGIN
  DBMS_COMPARISON.PURGE_COMPARISON(
    comparison_name => 'VJ_COMPARE',
    scan_id         => NULL,
    purge_time      => NULL);
END;
/

BEGIN
  DBMS_COMPARISON.DROP_COMPARISON(
    comparison_name => 'VJ_COMPARE');
END;
/

Wednesday, July 10, 2013

Multiple EXTPROC in 11.2 /EXADATA and 12c PDBs


We know that external procedures, or C libraries, can interface with the Oracle database through extproc. Within the database, you define a library as
CREATE OR REPLACE LIBRARY libname AS '$APP_V1_HOME/appexe.so';
and write a PL/SQL wrapper over it.
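Such a wrapper might look like this (a sketch; the C entry point name, parameter, and return type are assumptions, not from the original post):

```sql
-- Hypothetical PL/SQL wrapper over the external C library
CREATE OR REPLACE FUNCTION app_exec (p_input IN VARCHAR2)
  RETURN BINARY_INTEGER
AS LANGUAGE C
  LIBRARY libname
  NAME "app_exec"
  PARAMETERS (p_input STRING, RETURN INT);
/
```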

There are two ways to set the value for the environment variable APP_V1_HOME defined during library creation. They are
  •  extproc.ora
  •  SID_LIST_LISTENER.

extproc.ora : The value of APP_V1_HOME used to be stored in listener.ora under SID_LIST_LISTENER prior to 11gR2. In 11gR2, Oracle introduced the concept of a single file where all the environment variables can be stored: $ORACLE_HOME/hs/admin/extproc.ora. With this approach, no configuration changes are required in either listener.ora or tnsnames.ora.
NOTE: On RAC or Exadata, update extproc.ora in the ORACLE_HOME, not the GRID home, across all the nodes.

Working with the above example, the information in extproc.ora will be

SET APP_V1_HOME=/home/app/9.3
SET EXTPROC_DLLS=ANY

If you had a second app, or a newer version of the app, in a different directory, all you need to do is add that info in extproc.ora as follows:


# Specify the EXTPROC_DLLS environment variable to restrict the DLLs that extproc is allowed to load.
# Without the EXTPROC_DLLS environment variable, extproc loads DLLs from ORACLE_HOME/lib
SET EXTPROC_DLLS=ANY
# App version V1
SET APP_V1_HOME=/home/app/9.3
# Newer app version V2
SET APP_V2_HOME=/home/app/9.4


and define the library in the database as follows

CREATE OR REPLACE LIBRARY libname AS '$APP_V2_HOME/appexe.so';
Now the above will work within the database and across different databases in the same ORACLE_HOME.

2) SID_LIST_LISTENER
Until 11.2, the only way to configure extproc was through listener.ora.

The steps are
1) configure listener.ora
2) add tnsname.ora entry
3) define library with AGENT clause

1) Configure listener.ora.

Create a separate listener for extproc and define the individual environments in SID_LIST_LISTENER.
In a RAC/Exadata environment, use the ORACLE_HOME to do this. The GRID_HOME is maintained by OraAgent, which does some automatic configuration of LOCAL_LISTENER parameters if the file is untouched. Also, I see no additional advantage in starting the extproc listener from GRID_HOME.

Below is a sample listing of listener.ora. Note that the environment variable APP_HOME points to different directories for different SID_NAMEs.
The SID_NAME values NEED NOT correspond to the database names; they are just placeholders.


LISTENER_DLIB =
(DESCRIPTION_LIST =
  (DESCRIPTION =
  (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROCDLIB))
  )
)

SID_LIST_LISTENER_DLIB =
  (SID_LIST =
   (SID_DESC=
      (SID_NAME=DLIB93)
      (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
      (PROGRAM=extproc)
      (ENVS="LD_LIBRARY_PATH=/home/app/9.3,EXTPROC_DLLS=ANY,APP_HOME=/home/app/9.3")
    )
   (SID_DESC=
      (SID_NAME=DLIB94)
      (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
      (PROGRAM=extproc)
      (ENVS="LD_LIBRARY_PATH=/home/app/9.4,EXTPROC_DLLS=ANY,APP_HOME=/home/app/9.4")
    )
  )
2) configure tnsnames.ora

exaproc_93 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROCDLIB))
    )
    (CONNECT_DATA =
      (SID = DLIB93)
      (PRESENTATION = RO)
    )
  )

exaproc_94 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROCDLIB))
    )
    (CONNECT_DATA =
      (SID = DLIB94)
      (PRESENTATION = RO)
    )
  )

To test the validity, you could start your listener as
lsnrctl start listener_dlib
and check the connect string:
tnsping exaproc_93

3) Configure the LIBRARY ... AGENT.

CREATE <PUBLIC> DATABASE LINK DBLIB_AGENT USING 'exaproc_93';
CREATE OR REPLACE LIBRARY libname AS '$APP_HOME/appexe.so' AGENT 'DBLIB_AGENT';


Observe the AGENT clause in the library definition.
Whenever the library is called, Oracle looks at the DB link in the AGENT clause. If it exists, the connect string of the DB link is looked up in tnsnames.ora. The tnsnames.ora entry in turn points to the SID information in listener.ora, and the environment info for that SID is picked up dynamically from the listener.
  AGENT --> DB LINK --> tnsnames.ora --> listener.ora --> ENVS for that SID.

So, if you need to point your application to a different version of the shared library without touching the application code, you can simply change the connect string of the DB link, or change the ENVS info in listener.ora itself.

e.g. DROP <PUBLIC> DATABASE LINK DBLIB_AGENT; -- the old link using 'exaproc_93'

     CREATE <PUBLIC> DATABASE LINK DBLIB_AGENT USING 'exaproc_94';