Oracle mechanics

Igor Usoltsev's Oracle blog

Huge sql plans comparision

Posted by Игорь Усольцев on January 27, 2014

original in Russian

Once upon a day we’ve got a little trouble on OEBS (Oracle Apps)  db server:

– while there not exist any valuable I/O related waits, these temporary  difficulties were not catastrophic, and db servers still remain accessible and manageable. But high CPU load / LA reason are typical for Oracle performance problem, and this is interesting topic for the post imho

AWR for the problem period:

WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Startup Time    Release     RAC
------------ ----------- ------------ -------- --------------- ----------- ---
OEBS          2598577434 OEBS2               2 05-Oct-13 09:23 11.2.0.3.0  YES

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
db2server.oebs.r Linux x86 64-bit                   16     8       2      94.60

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     58446 08-Oct-13 16:00:08       501      31.8
  End Snap:     58447 08-Oct-13 17:00:13       527      30.5
   Elapsed:               60.08 (mins)
   DB Time:            5,040.45 (mins)                                           -- almost 80 times higher against Elapsed on 8 physical cores

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU                                           53,883          17.8            -- DB CPU used 900 minuts from 960 = 60 х 16 CPUs available
...

~~~~~~~~         Load Average
               Begin       End     %User   %System      %WIO     %Idle
           --------- --------- --------- --------- --------- ---------
               85.91     89.47      96.2       3.1       0.0       0.0           -- high LA without WIO as a result

SQL ordered by CPU Time               DB/Inst: OEBS/OEBS2  Snaps: 58446-58447    -- high CPU consuming SQL list

    CPU                   CPU per           Elapsed
  Time (s)  Executions    Exec (s) %Total   Time (s)   %CPU    %IO    SQL Id
---------- ------------ ---------- ------ ---------- ------ ------ -------------
  46,964.3            9   5,218.26   87.2  245,583.7   19.1     .1 fppuw3hpvww2d -- mainly interesting query
Module: XXX_COMISSION_FOR_EXCEL
SELECT * FROM XXX_REPO_COMISSIONERS_V X WHERE X.CONTRACT_ID = :B3 AND X.CURRENCY
_CODE = :B2 AND X.SCHET_VAT_TYPE = :B1 ORDER BY TO_NUMBER(SPCFIC_TYPE_ORDER),X.T
RX_NUMBER

  44,905.9            8   5,613.23   83.3  234,735.5   19.1     .0 grmwmf7p3503s -- top level sql
Module: XXX_COMISSION_FOR_MAIL
BEGIN XXX_COMISSION_REP.comiss_reports_mail(:errbuf,:rc,:A0,:A1,:A2,:A3,:A4,:A5,
:A6,:A7); END;

   2,053.2            0        N/A    3.8   10,814.1   19.0     .3 3h4nurpz9mgv4 -- top level sql
Module: XXX_COMISSION_FOR_EXCEL
BEGIN XXX_COMISSION_REP.create_comiss_reps_batch(:errbuf,:rc,:A0,:A1); END;

     635.3           77       8.25    1.2    3,692.5   17.2    2.6 79fbwuzw18and -- no matter
...

– as we can see the main CPU consumer is the query with SQL_ID fppuw3hpvww2d, which was called from procedures grmwmf7p3503s and 3h4nurpz9mgv4:

SQL> select top_level_sql_id, count(*)
  2    from v$active_session_history
  3   where sql_id = 'fppuw3hpvww2d'
  4   group by top_level_sql_id
  5   order by count(*) desc
  6  /

TOP_LEVEL_SQL_ID   COUNT(*)
---------------- ----------
grmwmf7p3503s         92699
3h4nurpz9mgv4         11191

How does this trouble query looks in Library Cache?

SQL> @shared_cu fppuw3hpvww2d

INST    EXECS LAST_ACTIVE_TIME    ELA_PER_EXEC PLAN_HASH_VALUE OPTIMIZER_COST CHILD BIND_SENSE BIND_AWARE SHAREABLE  USE_FEEDBACK_STATS OPTIMIZER_STATS  BIND_EQ_FAILURE  REASON1
---- -------- ------------------- ------------ --------------- -------------- ----- ---------- ---------- ---------- ------------------ ---------------- ---------------- --------------------
   1        1 08.10.2013 17:46:00     16452956      2845227671          84363     0 Y          Y          Y          N                  N                Y                Bind mismatch(33)  |
   1      128 08.10.2013 17:54:23    300770995       287300496          89834     2 Y          Y          Y          N                  N                Y                Bind mismatch(33)  |
   2      116 08.10.2013 17:57:25   8898944796      3564323271         147928     5 Y          N          Y          N                  N                N                                    

– there are three child cursors on 2 nodes, which were generated by Adaptive/Extended Cursor Sharing (ECS) with different execution plans. And the worst (by ELA_PER_EXEC) execution plan was used on 2nd node and produced the described above AWR problems

The quickest solution is to fix “good” plan with PLAN_HASH_VALUE = 2845227671 on 1st node, which child cursor still exists in 1st Shared Pool:

11.2.0.3.OEBS1@ SQL> @create_bsline fppuw3hpvww2d 2845227671 "XXX_COMISSION_FOR_EXCEL"
Baseline SQL_8fc941e591c2e46c SQL_PLAN_8zka1wq8w5t3c171ee7b7 was [re]created
for SQL_ID=fppuw3hpvww2d, SQL_PLAN_HASH=2845227671

PL/SQL procedure successfully completed

, and this begins to improve the situation:

SQL> @shared_cu fppuw3hpvww2d

INST    EXECS LAST_ACTIVE_TIME    ELA_PER_EXEC PLAN_HASH_VALUE OPTIMIZER_COST CHILD BIND_SENSE BIND_AWARE SHAREABLE  USE_FEEDBACK_STATS OPTIMIZER_STATS  BIND_EQ_FAILURE  REASON1                               SQL_PLAN_BASELINE
---- -------- ------------------- ------------ --------------- -------------- ----- ---------- ---------- ---------- ------------------ ---------------- ---------------- ------------------------------------- ------------------------------
   1        1 08.10.2013 17:46:00     16452956      2845227671          84363     0 Y          Y          Y          N                  N                Y                SQL Tune Base Object Different(3)  |
   1        1 08.10.2013 19:04:15      9198046      2845227671          84363     1 Y          N          N          Y                  N                N                Optimizer mismatch(13)  |             SQL_PLAN_8zka1wq8w5t3c171ee7b7
   1      128 08.10.2013 17:54:23    300770995       287300496          89834     2 Y          Y          Y          N                  N                Y                SQL Tune Base Object Different(3)  |
   1        3 08.10.2013 19:07:42     13421729      2845227671         134524     3 N          N          Y          N                  N                N                Optimizer mismatch(13)  |             SQL_PLAN_8zka1wq8w5t3c171ee7b7
   2        3 08.10.2013 18:53:45     61386286      2845227671          84363     0 Y          N          N          Y                  N                N                Optimizer mismatch(13)  |             SQL_PLAN_8zka1wq8w5t3c171ee7b7
   2        5 08.10.2013 18:56:24     84832378      2845227671          84363     2 Y          N          N          Y                  N                N                Optimizer mismatch(13)  |             SQL_PLAN_8zka1wq8w5t3c171ee7b7
   2       32 08.10.2013 19:08:12     14746924      2845227671         135229     3 N          N          Y          N                  N                N                Optimizer mismatch(13)  |             SQL_PLAN_8zka1wq8w5t3c171ee7b7
   2      117 08.10.2013 19:08:13  10920026688      3564323271         147928     5 Y          N          Y          N                  N                N                SQL Tune Base Object Different(3)  |                                

until almost the full resolution by some strange way:

SQL> @shared_cu fppuw3hpvww2d

INST    EXECS LAST_ACTIVE_TIME    ELA_PER_EXEC PLAN_HASH_VALUE OPTIMIZER_COST CHILD BIND_SENSE BIND_AWARE SHAREABLE  USE_FEEDBACK_STATS OPTIMIZER_STATS  BIND_EQ_FAILURE  REASON1                               SQL_PLAN_BASELINE
---- -------- ------------------- ------------ --------------- -------------- ----- ---------- ---------- ---------- ------------------ ---------------- ---------------- ------------------------------------- ------------------------------
   1        1 10.10.2013 10:56:38     12313498      2845227671          83497     0 Y          N          N          Y                  N                N                Optimizer mismatch(13)  |             SQL_PLAN_8zka1wq8w5t3c171ee7b7
   1       99 10.10.2013 13:12:12      6710485      2845227671          86392     1 N          N          Y          N                  N                N                Optimizer mismatch(13)  |             SQL_PLAN_8zka1wq8w5t3c171ee7b7
   2        1 10.10.2013 10:54:48     15788825      2845227671          83497     0 Y          N          N          Y                  N                N                Optimizer mismatch(13)  |             SQL_PLAN_8zka1wq8w5t3c171ee7b7
   2       48 10.10.2013 13:16:20      8376866      2845227671         205605     1 N          N          Y          N                  N                N                Optimizer mismatch(13)  |             SQL_PLAN_8zka1wq8w5t3c171ee7b7

– with a bit sudden REASON1 and V$SQL_SHARED_CURSOR.USE_FEEDBACK_STATS=Y

The real reason of Cardinality Feedback application for the query with bind variables ( V$SQL.IS_BIND_SENSITIVE = Y ) is the parametrized view usage with typical conditions like:

...
where l.ext_flag = fnd_profile.VALUE ('XXX_PARAM')
...

– and this practice is not rare for the OEBS programming, unfortunately

So far so good, the quick workaround in form of Baseline was found and successfully applied

But my humble practical experience shows that for complicated query with bind variables and parametrized views a such type of solution can not be reliable, at least in 11.2 version

The reliable solutions in these circumstances are:

  • Stored Outline – obsolete but reliable method, with a bit inconvenient management interface
  • SQL Patch – can be applied to query by text, and may be very useful for the limited hints list
  • Query execution plan fixation (in the most problem part, for example) in form of direct optimizer hints, which are visible and quite understandable by developers – if the query text is editable, of course

Keeping in mind, that two last usable variants require the full hints list (or the hints list for specific query block), and the problem query is really complicated (600+ rows in execution plan), I’ve tried to define the most slow query blocks (QB) for the “bad” plan 3564323271 from ASH data:

SQL> @ash_sqlmon_hist fppuw3hpvww2d 3564323271 "" 58440

 ID PLAN_OPERATION                                     OBJECT_OWNER OBJECT_NAME                       COST CARDINALITY  BYTES QBLOCK_NAME  WAIT_PROFILE
--- -------------------------------------------------  ------------ ------------------------------- ------ ----------- ------ ------------ -----------------------------------------------------------------------------------------------------------------------------------------
  0   SELECT STATEMENT                                                                              147928                                 ON CPU(6);
...
533                  VIEW                                           VW_SQ_5                            236          66   1782 SEL$A7C6D689 ON CPU(1);
534                    NESTED LOOPS                                                                                           SEL$A7C6D689 ON CPU(3);
535                      NESTED LOOPS                                                                  236          66   2970              ON CPU(1);
536                        TABLE ACCESS FULL           APPS         XXX_COMISSION_PAYNT_HEADER          63          65   1755 SEL$A7C6D689 ON CPU(139534); latch: cache buffers chains(42); latch free(21); buffer busy waits(12); gc cr block 2-way(1); db file sequential read(1);
537                        INDEX RANGE SCAN            APPS         XXX_COMISSION_PAYNT_LINES_N1         2          17        SEL$A7C6D689 ON CPU(35);
538                      TABLE ACCESS BY INDEX ROWID   APPS         XXX_COMISSION_PAYNT_LINES            3           1     18 SEL$A7C6D689 ON CPU(88); gc cr block 2-way(10); db file sequential read(1); latch: row cache objects(1);
...
634         TABLE ACCESS BY INDEX ROWID ... 

– and we can see QB SEL$A7C6D689, which execution consumes the most part of query elapsed time. For example on plan line 536:

  • ON CPU(139534) – almost 140,000 records in ASH
  • latch: cache buffers chains(42); latch free(21); buffer busy waits(12); gc cr block 2-way(1) – a little concurrency from multiple session execution

Let’s look how the “bad” execution plan 3564323271 differs from “good” plan 2845227671 in query block SEL$A7C6D689?

SQL> @plan_qb_diff_awr fppuw3hpvww2d 3564323271 2845227671 "SEL$A7C6D689"

PLAN_HASH_VALUE QBLOCK_NAME    ID OPERATION                              OBJECT_OWNER OBJECT_NAME                    CARDINALITY      BYTES       COST
--------------- ------------- --- -------------------------------------- ------------ ------------------------------ ----------- ---------- ----------
     3564323271 SEL$A7C6D689  533   VIEW                                              VW_SQ_5                                 66       1782        236
     3564323271 SEL$A7C6D689  534     NESTED LOOPS
     3564323271               535       NESTED LOOPS                                                                          66       2970        236
     3564323271 SEL$A7C6D689  536         TABLE ACCESS FULL              APPS         XXX_COMISSION_PAYNT_HEADER              65       1755         63
     3564323271 SEL$A7C6D689  537         INDEX RANGE SCAN               APPS         XXX_COMISSION_PAYNT_LINES_N1            17                     2
     3564323271 SEL$A7C6D689  538       TABLE ACCESS BY INDEX ROWID      APPS         XXX_COMISSION_PAYNT_LINES                1         18          3

     2845227671 SEL$A7C6D689  608   VIEW                                              VW_SQ_5                                  1         27          5
     2845227671 SEL$A7C6D689  609     SORT UNIQUE                                                                              1         45
     2845227671               610       NESTED LOOPS
     2845227671               611         NESTED LOOPS                                                                         1         45          5
     2845227671 SEL$A7C6D689  612           TABLE ACCESS BY INDEX ROWID  APPS         XXX_COMISSION_PAYNT_LINES                1         18          4
     2845227671 SEL$A7C6D689  613             INDEX RANGE SCAN           APPS         XXX_COMISSION_PAYNT_LINES_N2             1                     3
     2845227671 SEL$A7C6D689  614           INDEX UNIQUE SCAN            APPS         XXX_COMISSION_PAYNT_HEADER_PK1           1                     0
     2845227671 SEL$A7C6D689  615         TABLE ACCESS BY INDEX ROWID    APPS         XXX_COMISSION_PAYNT_HEADER               1         27          1

– the “good” plan advantage well reflected in QB Cost: 5 against 236 – but Oracle have to ignore the CBO calculation for the adaptive technologies applying unfortunately

To get a hints list for QB I’ve used another script PLAN_OL_DIFF_AWR.SQL:

SQL> --                sql_id        bad plan hash good plan hash qb_name
SQL> @plan_ol_diff_awr fppuw3hpvww2d 3564323271    2845227671     SEL$A7C6D689

PLH_3564323271                                                                                       PLH_2845227671
---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------
FULL(@"SEL$A7C6D689" "XH"@"SEL$64")
INDEX(@"SEL$A7C6D689" "XL"@"SEL$64" ("XXX_COMISSION_PAYNT_LINES"."XXX_COMISSION_PAYNT_HEADER_ID"))
LEADING(@"SEL$A7C6D689" "XH"@"SEL$64" "XL"@"SEL$64")
NLJ_BATCHING(@"SEL$A7C6D689" "XL"@"SEL$64")
USE_NL(@"SEL$A7C6D689" "XL"@"SEL$64")
                                                                                                     INDEX(@"SEL$A7C6D689" "XH"@"SEL$64" ("XXX_COMISSION_PAYNT_HEADER"."XXX_COMISSION_PAYNT_HEADER_ID"))
                                                                                                     INDEX_RS_ASC(@"SEL$A7C6D689" "XL"@"SEL$64" ("XXX_COMISSION_PAYNT_LINES"."TRX_NUMBER"))
                                                                                                     LEADING(@"SEL$A7C6D689" "XL"@"SEL$64" "XH"@"SEL$64")
                                                                                                     NLJ_BATCHING(@"SEL$A7C6D689" "XH"@"SEL$64")
                                                                                                     USE_HASH_AGGREGATION(@"SEL$A7C6D689")
                                                                                                     USE_NL(@"SEL$A7C6D689" "XH"@"SEL$64")

– and “good” hints list is almost ready – in column PLH_2845227671, but we have to adjust hints INDEX and INDEX_RS_ASC for usage in query text, and to test the hinted query:

SQL> SELECT
  2  /*+       LEADING(@"SEL$A7C6D689" "XL"@"SEL$64" "XH"@"SEL$64")
  3            USE_HASH_AGGREGATION(@"SEL$A7C6D689")
  4            USE_NL(@"SEL$A7C6D689" "XH"@"SEL$64")
  5            NLJ_BATCHING(@"SEL$A7C6D689" "XH"@"SEL$64")
  6            INDEX(@"SEL$A7C6D689" "XH"@"SEL$64" XXX_COMISSION_PAYNT_HEADER_PK1)
  7            INDEX(@"SEL$A7C6D689" "XL"@"SEL$64" XXX_COMISSION_PAYNT_LINES_N2)
  8  */
  9   * from XXX_REPO_comissioners_v x
 10  WHERE x.contract_id = :v1
 11  AND x.currency_code = :v2
 12  AND x.schet_vat_type = :v3
 13  order by to_number(SPCFIC_TYPE_ORDER),x.trx_number
 14  /

Elapsed: 00:03:29.91 -- not the best, but quite adequate execution time versus 2+ hours of the bad plan time

SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor('','',format => '+outline'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  cspksrd1kf196, child number 0
-------------------------------------

Plan hash value: 1117760336 -- was changed because we fixed only the problem block in huge execution plan
...
2133 rows selected.

With the same scripts we can see how exactly used in query text hints appear in last “hinted” execution plan Outline:

SQL> @plan_ol_diff_awr cspksrd1kf196 1117760336 "" SEL$A7C6D689

PLH_1117760336                                                                                       PLH_
---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------
INDEX(@"SEL$A7C6D689" "XH"@"SEL$64" ("XXX_COMISSION_PAYNT_HEADER"."XXX_COMISSION_PAYNT_HEADER_ID"))
INDEX_RS_ASC(@"SEL$A7C6D689" "XL"@"SEL$64" ("XXX_COMISSION_PAYNT_LINES"."TRX_NUMBER"))
LEADING(@"SEL$A7C6D689" "XL"@"SEL$64" "XH"@"SEL$64")
NLJ_BATCHING(@"SEL$A7C6D689" "XH"@"SEL$64")
OUTLINE_LEAF(@"SEL$A7C6D689")
USE_HASH_AGGREGATION(@"SEL$A7C6D689")
USE_NL(@"SEL$A7C6D689" "XH"@"SEL$64")                                                                

and how exactly QB SEL$A7C6D689 was executed:

SQL> @plan_qb_diff_awr cspksrd1kf196 1117760336 "" SEL$A7C6D689

PLAN_HASH_VALUE QBLOCK_NAME    ID OPERATION                                OBJECT_OWNER OBJECT_NAME                    CARDINALITY      BYTES       COST
--------------- ------------- --- ---------------------------------------- ------------ ------------------------------ ----------- ---------- ----------
     1117760336 SEL$A7C6D689  607   VIEW                                                VW_SQ_5                                  1         27          5
     1117760336 SEL$A7C6D689  608     SORT UNIQUE                                                                                1         45
     1117760336               609       FILTER
     1117760336               610         NESTED LOOPS
     1117760336               611           NESTED LOOPS                                                                         1         45          5
     1117760336 SEL$A7C6D689  612             TABLE ACCESS BY INDEX ROWID  APPS         XXX_COMISSION_PAYNT_LINES                1         18          4
     1117760336 SEL$A7C6D689  613               INDEX RANGE SCAN           APPS         XXX_COMISSION_PAYNT_LINES_N2             1                     3
     1117760336 SEL$A7C6D689  614             INDEX UNIQUE SCAN            APPS         XXX_COMISSION_PAYNT_HEADER_PK1           1                     0
     1117760336 SEL$A7C6D689  615           TABLE ACCESS BY INDEX ROWID    APPS         XXX_COMISSION_PAYNT_HEADER               1         27          1

– and this is expected result

Note: used in last test user hint USE_HASH_AGGREGATION was successfully included in plan Outline section above, but this hint produced unusial operation SORT UNIQUE, and this example clearly demonstrates that hints from actual execution plan Outline section are not mandatory for usage , as far as I understand

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: