More about index

在关系型数据库中优点就是你执行一条查询，数据库返回对应的数据，而无需告诉她(query optimizer)如何执行来获取数据，她的物理路径选择无法保证，除非你可以一些逻辑上给数据库一引导，物理路径（execution plan）上index就是一选择。

index的创建要考虑其成本(costs) 和收益(benefits)二者权衡，尤其是在OLTP型的系统。

创建索引的目的oracle上通常三种原因：
a)确定唯一性,如唯一索引
help Oracle enforce uniqueness.
b)为数据访问提高性能
performance reasons to supply a high-precision access path to important data.
c)确保完整性约束，如外键
help Oracle enforce referential integrity constraints.

理论上我们不需要用index来确保数据的完整准确性，所以可以说创建index只有一个理由：提高性能。例如primary key 可以在表上发生DML时无需lock table，只需检查索引块小范围scan确保数据的无冲突，得到更好的性能和一致性（concurrency）。

index可以提供了更好的performance 和concurrentcy,但是index也是有维护成本的，通常是实时,这样就会带来一些附加问题。借用一个真实案例“向一个表上insert1百万条数据时，发现非常的慢，等待event 是‘db file sequential read’连续读事件一直到结束”.为什么呢？

“The db file sequential read Oracle metric event signifies that the user process is reading buffers into the SGA buffer cache and is waiting for a physical I/O call to return.

This call differs from a scattered read, because a sequential read is reading data into contiguous memory space. A sequential read is usually a single-block read.

Single block I/Os are usually the result of using indexes. Rarely, full table scan calls could get truncated to a single block call due to extent boundaries, or buffers already present in the buffer cache. These waits would also show up as ‘db file sequential read’.”

如果insert 1百分行数据到一个表可能要写25000个表段block(每行大小200byte, block size 8K,每个块上40行)，如果表上有一个索引，索引维护的实时性导致在每insert一条记录时都会有一个random read 去寻找索引叶子块并创建Index entry.在更新完成索引叶块之后又要很快的为下一个随机读分配空间。如果表非常的大，并且有很多索引，那么这个Insert就会成为性能问题。
表（heap table）和索引有个非常重要的区别，表上插入一行如果刚好是个empty space就可毫无顾虑的分配；而index Insert一行要准确的放对位置，因为index entry是有序存放的根据key,去寻找正确的位置代价是昂贵的，更新索引列上的值会更糟，因为要删除索引原位置上的index entry 再insert到新的块上。

当你创建一个完整性约束foreign key时,不像unique constraint和pk oracle会自动创建index，当更新父表的unique key时完整性约束会锁定子表检查子记录上的事务如果没有相应在的子表列上创建索引。

Jonathan Lewis said
”
Rule 1: don’t create “foreign key” indexes unless they are really necessary, or unless they are sufficiently useful that you would have created them any way even if there hadn’t been a referential integrity constraint to protect.

Rule 2: consider adding columns to foreign key indexes to make them useful to high precision queries.
”

为了提高索引的高精度，有时表组合索引上重复率较高的列上使用compress技术也是不错的选择。

下面这个script可以判断索引是否需要重建，参考rows_per_block

column ind_id new_value m_ind_id

 select
         object_id ind_id
 from      user_objects
 where     object_name = 'IDX_STUDENT_ORGID' ;

 break on report skip 1
 compute sum of blocks on report

 select  rows_per_block,
         count(*) blocks
 from (
         select
                /*+     cursor_sharing_exact
                        dynamic_sampling(0)
                        no_monitoring
                        no_expand
                        index_ffs(ICME_STUDENT,IDX_STUDENT_ORGID)
                        noparallel_index(ICME_STUDENT,IDX_STUDENT_ORGID)
                */
                sys_op_lbid( &m_ind_id ,'L',ICME_STUDENT.rowid) as block_id,
                count(*)                              as rows_per_block
         from
                ICME_STUDENT
         --      t1 sample block (100)   
          group by
                sys_op_lbid( &m_ind_id ,'L',ICME_STUDENT.rowid)
 )
 group by rows_per_block
 order by rows_per_block;


take a example:

ROWS_PER_BLOCK     BLOCKS
-------------- ----------
           311          1
           397          2
           398       1087
           399        432
           400         60
           401         12
           402          6
           403          1
           405          1
           407          1
           408          1
           414          1
           421          1
           422       1750
           423        195
           424         48
           425         14
           426          2
           427          3
           428          2
           429          3
           433          1
           444          1
           465          1
           471          1
           478         12
           479          4
           480          1
               ----------
sum                  3644

Conclusion:

创建索引可以方便的给DB带来查询上的性能提升，但是DML时index的维护代价也是昂贵的。合理的创建索引，删除不必要的索引。

references Jonathan Lewis ‘s article