Why does using an index make my query take longer?

Abstract: Why does using an index make my query take longer?

  Why does using an index make my query take longer?

Indexes give you one important feature: selectivity. Selectivity is a measure of how precisely an index can identify a particular record. For example, a primary key contains unique values for each record. No two records can have the same value in the primary key field. Thus the selectivity of a primary key index is 1, since a given primary key value always identifies one and only one record in the table.

At the other end of the spectrum would be an index on a field wherein every record in the table has exactly the same value in that field. In this case, the index doesn't help you find a particular record at all, and the selectivity of the index is equal to the number of records in the table.

Indices are very useful for WHERE clauses, because a very selective index will significantly lower the number of records which must be evaluated.

Let's say you have a table containing 100,000 records. Two of the fields in the table are FieldOne and FieldTwo, and there is an index on FieldOne with a selectivity of 10. That is, for any given index value there are about ten records in the table.

Now take the following WHERE clause:

WHERE FieldOne = 'Buckaroo' AND FieldTwo = 'Banzai'

With no index, InterBase would have to read all 100,000 records and see if each one meets the conditions in the WHERE clause. Using the index, InterBase can immediately locate the ten or so records which match the condition on FieldOne and then test them to see if they also meet the conditions on FieldTwo. Searching 10 records instead of 100,000 is clearly much faster.

Now let's change one of our assumptions. Let's now assume that the selectivity on the FieldOne index is 50,000. In this case using the index on FieldOne could actually make the search slower. Why? Because if Interbase has to scan most or all of the records in a table, it is faster to just scan the records in the storage order, which is the order in which the records are stored on the disk. It's faster because, relative to the speed of program execution, moving the head that reads the hard disk is very slow. Thus, if an index is selective enough that it significantly reduces the number of records that must be scanned, then it's worth spending the time moving the disk read head around to read the records in index order. However if you are going to scan most or all of the records, it's faster to read them in storage order than in index order.

One result of this is that indexes usually slow down ORDER BY clauses. It is faster for InterBase to read the entire result set in storage order and sort them in memory than it is to read the records in index order. However If you only want the first few records from the result set, you're better off using the index.