| Executive Summary: Microsoft SQL Server 2008’s filtered indexes and statistics feature lets you improve query performance without wasting disk space, offers more detailed and more accurate distribution statistics and therefore better selectivity estimates from the optimizer, and provides an alternative to the commonly requested unique constraint with multiple NULLs.
|
SQLServer 2008 introduces a very cool new
feature—filtered indexes and statistics.
This new feature lets you improve the performance of
your queries without wasting disk space; offers more
detailed and more accurate distribution statistics,
which provide better selectivity estimates from the optimizer;
and provides an alternative to the commonly
requested unique constraint with multiple NULLs. I’ll
use several examples to introduce you to the concept
of filtered indexes and statistics and to demonstrate
their use.
Filtered Indexes and Statistics
In SQL Server 2008, you can define nonclustered
indexes and statistics on a subset of rows rather than
having to create them on the whole set of rows from a
table. You achieve this task by specifying a predicate in
a WHERE clause as part of the CREATE INDEX or
CREATE STATISTICS statement. For example:
CREATE INDEX idx1 ON dbo.T1(keycol) WHERE col1
= 5;
The SQL Server 2008 implementation lets you specify
fairly simple predicates. You can use IN; the comparison
operators IS NULL, IS NOT NULL, =, <>,
!=, >, >=, !>, <, <=, !<; and the logical operator AND.
We can assume that Microsoft will continue to expand
this list in future releases. Also, for now, filtered indexes
cannot be created on indexed views, or using computed
columns, user data types (UDTs), HIERARCHYID,
or spatial types.
The ability to define filtered indexes provides several
benefits when you frequently query specific subsets
of rows. Compared with creating ordinary indexes, creating
indexes on a specific subset of rows results in savings
beyond the obvious disk space saved. For example,
only changes in the subset of rows in the underlying
table will involve modifying the corresponding index
rows. Index maintenance operations such as index
rebuilds and reorganizations will naturally be faster
and cheaper. Consider also the fact that the maximum
number of steps represented in distribution statistics
(histograms) is limited. Because a filtered index represents
only a subset of the rows from the table, each
step in the histogram represents fewer rows compared
with a nonfiltered index. As a result, filtered indexes
and statistics are more accurate.
A scenario in which you might find filtered indexes
useful is when you need to index only known values,
but the column you want to index has a large number
of NULLs and you never need to filter NULL values.
Typically, every row from the table has a corresponding
row in the index, including when the index key is
NULL. But if those NULLs are not relevant to your
queries, you will benefit from defining the index on the
subset of rows with only the known values in the index
key column. As an example, suppose that you want to
create an index that will support queries against the
Sales.SalesOrderHeader table in the AdventureWorks
database, filtering a known currency rate ID. You
can create an index on the subset of rows that have a
known value in the CurrencyRateID column like so:
USE AdventureWorks;
CREATE NONCLUSTERED INDEX idx_currate_notnull
ON Sales.SalesOrderHeader(CurrencyRateID)
WHERE CurrencyRateID IS NOT NULL;
Here’s an example for a query (call it Query 1) that
filters all rows with a certain known currency rate ID:
SELECT *
FROM Sales.SalesOrderHeader
WHERE CurrencyRateID = 4;
Because the optimizer is smart enough to realize
that this filter is supposed to exclude NULLs, it will
consider using the index idx_currate_notnull. Then,
whether the index is used depends on the filter’s selectivity.
As the execution plan for this query shows (see
Figure 1), the index was used for the query.
When you use a range filter predicate to create a
filtered index, the optimizer considers using the index
even when the query requests a subinterval of the
range. As an example, the following index is defined
on the Sales.SalesOrderHeader table, with the Freight
column as the key, based on the predicate Freight >=
$5000.00:
CREATE NONCLUSTERED INDEX idx_freight_5000_or_
more
ON Sales.SalesOrderHeader(Freight) HERE Freight >= $5000.00;
The following query (call it Query 2) demonstrates a
request for a subset of rows from the table based on
a subinterval of the range defined in the filtered index
idx_freight_5000_or_more:
SELECT *FROM Sales.SalesOrderHeaderWHERE Freight BETWEEN $5500.00 AND $6000.00;
Figure 2 shows the plan for this query. As you can see
in the execution plan, the index idx_freight_5000_or_
more was used.
Filtered indexes can also have an INCLUDE
clause, which is useful when you want the index to
cover certain queries and thus avoid lookups. For
example, the following code creates an index on
the OrderDate column; includes the columns Sales-
OrderID, CustomerID, and TotalDue; and filters only
rows where TerritoryID is equal to 5:
CREATE NONCLUSTERED INDEX idx_territory5_
orderdate
ON Sales.SalesOrderHeader(OrderDate)
INCLUDE(SalesOrderID,
CustomerID, TotalDue)
WHERE TerritoryID = 5;
Consider the following
query (call it Query 3):
SELECT SalesOrderID,
CustomerID, OrderDate, TotalDue
FROM Sales.SalesOrderHeaderWHERE TerritoryID = 5;
Figure 3 shows the execution plan for this query. As
you can see in the plan, the optimizer uses the index
idx_territory5_orderdate to provide complete coverage.
Because the query requests all rows in which the territory
ID is 5, the optimizer chooses a full scan of the
leaf of the index.
Continue to page 2
Prev. page  
[1]
2
next page