Executive Summary: Microsoft SQL Server 2008’s filtered indexes and statistics feature lets you improve query performance without wasting disk space, offers more detailed and more accurate distribution statistics and therefore better selectivity estimates from the optimizer, and provides an alternative to the commonly requested unique constraint with multiple NULLs.

SQLServer 2008 introduces a very cool new feature—filtered indexes and statistics. This new feature lets you improve the performance of your queries without wasting disk space; offers more detailed and more accurate distribution statistics, which provide better selectivity estimates from the optimizer; and provides an alternative to the commonly requested unique constraint with multiple NULLs. I’ll use several examples to introduce you to the concept of filtered indexes and statistics and to demonstrate their use.

Filtered Indexes and Statistics
In SQL Server 2008, you can define nonclustered indexes and statistics on a subset of rows rather than having to create them on the whole set of rows from a table. You achieve this task by specifying a predicate in a WHERE clause as part of the CREATE INDEX or CREATE STATISTICS statement. For example:

CREATE INDEX idx1 ON dbo.T1(keycol) WHERE col1 = 5;

The SQL Server 2008 implementation lets you specify fairly simple predicates. You can use IN; the comparison operators IS NULL, IS NOT NULL, =, <>, !=, >, >=, !>, <, <=, !<; and the logical operator AND. We can assume that Microsoft will continue to expand this list in future releases. Also, for now, filtered indexes cannot be created on indexed views, or using computed columns, user data types (UDTs), HIERARCHYID, or spatial types.

The ability to define filtered indexes provides several benefits when you frequently query specific subsets of rows. Compared with creating ordinary indexes, creating indexes on a specific subset of rows results in savings beyond the obvious disk space saved. For example, only changes in the subset of rows in the underlying table will involve modifying the corresponding index rows. Index maintenance operations such as index rebuilds and reorganizations will naturally be faster and cheaper. Consider also the fact that the maximum number of steps represented in distribution statistics (histograms) is limited. Because a filtered index represents only a subset of the rows from the table, each step in the histogram represents fewer rows compared with a nonfiltered index. As a result, filtered indexes and statistics are more accurate.

A scenario in which you might find filtered indexes useful is when you need to index only known values, but the column you want to index has a large number of NULLs and you never need to filter NULL values. Typically, every row from the table has a corresponding row in the index, including when the index key is NULL. But if those NULLs are not relevant to your queries, you will benefit from defining the index on the subset of rows with only the known values in the index key column. As an example, suppose that you want to create an index that will support queries against the Sales.SalesOrderHeader table in the AdventureWorks database, filtering a known currency rate ID. You can create an index on the subset of rows that have a known value in the CurrencyRateID column like so:

USE AdventureWorks;

CREATE NONCLUSTERED INDEX idx_currate_notnull ON Sales.SalesOrderHeader(CurrencyRateID) WHERE CurrencyRateID IS NOT NULL;

Here’s an example for a query (call it Query 1) that filters all rows with a certain known currency rate ID:

SELECT *
FROM Sales.SalesOrderHeader
WHERE CurrencyRateID = 4;

Because the optimizer is smart enough to realize that this filter is supposed to exclude NULLs, it will consider using the index idx_currate_notnull. Then, whether the index is used depends on the filter’s selectivity. As the execution plan for this query shows (see Figure 1), the index was used for the query.

When you use a range filter predicate to create a filtered index, the optimizer considers using the index even when the query requests a subinterval of the range. As an example, the following index is defined on the Sales.SalesOrderHeader table, with the Freight column as the key, based on the predicate Freight >= $5000.00:

CREATE NONCLUSTERED INDEX idx_freight_5000_or_
   more
  ON Sales.SalesOrderHeader(Freight)  HERE Freight >= $5000.00;

The following query (call it Query 2) demonstrates a request for a subset of rows from the table based on a subinterval of the range defined in the filtered index idx_freight_5000_or_more:

SELECT *FROM Sales.SalesOrderHeaderWHERE Freight BETWEEN $5500.00 AND $6000.00;

Figure 2 shows the plan for this query. As you can see in the execution plan, the index idx_freight_5000_or_ more was used.

Filtered indexes can also have an INCLUDE clause, which is useful when you want the index to cover certain queries and thus avoid lookups. For example, the following code creates an index on the OrderDate column; includes the columns Sales- OrderID, CustomerID, and TotalDue; and filters only rows where TerritoryID is equal to 5:

CREATE NONCLUSTERED INDEX idx_territory5_
  orderdate
 ON Sales.SalesOrderHeader(OrderDate)

INCLUDE(SalesOrderID, CustomerID, TotalDue) WHERE TerritoryID = 5;

Consider the following query (call it Query 3):

SELECT SalesOrderID,
CustomerID, OrderDate, TotalDue
FROM Sales.SalesOrderHeaderWHERE TerritoryID = 5;
  

Figure 3 shows the execution plan for this query. As you can see in the plan, the optimizer uses the index idx_territory5_orderdate to provide complete coverage. Because the query requests all rows in which the territory ID is 5, the optimizer chooses a full scan of the leaf of the index.

Continue to page 2

   Prev. page   [1] 2     next page
 
 

ADS BY GOOGLE