Skip to main content

Understanding SQL Server Full-Text Indexing


Microsoft SQL Server supports T-SQL, an implementation of ANSI standard SQL. T-SQL is designed to (among other things) search for matches in your data. For example, if you've created a table with a column named Notes you could construct these queries:

SELECT * FROM MyTable WHERE Notes = 'Deliver Tuesday'
SELECT * FROM MyTable WHERE Notes LIKE '%caution%'
But what if you're not looking for an exact match, either to the full text of the column or a part of the column? That's when you need to go beyond the standard SQL predicates and use SQL Server's full-text search capabilities. With full-text searching, you can perform many other types of search:
  • Two words near each other
  • Any word derived from a particular root (for example run, ran, or running)
  • Multiple words with distinct weightings
  • A word or phrase close to the search word or phrase
In this article, I'll show you how to set up and use full-text searching in SQL Server 2000, and give you a sneak peek of the changes that are coming in this area when SQL Server 2005 ships next year.

Full-Text Indexing Architecture


SQL Server uses an interface component, the SQL Server Handler, to communicate with the Microsoft Search Service. The Handler extracts data from SQL Server tables that have been enabled for full-text searching and passes it to the search service for indexing. Another component, the full-text OLE DB provider, gets invoked by SQL Server when you actually perform a full-text search. The provider takes the portion of the search that needs to be satisfied by the full-text index and passes it off to the Search Service for evaluation.You might be a bit surprised to learn that SQL Server doesn't handle its own full-text indexing tasks. Any version of Windows that SQL Server will run on includes an operating system component named the Microsoft Search Service. This service provides indexing and searching capabilities to a variety of applications, including SQL Server, Exchange, and SharePoint.
You need to be aware of one consequence of this architecture: because the full-text indexes are not in your SQL Server database, they can't be backed up from within SQL Server. Instead, you need to backup the disk files created by the Search Service. You'll find these files located under Program Files\Microsoft SQL Server\MSSQL\FTDATA.

Enabling Full-Text Indexing

As you can probably guess, there's a certain amount of overhead involved in passing data back and forth between SQL Server and the Search Service. To speed things up, SQL Server doesn't pass any data to the Search Service unless you explicitly tell it to do so. After all, you might never want to do any full-text searches, in which case it would be silly to spend time indexing your data for them.
To get started, you need to add a full-text catalog to your database. The easiest way to do this is to open SQL Server Enterprise Manager and expand the node for your database to find the Full-Text Catalogs node (if that node isn't present, check to make sure that the Microsoft Search Service is installed on the server). Right-click on the node and select New Full-Text Catalog. SQL Server will prompt you for a name and location for the catalog (and it will supply a default location). Name the catalog anything you like and click OK to create it.
Next you need to tell SQL Server what data to include in the catalog. Again, you can do this in Enterprise Manager. Right-click on a table and select Full-Text Index Table, Define Full-Text Indexing on a Table. This will launch the SQL Server Full-Text Indexing Wizard. You need to make these choices to complete the wizard:
  1. Select a unique index on the table
  2. Select the columns to index. You can optionally specify a language to use for word breaking.
  3. Select the catalog to contain the index, or create a new catalog.
  4. Create a schedule to repopulate the index on a regular basis (this is also optional).
When you finish the wizard, it will create the index for the table. But the index won't have any entries in it yet. Right-click on the table again anfd select Full-Text Index Table, Start Full Population to build the actual index

Performing a Full-Text Search

Now you're ready to actually do some searches. For these examples, I added a full-text index to the ProductName column in the Northwind Products table. Four T-SQL predicates are involved in full-text searching:
  • FREETEXT
  • FREETEXTTABLE
  • CONTAINS
  • CONTAINSTABLE
FREETEXT is the easiest of these to work with; it lets you specify a search term but then tries to look at the meaning rather than the exact term when finding matches. For instance, here's a query usingFREETEXT together with its results:

SELECT ProductName
FROM Products
WHERE FREETEXT (ProductName, 'spread' )

ProductName                              
---------------------------------------- 
Grandma's Boysenberry Spread
Vegie-spread

(2 row(s) affected)
As you can see, FREETEXT finds the word or words you give it anywhere in the search column.FREETEXTTABLE works like FREETEXT except that it returns its results in a Table object.
CONTAINS (and CONTAINSTABLE, which works the same but delivers results in a table) offers a much more complex syntax for using a full-text indexed column:

CONTAINS
    ( { column | * } , '< contains_search_condition >' 
    ) 

< contains_search_condition > ::= 
        { < simple_term > 
        | < prefix_term > 
        | < generation_term > 
        | < proximity_term > 
        | < weighted_term > 
        } 
        | { ( < contains_search_condition > ) 
        { AND | AND NOT | OR } < contains_search_condition > [ ...n ] 
        } 

< simple_term > ::= 
    word | " phrase "

< prefix term > ::= 
    { "word * " | "phrase * " }

< generation_term > ::= 
    FORMSOF ( INFLECTIONAL , < simple_term > [ ,...n ] ) 

< proximity_term > ::= 
    { < simple_term > | < prefix_term > } 
    { { NEAR | ~ } { < simple_term > | < prefix_term > } } [ ...n ] 

< weighted_term > ::= 
    ISABOUT 
        ( { { 
                < simple_term > 
                | < prefix_term > 
                | < generation_term > 
                | < proximity_term > 
                } 
            [ WEIGHT ( weight_value ) ] 
            } [ ,...n ] 
        ) 
For instance, you can search for one word "near" another this way:

SELECT ProductName
FROM Products
WHERE CONTAINS(ProductName, '"laugh*" NEAR lager')

ProductName                              
---------------------------------------- 
Laughing Lumberjack Lager

(1 row(s) affected)
Note the use of "laugh*" to match any word starting with "laugh." You can also supply a weighted list of terms to CONTAINS, and it will prefer matches with a higher weight:

SELECT ProductName
FROM Products
WHERE CONTAINS(ProductName, 'ISABOUT (stout weight (.8), 
   ale weight (.4), lager weight (.2) )' )

ProductName                              
---------------------------------------- 
Laughing Lumberjack Lager
Steeleye Stout
Sasquatch Ale
Outback Lager

(4 row(s) affected)

Looking Forward to SQL Server 2005

SQL Server 2005 features quite a number of changes and improvements in full-text searching:
  • A dedicated indexing service that works directly with SQL Serrver. This speeds up full-text operations and isolates SQL Server from changes to the search service made by other applications.
  • Data definition language (DDL) statements for creating and altering full-text catalogs and indexes.
  • Full-text queries against linked servers.
  • Full-text queries against arbitrary sets of columns (instead of just one column or all columns).
  • Specification of the language to be used for word-breaking in an index.
  • Integrated backup and restore for full-text catalogs.
  • Full-text indexing for XML data.
  • Integration with SQL Profiler and logging of index operations.
If you were interested in full-text searching in SQL Server 2000 but ran into brick walls, take another look when the new version comes out. Microsoft's substantial work in this area means that full-text indexing and searching will be better than ever.

Full-Text to the Rescue

Many SQL Server problems can be solved without ever looking at full-text search. But it comes in very handy in one key scenario: when human beings are supplying search terms from their own head, instead of from a list. You may need to work at providing a good user interface for this facility, but if you have people searching through a large corpus of text, you should definitely consider full-text searching. The end result is likely to be a better application and happier users.
Mike Gunderloy is the author of over 20 books and numerous articles on development topics, and the lead developer for Larkware. Check out his latest book, Coder to Developer from Sybex. When he's not writing code, Mike putters in the garden on his farm in eastern Washington state.

Comments

Popular posts from this blog

Tip/Trick: Fix Common SEO Problems Using the URL Rewrite Extension

Search engine optimization (SEO) is important for any publically facing web-site.  A large % of traffic to sites now comes directly from search engines, and improving your site’s search relevancy will lead to more users visiting your site from search engine queries.  This can directly or indirectly increase the money you make through your site. This blog post covers how you can use the free Microsoft  URL Rewrite Extension  to fix a bunch of common SEO problems that your site might have.  It takes less than 15 minutes (and no code changes) to apply 4 simple  URL Rewrite  rules to your site, and in doing so cause search engines to drive more visitors and traffic to your site.  The techniques below work equally well with both ASP.NET Web Forms and ASP.NET MVC based sites.  They also works with all versions of ASP.NET (and even work with non-ASP.NET content). [In addition to blogging, I am also now using Twitter for quick updates and to sh...

ASP.NET MVC - Set custom IIdentity or IPrincipal

Here's how I do it. I decided to use IPrincipal instead of IIdentity because it means I don't have to implement both IIdentity and IPrincipal. Create the interface interface ICustomPrincipal : IPrincipal { int UserId { get ; set ; } string FirstName { get ; set ; } string LastName { get ; set ; } } CustomPrincipal public class CustomPrincipal : ICustomPrincipal { public IIdentity Identity { get ; private set ; } public bool IsInRole ( string role ) { return false ; } public CustomPrincipal ( string email ) { this . Identity = new GenericIdentity ( email ); } public int UserId { get ; set ; } public string FirstName { get ; set ; } public string LastName { get ; set ; } } CustomPrincipalSerializeModel - for serializing custom information into userdata field in FormsAuthenticationTicket object. public class CustomPrincipalSerializeMode...

How to create a countdown timer in jquery

Create a countdown timer in jQuery First we need to include the jQuery library file to the HTML page to perform this task. To do that we need to understand that what exactly a jQuery library fie is ? JQuery library file is the library of JavaScript, which means this file contains the predefined functions of jQuery. We just need to call these functions to perform the task. jQuery functions reduces the lines of code and makes our task easy. As this jQuery library file contains the javascript functions so we need to call the function within <script> </script> tag. Now after including the file, we need to define a variable which will store that for how long you want the timer on the page(c=60) and now the time you set needs to be changed in hours , minutes and seconds using the code “ var hours = parseInt( time / 3600 ) % ;var minutes = parseInt( time / 60 ) % 60; var seconds = time % 60;” Now we need to put the condition if timer got finished (if (t...