Tuesday, March 11, 2008

Content Search

Some projects include requirements for simple searches. For example, imagine a firm that offers training classes at store locations around the local area. Each class has a predictable set of attributes, such as Title, Start Date and Instructor. End users can select from a list of classes and view all of the locations where the selected class is offered. Effectively, the system performs two searches: the first query returns a list of all classes; the second returns a list of locations where a specific class is taught. The system may provide further flexibility, so that choosing a location lists all of the instructors teaching the class there. In that case, a third query is required.



Note that this refers to a search of content nodes, not a website search of the kind typically handled by a site search engine. The following analysis assumes that developers are using either XPath or Sitecore Query to perform query operations.

The performance of this feature and the strategy for its implementation may be dramatically different depending on the design of the content tree. The following may be the initial design for the project, based on the logical structure of business units:

I. Home
  a. Locations
    i. Store A
      1. Classes
        a. Class A
        b. Class B
      2. About Us
    ii. Store B
      1. Classes
        a. Class A
        b. Class B
      2. About Us

This intuitive structure maps easily onto the business hierarchy, facilitates the development of navigation elements and simplifies the implementation of content markers.

Querying the above content structure, however, requires descendants queries that may slow site performance. A Sitecore Query of /sitecore/content/home/locations//*[@@templatename='class'] would return a result set including all classes; however, the system would have to traverse almost the entire content tree to reach the desired results. Similarly, a Sitecore Query for the locations that teach the selected class would require an additional descendants search, along with further analysis of content items to determine the available locations.
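The cost of a descendants search can be illustrated with a small model. The following is a minimal Python sketch, not Sitecore code: the Item class, the template names and the visit counter are assumptions made for illustration. It mirrors the tree above and counts how many items a // style traversal touches in order to find the class items.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical stand-in for a content item; not the Sitecore API.
    name: str
    template: str
    children: list = field(default_factory=list)


def descendants(item, counter):
    """Yield every item below `item` (models the // axis), counting each visit."""
    for child in item.children:
        counter[0] += 1
        yield child
        yield from descendants(child, counter)


def build_store(name):
    # Each store carries its own copy of the classes, as in the first design.
    classes = Item("Classes", "folder",
                   [Item("Class A", "class"), Item("Class B", "class")])
    return Item(name, "store", [classes, Item("About Us", "page")])


locations = Item("Locations", "folder",
                 [build_store("Store A"), build_store("Store B")])

visited = [0]
class_items = [i for i in descendants(locations, visited)
               if i.template == "class"]

print(len(class_items), "class items found;", visited[0], "items visited")
```

In this model every item under Locations is visited, including the About Us pages, and each class appears once per store, so the result still has to be de-duplicated before it can be shown as a list of classes. Each additional store adds its whole subtree to the traversal.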

The larger the number of locations, the slower the queries will perform. The SDN provides guidance on optimizing the performance of various query scenarios (http://sdn5.sitecore.net/Reference/Using%20Sitecore%20Query.aspx); however, in this case, a reorganization of the content tree may be in order. Consider the following alternative:

I. Global
  a. Classes
    i. Class A
      1. Instance A
      2. Instance B
    ii. Class B
II. Home
  a. Locations
    i. Store A
      1. About Us

Here, the classes are organized under /sitecore/content/Global/Classes. Each class has any number of instances, which include attributes for Start Date, Instructor and Location. With this organizational approach, the queries are simpler and provide significant performance advantages. A query of /sitecore/content/Global/Classes/* will return a list of classes without requiring a descendants query. Retrieving a list of locations for a particular class, again, avoids descendants queries. The query /sitecore/content/Global/Classes/Class A/* will return all instances of a particular class. This list of instances will then need to be analyzed, by reading the Location attribute of each instance, to generate a list of locations.
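The two lookups described above can be sketched in the same illustrative way. This is a minimal Python model under stated assumptions, not Sitecore code: the Item class, the fields dictionary, the locations_for helper and the sample Location values are all hypothetical. Both lookups read only the direct children of a known item.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical stand-in for a content item; not the Sitecore API.
    name: str
    fields: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Models /sitecore/content/Global/Classes with illustrative Location values.
class_a = Item("Class A", children=[
    Item("Instance A", {"Location": "Store A"}),
    Item("Instance B", {"Location": "Store B"}),
])
class_b = Item("Class B")
classes_root = Item("Classes", children=[class_a, class_b])


def locations_for(class_item):
    """Read the Location of each direct child instance, dropping duplicates."""
    found = (inst.fields["Location"] for inst in class_item.children)
    return list(dict.fromkeys(found))  # keeps first-seen order


# Models Classes/*: one step down, no descendants search.
class_names = [c.name for c in classes_root.children]

# Models Classes/Class A/*, followed by the analysis of the instances.
class_a_locations = locations_for(class_a)

print(class_names)
print(class_a_locations)
```

The work for each lookup grows with the number of classes or the number of instances of one class, not with the number of stores or the depth of the tree beneath them.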

Optimizing the content tree for search performance is a broad topic and exceeds the scope of this document. However, developers should keep this consideration in mind when designing the content tree. Other approaches – such as leveraging the Links Database, creating search metadata or using search engine APIs – should also be considered fully when optimizing search-driven features.
