Thursday, February 14, 2008

Massive Branches

On medium to large sites, customers may have large amounts of content based on the same template. For example, a news site may have hundreds of thousands of news stories, or a product site may have tens of thousands of products. Sitecore can store this volume of content without difficulty; however, how the content is organized in the tree will affect system performance (see here for more information: http://sdn5.sitecore.net/Articles/Administration/Sitecore%20Performance/Storage/Deep%20vs%20Shallow%20Trees.aspx). Additionally, the business user or developer experience may suffer in certain circumstances. Consider the following:

I. Home
  a. News
    i. News Story 1
    ii. News Story 2
    iii. …
    iv. News Story 130,001
    v. News Story 130,002
    vi. …

Expanding the News node in the Content Editor would cause serious performance issues from a user-interface perspective. Sitecore would attempt to return the content structure for 100,000+ content items, and the web browser may be pushed to its limits in terms of memory or CPU usage. Further, if the News item were used as the datasource for a multilist or lookup control, business users would struggle to locate content items in a list of this size.

To address this issue, some kind of organizational structure is required. Items may be organized by:

  • Year, month and date
  • Author
  • Alphabet (e.g. an “A” folder, a “B” folder, etc.)
  • Category and subcategory

Whatever organizational approach you choose, select one that will scale as content volume increases. The above content tree, for example, could be organized as follows:

I. Home
  a. News
    i. 2007
      1. 1
        a. 1
          i) News Story 1
          ii) News Story 2
        b. 2
          i) News Story 1

In this example, news stories are organized by year, month and date. The strength of this approach is that it provides a comprehensible hierarchy that easily scales over time. The challenge of this approach is that WebEdit mode will require customizations in order to add new News Stories. When a business user adds a News Story, programmatic logic will be required to ensure that the news story is created in the appropriate folder or subfolder.
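
The folder-placement logic described above could start from a small path helper like the one below. This is only a sketch: the class name NewsPathBuilder, the method BuildFolderPath and the root path /sitecore/content/home/News are assumptions made for illustration, and creating any missing year, month or day folders through the Sitecore API is deliberately left out.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of Sitecore): maps a publication date to the
// year/month/day folder path used in the example tree above.
public static class NewsPathBuilder
{
    // Assumed location of the News item; adjust to the actual content tree.
    private const string NewsRoot = "/sitecore/content/home/News";

    public static string BuildFolderPath(DateTime published)
    {
        // Unpadded numbers match the example tree (2007 > 1 > 2).
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0}/{1}/{2}/{3}",
            NewsRoot, published.Year, published.Month, published.Day);
    }
}
```

For a story dated January 2, 2007, BuildFolderPath returns /sitecore/content/home/News/2007/1/2. The customization that creates the story would then ensure each folder along that path exists before creating the item beneath it.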

Sometimes no obvious organization scheme presents itself. In these cases, an arbitrary content structure may be required to ensure that the content tree remains usable. For example:

I. Home
  a. News
    i. Folder 1
      1. News Stories 1-50
        i) News Story 1
        ii) …
      2. News Stories 51-100
        i) News Story 51
      3. …
      4. News Stories 2,451-2,500
    ii. Folder 2
      1. News Stories 2,501-2,550
      2. …

This approach provides an organizational structure for otherwise hard-to-structure content and ensures that no single branch of the content tree will overwhelm the user interface. The weakness of this approach is that the non-semantic nature of the folders and subfolders may weaken search engine optimization strategies that rely on URL paths for content categorization. It will also be difficult to locate items within the Sitecore UI itself without using the built-in search tool.
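
The arbitrary structure above can be derived from a running story number. The following is a minimal sketch, assuming 50 stories per subfolder and 50 subfolders per folder (2,500 stories per folder), which is what the example tree implies. The class name NewsBucket and the method BuildFolderPath are hypothetical, and the returned path is relative to the News item.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of Sitecore): maps a 1-based story number to
// the "Folder N/News Stories A-B" bucket shown in the example tree above.
public static class NewsBucket
{
    private const int StoriesPerSubfolder = 50; // assumed from the example
    private const int SubfoldersPerFolder = 50; // 50 x 50 = 2,500 stories per folder

    public static string BuildFolderPath(int storyNumber)
    {
        if (storyNumber < 1)
        {
            throw new ArgumentOutOfRangeException("storyNumber");
        }

        int zeroBased = storyNumber - 1;

        // Top-level folder: a new one starts every 2,500 stories.
        int folder = zeroBased / (StoriesPerSubfolder * SubfoldersPerFolder) + 1;

        // First and last story numbers covered by the subfolder.
        int first = (zeroBased / StoriesPerSubfolder) * StoriesPerSubfolder + 1;
        int last = first + StoriesPerSubfolder - 1;

        // "N0" adds thousands separators, e.g. 2,451.
        return string.Format(
            CultureInfo.InvariantCulture,
            "Folder {0}/News Stories {1:N0}-{2:N0}",
            folder, first, last);
    }
}
```

Because the bucket is computed from the story number alone, the same function can be used both when creating a story and when locating it later, provided the numbering scheme never changes.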

Finally, remember that this issue should only be of concern to developers and power users from a user interface perspective. Business users working with WebEdit mode will never use the Content Editor and, as such, will never be exposed to the content tree per se. If you are considering allowing business users to access the Content Editor, remember the potential confusion this may cause as the content tree becomes more complex.

3 comments:

ALM said...

I am currently breaking down a massive branch into several sub-branches. Any suggestions on how to keep the old URLs working after the fact? E.g., something like domain.com/events/Dec07event.aspx moved to domain.com/events/2007/Dec07event.aspx

It would be unfortunate for clicks intended for the old address to get a 404.

Blogging from SF said...

There are a number of options. A couple I can think of include:

1) Using wildcard nodes.
2) Using a 404 handler.

Whichever approach you use, you should send a 301 or 302 response so that search engines will know to use the new URL.

ALM said...

Here's what you inspired, thanks in large part to a blog post by Lars, though what I ended up with was a little different I guess. I handle everything in the pipeline rather than sending the user to a notfound.aspx.

In Web.config after ItemResolver under httpRequestBegin,
<processor type="EventsRenamer.FixAYear, EventsRenamer" />


then


using System;
using Sitecore.Pipelines.HttpRequest;

namespace EventsRenamer
{
    public class FixAYear
    {
        public void Process(HttpRequestArgs args1)
        {
            if (Sitecore.Context.Item == null) // item not found
            {
                // get name of requested item, without the path
                // WebUtil.GetUrlName(0) doesn't seem to work here - maybe hasn't been instantiated yet?
                // so we'll just split the URL on '/' and grab the last element in the array
                char[] splitter = { '/' };
                string[] reqitempath_ar = args1.ItemPath.Split(splitter);
                string reqitemname = reqitempath_ar[reqitempath_ar.Length - 1];

                // nothing to look up if the path ended in '/'; without this check the
                // year folder itself would match and we would redirect to ".../2008/.aspx"
                if (reqitemname.Length == 0)
                {
                    return;
                }

                // get current year
                int year = DateTime.Today.Year;

                // loop through years from today back to 1997;
                // see if an item by the name of reqitemname exists in the year
                string newitempath = "";
                while (Sitecore.Context.Item == null && year >= 1997)
                {
                    newitempath = "/Events/" + year + "/";
                    Sitecore.Context.Item = Sitecore.Context.Database.GetItem("/sitecore/content/home" + newitempath + reqitemname);
                    year--;
                }

                // if found, send a 301 ... sitecore would serve the new page without this based on
                // the change in Sitecore.Context.Item, but this is how it's "supposed" to be done
                if (Sitecore.Context.Item != null)
                {
                    args1.Context.Response.StatusCode = 301;
                    args1.Context.Response.Status = "301 Moved Permanently";
                    args1.Context.Response.AddHeader("Location", "http://" + args1.Context.Request.ServerVariables["HTTP_HOST"] + newitempath + reqitemname + ".aspx");
                    //args1.Context.Response.Redirect("http://" + newitempath + reqitemname + ".aspx");
                }
                // else will go to default 404 page. the new Sitecore.Context.Item doesn't appear there, it seems;
                // if it did we could reset it to origItem here
            }
        }
    }
}



In action, I used wget to see some header info because I couldn't remember what I should really use for such things,

$ wget http://localhost/Events/some2008event.aspx
--14:18:53-- http://localhost/Events/some2008event.aspx
=> `some2008event.aspx'
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://localhost/Events/2008/some2008event.aspx [following]
--14:18:53-- http://localhost/Events/2008/some2008event.aspx
=> `some2008event.aspx'
Reusing existing connection to localhost:80.
HTTP request sent, awaiting response... 200 OK