Setting crawl schedules with PowerShell

I usually script most of the configuration of a SharePoint 2010 farm using PowerShell, be it for a lab, development, QA, testing, or production. I say most because there are components I haven't included in my scripts, either because the functionality isn't there or because I can't find an appropriate set of PowerShell cmdlets. Something I hadn't looked into was setting the search crawl schedules, because it's simple enough to configure from Central Administration. I finally got tired of setting up the schedules manually (or, more likely, forgetting to set them up), so I dug in to figure it out.

Schedules

Crawl schedules are a group of properties of a content source that tell the search service application when the content source should be crawled. There are two types of schedules: full and incremental. A full crawl runs through the entire content source, adding everything it finds to the index. An incremental crawl compares the content source to the index to find changes and records them appropriately. Depending on the size of a content source, a full crawl can take hours and be very resource intensive, while an incremental crawl will take less time (though it could be just as resource intensive). How often you run these depends on your infrastructure, your search requirements, and how often the content changes.

Frequency

You can schedule the crawl by three main levels of granularity: monthly, weekly, or daily. Functionally, this is similar to Outlook's monthly, weekly, and daily recurring appointments.

  • Monthly schedules run on a specified day of the month. You can configure which months the crawl runs, which day in the month, and at which time to start. Once started, the crawl can repeat throughout the day.
  • Weekly schedules run every nth week, where you specify a value for n, the days of the week to run, and the start time. A weekly crawl schedule can repeat throughout the day as well.
  • Daily schedules run every nth day, starting at a specified time, and can repeat throughout the day.

Example

To see how these frequencies look, let's consider the following highly-realistic example:

Contoso Corporation is a government contractor specializing in the development of specialized equipment for a large government agency. Contoso has an intranet farm (CONTRANET) it uses for employees to collaborate on designs, build a knowledge base for their product development, and store its financial records. Contoso believes in being open with high-level financial information and provides this data to employees in spreadsheets in a site owned by the accounting department. Some employees have the financial data site bookmarked in their browser, but most find the latest or historical data with search. Management wants employees to be able to find the financial information immediately as of the fifteenth day of every month (it's usually posted sometime between the 1st and 14th depending on how much money Contoso made in the previous month). The Business Continuity Team is responsible for managing Contoso's backup infrastructure and wants the search index to be as up to date as possible before running their weekly full backups of the SharePoint farm so they can reduce the effort in performing a restore and ensure they meet their regulatory-compliant SLA should disaster strike. Additionally, employees do not want stale search results — if someone uploaded a new document or updated a wiki page within CONTRANET more than an hour ago it should appear in the search results.

Hank is one of CONTRANET's SharePoint administrators and is also a web developer who wears ironic CSS-themed t-shirts. Hank's been tasked to set up the crawl schedules for CONTRANET. After reviewing the business requirements, he decides he needs three schedules:

  1. Monthly full crawl on the 15th of the month just after midnight to ensure all the financial information is present (the full crawl in CONTRANET only runs for 30 minutes despite the terabytes of data because Contoso invests in only top-of-the-line storage hardware)
  2. Weekly full crawl every Friday night at 10:30 PM to ensure the business continuity team's weekly backups contain an up-to-date index
  3. Daily incremental crawl every 30 minutes throughout the day to ensure all changes are in the index as soon as possible so the employees don't have to keep refreshing the search results page

With a bit of tweaking, we can turn Hank's findings into statements that look more like SharePoint crawl schedules:

  1. The full monthly crawl runs on the 15th day of every month starting at 12:01 AM and does not repeat within the day.
  2. The full weekly crawl runs every 1 week on Friday at 10:30 PM and does not repeat within the day.
  3. The incremental daily crawl should be run every 1 day starting at 12:00 AM and repeating every 30 minutes for 1440 minutes.
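
As a quick sanity check on the third schedule, the repeat settings can be turned into a list of start times. This is only a sketch in plain PowerShell: the variable names are made up for illustration, nothing here touches SharePoint, and it assumes each repeat starts a fixed interval after the previous start and that no new crawl starts once the duration has elapsed.

```powershell
# Illustrative values mirroring schedule 3 (these are not SharePoint properties)
$startTime      = [datetime]::Today   # 12:00 AM today
$repeatInterval = 30                  # minutes between crawl starts
$repeatDuration = 1440                # minutes the repeat window stays open

# Number of crawl starts that fit inside the repeat window
$runCount = [int][math]::Floor($repeatDuration / $repeatInterval)

# Start time of each run, counting from the schedule's start time
$runTimes = 0..($runCount - 1) | ForEach-Object { $startTime.AddMinutes($_ * $repeatInterval) }

"{0} incremental crawls per day, first at {1:HH:mm}, last at {2:HH:mm}" -f $runCount, $runTimes[0], $runTimes[-1]
```

With a 30-minute interval over 1440 minutes, that works out to 48 crawl starts per day, which is worth keeping in mind when deciding whether the farm can really keep up with Hank's schedule.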

We have three schedules — two full schedules and one incremental schedule. But this is a problem: a content source can only have one full and one incremental schedule. In order to make this work, Hank needs to create a second content source. He'll create a content source for the accounting site (out of scope for this discussion) and give it the full monthly schedule. (More on this below.)

(Remember how I said this was a highly-realistic example scenario? Obviously the schedules we came up with are ridiculous, but this way I can show you how to configure both a full and incremental schedule, and the daily, weekly, and monthly schedules. You probably won't use a strategy like Hank's in your environment; I'm just being thorough for demonstrative purposes.)

Read The Fine Manual

Before we look at implementing Hank's schedules, let's take a moment to check the documentation.

To add a crawl schedule with PowerShell, you use the Set-SPEnterpriseSearchCrawlContentSource cmdlet. If you clicked that link as of the time of this post being published, you may (or may not) be surprised that there is a lot of content there, but it doesn't really explain how to use the cmdlet or how to generate the appropriate schedule. The page lists five ways to run the command, but makes no attempt to explain the purpose of each. If you scroll all the way to the bottom of the page, you will find an example with a clue as to how the cmdlet works.

Parameters

First, there is the ScheduleType parameter. ScheduleType lets us pick whether the schedule is full or incremental. Easy.

Next, to specify a monthly, weekly, or daily schedule, you use the appropriate MonthlyCrawlSchedule, WeeklyCrawlSchedule, or DailyCrawlSchedule parameter. Simple.

Now it gets complicated. Well, not so much complicated as it is involved.

The CrawlScheduleRunEveryInterval parameter is used for daily and weekly schedules. It's the "Run every # days" and "Run every # weeks" field. It is not used for Monthly schedules.

The CrawlScheduleDaysOfWeek parameter specifies the days of the week for a weekly schedule. Enter the days as a comma separated string of day names. For example, every day of the week is: "Sunday,Monday,Tuesday,Wednesday,Thursday,Friday,Saturday" (and you would include the quotes because it's a string).

The CrawlScheduleDaysOfMonth parameter specifies the days of the month (1-31) for a monthly schedule. For multiple days, provide a comma separated string: "1,8,15,22,29"

The CrawlScheduleMonthsOfYear parameter specifies the months of the year for a monthly schedule. Enter the months as a comma separated string of month names. The entire year is "January,February,March,April,May,June,July,August,September,October,November,December".

The CrawlScheduleStartDateTime parameter specifies the starting time. Enter this in either 12- or 24-hour format ("1:00 PM" or "13:00"). If you don't include this, it will default to 12:00 AM (00:00). It's worth noting that in Central Administration this field is a drop-down menu with 24 options, one for every hour in the day. If you want to start your jobs at a time that isn't the top of the hour, you will need to use an alternative method such as PowerShell.

The CrawlScheduleRepeatInterval parameter enables the "Repeat within the day" option and sets the "every" number of minutes. If you want the crawl to repeat within the day, specify the interval in minutes; if you don't, omit this parameter.

The CrawlScheduleRepeatDuration parameter is the "for" value, in minutes, when repeating within the day. Combined with CrawlScheduleRepeatInterval, you get: Repeat within the day every [CrawlScheduleRepeatInterval] for [CrawlScheduleRepeatDuration].

Putting all that together, we get the following three commands for creating Hank's three schedules (the content source each one applies to is supplied later, through the pipeline):

  1. Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Full -MonthlyCrawlSchedule -CrawlScheduleDaysOfMonth 15 -CrawlScheduleMonthsOfYear "January,February,March,April,May,June,July,August,September,October,November,December" -CrawlScheduleStartDateTime 00:01 -Confirm:$false
  2. Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Full -WeeklyCrawlSchedule -CrawlScheduleRunEveryInterval 1 -CrawlScheduleDaysOfWeek "Friday" -CrawlScheduleStartDateTime "10:30 PM"
  3. Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Incremental -DailyCrawlSchedule -CrawlScheduleRunEveryInterval 1 -CrawlScheduleRepeatInterval 30 -CrawlScheduleRepeatDuration 1440 -Confirm:$false

The -Confirm:$false parameter suppresses the confirmation prompt from Set-SPEnterpriseSearchCrawlContentSource. You will want this if you are automating this configuration.
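
Those one-liners get long, so in a script it may be easier to read them as a splatted hashtable. Here is a rough sketch of Hank's monthly full schedule written that way. The $monthlyFull and $contentSource names are placeholders of mine, and the Get-Command guard is only there so the snippet loads cleanly on a machine without the SharePoint snap-in.

```powershell
# Parameters for the monthly full crawl, collected in a hashtable for splatting.
# Switch parameters such as MonthlyCrawlSchedule are passed as $true.
$monthlyFull = @{
    ScheduleType               = 'Full'
    MonthlyCrawlSchedule       = $true
    CrawlScheduleDaysOfMonth   = 15
    CrawlScheduleMonthsOfYear  = 'January,February,March,April,May,June,July,August,September,October,November,December'
    CrawlScheduleStartDateTime = '00:01'
    Confirm                    = $false
}

# $contentSource is a placeholder for a content source object obtained elsewhere.
# The call only runs where the SharePoint cmdlet is actually available.
if (Get-Command Set-SPEnterpriseSearchCrawlContentSource -ErrorAction SilentlyContinue) {
    $contentSource | Set-SPEnterpriseSearchCrawlContentSource @monthlyFull
}
```

The behavior should be the same as the first command above; the hashtable just keeps one parameter per line, which makes a schedule easier to review and change.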

The dirty part

First load up an elevated PowerShell window and get the Search Service Application instance:

Windows PowerShell
Copyright (C) 2009 Microsoft Corporation. All rights reserved.

PS > Add-PSSnapin Microsoft.SharePoint.PowerShell
PS > Get-SPServiceApplication

DisplayName          TypeName             Id
-----------          --------             --
State Service App... State Service        e82c77a1-7074-4b68-9cf2-cc24b7449b53
Managed Metadata ... Managed Metadata ... b9866b7a-d33e-4b7b-a833-dd5ab4dd9657
Web Analytics Ser... Web Analytics Ser... 635d3e34-98b8-486f-b436-bf379a6d8f0d
Security Token Se... Security Token Se... 7af69ee2-a541-4fc5-930f-845464a6100a
Application Disco... Application Disco... 6e5cceac-394c-4f4c-8a1f-8cfb0f0a9c31
Usage Service App... Usage and Health ... 339834b1-3fbb-439e-9806-88c12361d449
Search Administra... Search Administra... 76743784-caea-42fe-90cc-acf0d49a212e
User Profile Serv... User Profile Serv... 606dcb57-9b79-4ac5-8ddf-1d71ea0b7804
Search Service Ap... Search Service Ap... ba30ed2e-e74a-470e-9e4b-1842ab472519


PS > $searchapp = Get-SPServiceApplication ba30ed2e-e74a-470e-9e4b-1842ab472519
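
Copying the GUID out of that table works, but it won't be the same GUID on the next farm. As an alternative sketch, the search-specific Get-SPEnterpriseSearchServiceApplication cmdlet can fetch the application directly. This assumes the farm has a single search service application; the display name in the commented line is a placeholder and should match whatever yours is called.

```powershell
# Assumes exactly one search service application exists in the farm.
$searchapp = Get-SPEnterpriseSearchServiceApplication

# With more than one, pick it by display name instead (placeholder name shown):
# $searchapp = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"
```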

Hank has already gone ahead and created a new content source for the accounting site and there's already the content source for CONTRANET. When he enumerates the content sources in the search application we'll see an array:

PS > $content_sources = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $searchapp
PS > $content_sources

Name             Id   Type                 CrawlState      CrawlCompleted
----             --   ----                 ----------      --------------
Local SharePo... 2    SharePoint           Idle
CONTRANET        7    SharePoint           Idle
Accounting Site  8    SharePoint           Idle


PS > $content_sources[1]

Name             Id   Type                 CrawlState      CrawlCompleted
----             --   ----                 ----------      --------------
CONTRANET        7    SharePoint           Idle


PS > $content_sources[2]

Name             Id   Type                 CrawlState      CrawlCompleted
----             --   ----                 ----------      --------------
Accounting Site  8    SharePoint           Idle

To reduce some confusion, let's assign a new variable to both CONTRANET and Accounting Site content sources:

PS > $content_contranet = $content_sources[1]
PS > $content_contranet

Name             Id   Type                 CrawlState      CrawlCompleted
----             --   ----                 ----------      --------------
CONTRANET        7    SharePoint           Idle


PS > $content_accounting = $content_sources[2]
PS > $content_accounting

Name             Id   Type                 CrawlState      CrawlCompleted
----             --   ----                 ----------      --------------
Accounting Site  8    SharePoint           Idle
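
Array positions depend on the order the content sources happen to come back in, so in a reusable script it is probably safer to pick them by name. A minimal sketch, assuming the $content_sources variable from above and the two content source names used in this example:

```powershell
# Select each content source by its Name property rather than by array index.
$content_contranet  = $content_sources | Where-Object { $_.Name -eq 'CONTRANET' }
$content_accounting = $content_sources | Where-Object { $_.Name -eq 'Accounting Site' }

# Stop early if either lookup came back empty (for example, because of a typo in the name).
if (-not $content_contranet)  { throw "Content source 'CONTRANET' not found" }
if (-not $content_accounting) { throw "Content source 'Accounting Site' not found" }
```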

Now that we have our content sources, we can set the schedules. The CONTRANET source gets a full and an incremental schedule, while the Accounting Site source gets only a full schedule:

PS > $content_contranet | Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Full -WeeklyCrawlSchedule -CrawlScheduleRunEveryInterval 1 -CrawlScheduleDaysOfWeek "Friday" -CrawlScheduleStartDateTime "10:30 PM"
PS > $content_contranet | Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Incremental -DailyCrawlSchedule -CrawlScheduleRunEveryInterval 1 -CrawlScheduleRepeatInterval 30 -CrawlScheduleRepeatDuration 1440 -Confirm:$false
PS > $content_accounting | Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Full -MonthlyCrawlSchedule -CrawlScheduleDaysOfMonth 15 -CrawlScheduleMonthsOfYear "January,February,March,April,May,June,July,August,September,October,November,December" -CrawlScheduleStartDateTime 00:01 -Confirm:$false

We can now check out the schedules:

PS > $content_contranet.FullCrawlSchedule


WeeksInterval  : 1
DaysOfWeek     : Friday
BeginDay       : 12
BeginMonth     : 8
BeginYear      : 2011
StartHour      : 22
StartMinute    : 30
RepeatDuration : 0
RepeatInterval : 0
Description    : At 10:30 PM every Fri of every week, starting 8/12/2011
NextRunTime    : 8/12/2011 10:30:00 PM



PS > $content_contranet.IncrementalCrawlSchedule


DaysInterval   : 1
BeginDay       : 12
BeginMonth     : 8
BeginYear      : 2011
StartHour      : 0
StartMinute    : 0
RepeatDuration : 1440
RepeatInterval : 30
Description    : Every 30 minute(s) from 12:00 AM for 24 hour(s) every day, starting 8/12/2011
NextRunTime    : 8/12/2011 2:00:00 PM



PS > $content_accounting.FullCrawlSchedule


DaysOfMonth    : Day15
MonthsOfYear   : AllMonths
BeginDay       : 12
BeginMonth     : 8
BeginYear      : 2011
StartHour      : 0
StartMinute    : 1
RepeatDuration : 0
RepeatInterval : 0
Description    : At 12:01 AM on day 15 of every month, starting 8/12/2011
NextRunTime    : 8/15/2011 12:01:00 AM



PS > $content_accounting.IncrementalCrawlSchedule

(There was nothing returned for the accounting incremental schedule since we did not create an incremental schedule.)

Since the schedules look good, let's kick off a full crawl for good measure:

PS > $content_contranet.StartFullCrawl()
PS > $content_accounting.StartFullCrawl()
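
StartFullCrawl() returns right away, so a script that needs to wait for the crawls has to poll. The loop below is a rough sketch rather than anything from the documentation. It assumes $searchapp from earlier in the session and that the content source object exposes its state through a CrawlStatus property (shown as CrawlState in the table output above). It re-reads the content sources on every pass instead of trusting the values cached in the existing variables.

```powershell
# Names of the content sources we just started crawling
$names = 'CONTRANET', 'Accounting Site'

do {
    Start-Sleep -Seconds 30

    # Re-read the content sources and keep the ones that are still crawling
    $busy = @(Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $searchapp |
        Where-Object { ($names -contains $_.Name) -and ($_.CrawlStatus -ne 'Idle') })

    # Report progress for anything not yet idle
    $busy | ForEach-Object { "{0}: {1}" -f $_.Name, $_.CrawlStatus }
} while ($busy.Count -gt 0)
```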

And that's it.
