Updating managed metadata term sets during bulk import processes

Recently, while working on a bulk upload timer job, I encountered an interesting problem when updating managed metadata term sets. The client had asked us to access their in-house application via a web service and import documents in batches from their network share. The in-house application was treated as the master for all reference metadata, and we wanted to reduce the data entry their staff had to do to keep the application and the SharePoint solution in sync.

The solution we proposed was to have the timer job that performed the upload also add any terms provided by the web service that were missing from the metadata term set.

Initial solution

The first stage of the process was to gather all of the terms provided by the service and update the term store with any missing terms.

We used the following code to do this:

// Connect to the default term store for the current site collection.
TaxonomySession taxonomySession = new TaxonomySession(currentSite);
TermStore store = taxonomySession.DefaultSiteCollectionTermStore;

// Locate the client's term group and the target term set.
// (The Where/Any calls below require a using directive for System.Linq.)
Group group = store.Groups.Where(x => x.Name == "CompanyXYZ").SingleOrDefault();
TermSet set = group.TermSets[termSetName];

// Create any terms returned by the web service that don't already exist,
// matching the incoming value against every label on each existing term.
foreach (string term in terms)
{
    if (!set.Terms.Any(x => x.Labels.Any(y => y.Value == term)))
    {
        set.CreateTerm(term, currentSite.RootWeb.UICulture.LCID);
    }
}

// Persist the changes to the term store.
store.CommitAll();

Opening the term store through Central Administration at this point showed that the new terms had been created successfully. The next stage in the process was to upload the documents and then set the metadata on each one.
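The upload itself isn't the focus of this post, but for context it looked roughly like the sketch below; targetLibrary, fileName and fileBytes are placeholder names rather than the solution's actual variables:

// Minimal sketch of the upload step; names are placeholders.
SPList targetLibrary = currentSite.RootWeb.Lists["Documents"];
SPFileCollection files = targetLibrary.RootFolder.Files;

// Add (or overwrite) the file, then grab its list item so the
// metadata fields can be set on it afterwards.
SPFile file = files.Add(fileName, fileBytes, true);
SPListItem item = file.Item;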

The code to update the list item metadata field is shown below:

// Open a new taxonomy session and locate the term set again.
TaxonomySession session = new TaxonomySession(currentSite);
TermStore termStore = session.DefaultSiteCollectionTermStore;
Group group = termStore.Groups[groupName];
TermSet termSet = group.TermSets[termSetName];

// Retrieve the terms in the set (up to 1000).
TermCollection terms = termSet.GetTerms(1000);

// The managed metadata field on the freshly uploaded list item.
TaxonomyField field = (TaxonomyField)item.Fields[fieldId];

// Find the term whose label matches the value from the web service.
foreach (Term term in terms)
{
    if (term.Labels.Select(x => x.Value).Contains(termValue))
    {
        // Code was never reaching this point.
        field.SetFieldValue(item, term);
        break;
    }
}

When stepping through this code I noticed that the field value was never being set (a comment in the snippet above highlights the unreachable line). As we already had a logging solution in place, I added some extra logging to the code and tried again. This time the first batch succeeded but the second batch failed: I had forgotten to delete the terms created by the first attempt, and deploying the new logging code had restarted the timer job service. This led me to believe that SharePoint was aggressively caching the terms when they were first retrieved and not refreshing that cache when TermStore.CommitAll was called.

Final solution

After researching this caching behaviour I found that the TaxonomySession constructor accepts a second, boolean argument that instructs the session to update its cache. It looks like this:

TaxonomySession session = new TaxonomySession(currentSite, true);
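It's worth noting that refreshing the cache this way adds overhead, since the session re-reads taxonomy data rather than relying on what is already cached, so it is best reserved for code paths that genuinely need up-to-date terms, such as this timer job.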

Once this constructor was used the code worked as expected: terms added earlier in the process could now be retrieved later within the same process.

Note: For brevity, the code samples shown here have had their guard clauses removed. Before using this code, we recommend adding guard clauses to verify that the term store, group, and term set are all configured correctly.
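As an illustration, a minimal set of guard clauses around the term set lookup might look like the following; the exception messages are placeholders, and the lookups use LINQ's SingleOrDefault so a missing group or term set comes back as null:

// Illustrative guard clauses only; adapt the exception handling
// to your own logging strategy.
TermStore store = taxonomySession.DefaultSiteCollectionTermStore;
if (store == null)
{
    throw new InvalidOperationException("No default term store is configured for this site collection.");
}

Group group = store.Groups.Where(x => x.Name == "CompanyXYZ").SingleOrDefault();
if (group == null)
{
    throw new InvalidOperationException("The 'CompanyXYZ' term group was not found.");
}

TermSet set = group.TermSets.Where(x => x.Name == termSetName).SingleOrDefault();
if (set == null)
{
    throw new InvalidOperationException(string.Format("The '{0}' term set was not found.", termSetName));
}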
