Batching components for the component synchronizer
Although the Tridion power tools are currently undergoing a major overhaul to bring them in line with SDL Tridion 2011 and its Anguilla extensibility framework, many of us will be working on pre-2011 systems for some time to come. This being so, the "old" power tools remain a useful resource. Today I was working on a 2009 system, where we are busy with a data migration. This means using the component synchronizer. We've been using it for a few things, but today we wanted to remove some redundant fields from a schema, and then have that reflected in the data. The schema in question had about 8500 components based on it, and processing the entire list in one go was going to slow the whole team down. (We're doing this on a relatively under-powered image that can run quite slowly sometimes - it's easy to max it out!)
So what to do? The way of using the component synchronizer that I'm most familiar with is simply to select the schema and then process all it's components. However, the synchronizer also has an option to process a specific list of components; you have to paste a comma-separated list of TCM URIs into a text box. So then the question was how to get such lists with a reasonable amount of effort. Powershell to the rescue. Here's the approach:
> $tdse = new-object -com TDS.TDSE
> $alg = $tdse.GetObject("tcm:12-1255-8",1)
> $f = $tdse.CreateListRowFilter()
> $f.SetCondition("ItemType", 16)
> $docAlgItems = [xml]$alg.Info.GetListUsingItems(1,$f)
> $AlgTcms = $docAlgItems.ListUsingItems.Item | %{$_.ID}
> $algTcms[0..999] -join "," | out-file first1000.txt
> notepad .\first1000.txt
As you can see, we start with instantiating TDSE and getting hold of the schema. To protect the innocent, let's pretend that this schema is called Algernon. $alg is the variable representing the schema.
So - after a quick where-used, we do the standard Powershell trick of casting the results to an XmlDocument and reading off the ID attributes. By this time, $AlgTcms contains an array of TCM URIs and it's almost trivial to dump the first thousand into a file by specifying an array range (obviously you can get the second thousand by saying $AlgTcms[1000..1999] and so on) and doing a -join
So instead of maxing out the system for several hours, we were able to schedule the work in reasonable batches through the day.