Dominic Cronin's weblog
From the horse's mouth... a hot content portering tip (and a bit of a rant)
I have to admit, I nearly choked when I saw this. Bear with me a moment, and I'll tell you all about it.
Not so very long back, I was trying to set up the SDL Tridion Content Porter to work as part of an automated "build" system. One of the requirements was that I'd be able to save content and then re-import it into various different publications. After all, you need a go-to-production option as well as being able to support development and test work. After beating my brains out trying to figure out the publication mapping features in Content Porter, I asked around a bit, and found that lots of people have trouble with this, and one well-favoured option is just to go round the problem and run regexes against the entire intermediate file set to swap publication names. Oh-kaay - so a couple of hours later I'd hacked out some javascript that would do this, and solving the mapping problem the official way promptly went on the back burner, perhaps for ever (or at least until the pan boils dry). Moral of the story: doing it the official way is just too stupidly hard.
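For what it's worth, the hack itself barely deserves the name. My original was a page of javascript, but the idea fits in a few lines of anything; here's a sketch of it in C++ (the "package" directory and the publication names are placeholders, and you'd want to convince yourself that a blunt textual replace is actually safe for your package before trusting it):

#include <filesystem>
#include <fstream>
#include <regex>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

int main() {
    // Hypothetical names: swap every occurrence of the source
    // publication's name for the target's, in every intermediate file.
    const std::regex from("020 Content");
    const std::string to("520 Content (Live)");

    for (const auto& entry : fs::recursive_directory_iterator("package")) {
        if (!entry.is_regular_file()) continue;
        std::stringstream buffer;
        {
            std::ifstream in(entry.path());
            buffer << in.rdbuf();
        }   // close the input stream before rewriting the file
        std::ofstream out(entry.path(), std::ios::trunc);
        out << std::regex_replace(buffer.str(), from, to);
    }
}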
So on with the tale: This morning I was installing SDL Tridion Webforms on my research image. This was the first time I've installed WebForms, so I was stepping gently through the documentation when I realised that the WebForms installation relies on Content Porter. OK then - a quick excursion to install Content Porter, configure the Business Connector etc., and back to the main plot. That's when I nearly choked. The documentation for the WebForms installation contains this little gem:
When you perform this import, Content Porter creates a Publication called WebForms. Once created, this Publication contains the items that you need in order to use WebForms Designer and WebForms Field Type Editor. Alternatively, you can also import WebForms items into an existing Publication. Note: To import WebForms into an existing Publication, rename the Publication to which you want to import the items to WebForms before you run Content Porter. Then, after you have used Content Porter, rename the Publication back to its original name.
So there you have it: according to Tridion, the correct way to solve this problem is to rename the publication, and then rename it back again afterwards. Let's hope that's an OK thing to do in your environment.
To tell the truth, I quite like this as a "hack". It's robust and solid, and very definitely gets the job done. In fact, it's about as nice a job of working around Content Porter's limitations as you'll find. I wish I'd thought of it myself. In fact, part of the reason for this post is as a public service announcement for anyone who doesn't happen to spend their Sunday mornings reading the WebForms installation manual.
But please, Tridion - isn't it a wake-up call when your own product installation guides have to hand out workarounds like this? I know there's a new version of Content Porter on the roadmap, and I very sincerely hope that it's going to come with batteries included. While I'm on the subject - this is what the WebForms installation guide says a couple of lines further down:
Important:
Due to dependencies between items that you are importing, you will have to run the Content Porter twice in order to import all items used by SDL Tridion WebForms Designer. The first time that you run the Content Porter, you will receive error messages during the import process. These messages are not critical. You can click "Skip All" to continue.
The Content Porter should manage this. If the import of an item fails because its dependencies aren't there yet, and the dependencies are there in the package, then just wait until the end and redo it. Automatically! Rinse and repeat. Why inflict this misery on end users? There's a very real use case for Content Porter where you want to produce a package to give to someone else to import, and you don't want them worried by this kind of nonsense. If there's any reason for the existence of Content Porter, it's the managing of dependencies between the items being imported.
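What I'm describing is no more complicated than a retry loop (a sketch, of course - Item and importItem here are stand-ins of my own invention, not anything Content Porter actually exposes):

#include <string>
#include <utility>
#include <vector>

struct Item { std::string name; };

// Stand-in for one import attempt on one item. A real implementation
// would fail when a dependency hasn't been imported yet.
bool importItem(const Item& item);

// Import everything, collecting the failures and retrying them until
// a whole pass makes no progress. Returns the genuinely stuck items.
std::vector<Item> importAll(std::vector<Item> pending) {
    while (!pending.empty()) {
        std::vector<Item> failed;
        for (const auto& item : pending) {
            if (!importItem(item)) failed.push_back(item);
        }
        if (failed.size() == pending.size()) break; // no progress: stop
        pending = std::move(failed);                // rinse and repeat
    }
    return pending; // only these are worth an error message
}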
End-user misery aside - I should be able to use the Content Porter as part of an automated solution, and that just won't fly while I have to know in advance that a particular package requires two or more attempts to succeed.
/rant
Why is Tridion's configuration library called the TDSXGit?
Those of you who read my previous post will remember that I accessed Tridion's configuration by instantiating a TDSXGit.Configuration object. People who've worked with Tridion for a while may remember that it used to be quite common to edit the configuration in a file called cm_cnfg_git.xml. This file is still there, but without the xml extension, and these days it's encrypted so it doesn't make much sense to try to edit it directly.
To an Englishman like myself, the name TDSXGit is vaguely funny, because "git" in British English is a mild term of abuse. It's not uncommon for me to come out with phrases like: "Which stupid git broke the build?" It's definitely abuse, but fairly mild; you can say it to someone you like.
But to the point: Back in the R4 days, Tridion's configuration data was kept in the registry, which was all well and good, but had its own problems. When R5 was designed, there was so much XML around the place that it seemed much more sensible to keep the configuration in an XML file. The problem with this was that all that disk IO would have been a total performance killer. We needed a memory cache. Good idea, you might think, but in a COM-based web application, how do you do that? The design we ended up with makes use of a couple of fairly obscure features of COM. (By the way - I'm not claiming any credit for this, just describing what was designed by other members of the team.)
The idea is to get an object to remain in memory, and to provide a mechanism whereby any code within the application can grab a reference to the object. In COM, a reference to an object is always a pointer to an interface. Memory access in COM is controlled by "apartments" - objects running in one apartment can't directly access objects running in another apartment. In particular, if you have an interface pointer for an object in one apartment, you can't just use that pointer from a different apartment. The interface pointer needs to be "marshalled" across the apartment boundary; in other words, if what you should really be talking to is a proxy that's local to your apartment, marshalling will hand you a pointer to that proxy instead. The facility COM provides for doing this within a single process is the global interface table, hence the acronym GIT.
The GIT is visible from anywhere in the process, and if you register an interface with the GIT, that immediately takes care of the first problem, that of keeping the object in memory. In COM, memory management is done by reference counting. An object keeps track of how many other objects currently have a reference to it, and if that number drops to zero, the object will self-destruct, thereby freeing any memory it was using. As soon as you register an object with the GIT, the GIT itself holds a reference to it, and therefore it isn't going to self-destruct, so you have your memory cache.
When you register an object with the GIT, the API hands you back a "cookie". A cookie in this context is just a number. If you know this number, you can ask the GIT for an interface pointer that references the object. You can keep doing this as many times as you like, unless there's been an explicit call to release the object from the GIT. The interface pointer you get back will work in the apartment you are in.
There's one more thing that you need to make this all work, and that's a way of making sure the cookie is always available when you need to get hold of your memory cache. For this, you can use another obscure COM feature: the shared properties manager (SPM). This just allows you to save a value by name and retrieve it. (The SPM also takes care of a couple of other things, like grouping the properties to prevent name collisions, and locking to control access contention.)
So when a Tridion process first accesses the configuration, the configuration file will be decrypted and loaded into a DOM object, and the DOM will be registered with the GIT. The cookie is then stored in the SPM. Any subsequent accesses for the life of the process will be simply a matter of grabbing the cookie from the SPM and using it to get the interface pointer from the GIT.
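If you're curious what those moving parts look like, here's a C++ sketch of the pattern. To be clear: this is my own illustration, not Tridion's code. Error handling is stripped out, and the group and property names are invented.

#include <windows.h>
#include <comdef.h>
#include <comsvcs.h>   // the Shared Property Manager lives here
#include <msxml2.h>    // IXMLDOMDocument

// Find (or create) the shared property that holds our GIT cookie.
static ISharedProperty* GetCookieProperty() {
    ISharedPropertyGroupManager* pMgr = nullptr;
    CoCreateInstance(CLSID_SharedPropertyGroupManager, nullptr,
                     CLSCTX_INPROC_SERVER, IID_ISharedPropertyGroupManager,
                     reinterpret_cast<void**>(&pMgr));
    LONG iso = LockSetGet; // lock around each get/put
    LONG rel = Process;    // properties live as long as the process
    VARIANT_BOOL exists;
    ISharedPropertyGroup* pGroup = nullptr;
    pMgr->CreatePropertyGroup(_bstr_t(L"Configuration"), &iso, &rel,
                              &exists, &pGroup);
    ISharedProperty* pProp = nullptr;
    pGroup->CreateProperty(_bstr_t(L"GitCookie"), &exists, &pProp);
    pGroup->Release();
    pMgr->Release();
    return pProp;
}

// First access: park the DOM in the GIT (which takes its own reference,
// so the DOM stays alive) and record the cookie in the SPM.
void CacheConfig(IXMLDOMDocument* pDom) {
    IGlobalInterfaceTable* pGit = nullptr;
    CoCreateInstance(CLSID_StdGlobalInterfaceTable, nullptr,
                     CLSCTX_INPROC_SERVER, IID_IGlobalInterfaceTable,
                     reinterpret_cast<void**>(&pGit));
    DWORD dwCookie = 0;
    pGit->RegisterInterfaceInGlobal(pDom, IID_IXMLDOMDocument, &dwCookie);
    ISharedProperty* pProp = GetCookieProperty();
    pProp->put_Value(_variant_t(static_cast<long>(dwCookie)));
    pProp->Release();
    pGit->Release();
}

// Every subsequent access: cookie out of the SPM, pointer out of the
// GIT. The pointer handed back is usable in the calling apartment.
IXMLDOMDocument* GetConfig() {
    ISharedProperty* pProp = GetCookieProperty();
    _variant_t v;
    pProp->get_Value(&v);
    pProp->Release();
    IGlobalInterfaceTable* pGit = nullptr;
    CoCreateInstance(CLSID_StdGlobalInterfaceTable, nullptr,
                     CLSCTX_INPROC_SERVER, IID_IGlobalInterfaceTable,
                     reinterpret_cast<void**>(&pGit));
    IXMLDOMDocument* pDom = nullptr;
    pGit->GetInterfaceFromGlobal(static_cast<DWORD>(v.lVal),
                                 IID_IXMLDOMDocument,
                                 reinterpret_cast<void**>(&pDom));
    pGit->Release();
    return pDom;
}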
There are other techniques that could be used, but this one has the advantage not only of eliminating the disk IO, but also of avoiding repeated parsing of the XML to create the DOM.
This should explain why when you update some configuration value, you have to shut down each of the processes that make use of the configuration. The GIT and SPM are each specific to a process. It is technically possible to get TDSXGit.Configuration to release the DOM from the GIT, but none of Tridion's application code actually does this. That's a reasonable design for a server application that isn't re-configured very often.
In theory (at least according to the theory I just described), it should only be necessary to restart a process if it is affected by the configuration value that just changed, but my own experience flies in the face of this, and I always restart all the processes. It's a bit of a cargo cult thing I suppose, but I'll keep doing it. Actually, I'd love someone to point out where my reasoning is flawed. I hate that cargo cult thing.
"Which stupid git forgot to restart the processes after that configuration change?"
Am I an engineer?
I've been following the recent discussion between Raganwald and Ravi Mohan (among others). It's the classic debate about how you define an engineer. Does the term "engineer" denote a "profession", and who is qualified to be in it? Well first of all, let me be open about my own affiliations. I'm an associate member of the Institution of Mechanical Engineers. The "associate" part means that I'm not a corporate member, and therefore not a full member of the profession. I stopped doing mechanical engineering, in order to switch to writing software, at just about the point where, had I stayed, I'd have registered for full membership. (I had my degree, and a couple of years of professional experience under my belt.) To achieve chartered engineer status these days, the obvious route would be to join the British Computer Society, which fulfils the same role for software engineers as the IMechE does for mechanical engineers.
So let's start there. I am an engineer. I have a good degree in mechanical engineering from a good college. I spent significant parts of my life creating engineering designs on a good old-fashioned drawing board, and those designs were implemented and proved in practice. I can do mathematics, both continuous and discrete. (Mechanical engineering tends to use continuous techniques like the differential and integral calculi, while computer science leans towards discrete mathematics like set theory and so forth.) These days, I implement working computer systems in the best way I know how.
Still - I don't often use the word "engineer" to describe what I do. These days, I write software, and to do so I spend significant amounts of time studying best practices, and honing my art, but most of the time I use the word "technician" to describe my role. Let's be clear. I don't for a moment doubt my right to use the term "engineer" to describe myself. The reason I don't do so is mostly because I wonder whether other people would understand the sense in which I use the term (or perhaps I fear that I'd have to explain at great length).
Let's go back to the beginning of the great professional bodies of the engineering world. My own institution, the IMechE, was an offshoot of the Institution of Civil Engineers, back when the railway guys were carving out new territory and the existing profession wouldn't recognise them (mostly because of their affiliations with vested interests in the world of canals). But in those days, what made an engineer, and what made an engineering profession? The answer might upset some people's precious sensibilities, but it's simple. Engineers were people who could build working systems, and the early meetings of the profession were probably more akin to a geek dinner than anything else. George Stephenson didn't have a degree, but he could build railway engines.
Following Ravi's link to a job description which illustrates his view of what constitutes an engineer, we find the following quote:
"You have a good sense for distributed systems practice: you can reason about churn and locality in DHTs. You intuitively know when to apply ordered communication and when to use transactions. You can reason about data consistency in a system where hundreds of nodes are geographically distributed. You know why for example autonomy and symmetry are important properties for distributed systems design. You like the elegance of systems based on epidemic techniques."
The only thing of note that I can see in there is the repeated phrase "You can reason about...". (OK - I also like 'you have a good sense for...' and 'you like the elegance of...', but they aren't crucial).
Engineers can reason about their subject. They ensure that they have sufficient raw material at hand for their reasoning, but the reasoning is the thing that counts. If you can reason about technical systems, you are an engineer. If you can't, you aren't. To reason effectively, you must know the subject well, but let's take that as a given. Mathematics is a tool - perhaps we might argue that subjects which don't require mathematics aren't engineering subjects - but it's just a tool, not the essence. I suspect George Stephenson had less mathematics at his disposal than I needed to satisfy the examiners for my degree all those years ago. Never mind that. He could have shown you a working steam engine, and told you how it worked. He managed to found an engineering institution for which I don't even have the entry qualifications.
The beginnings of the engineering profession were a bunch of people saying: "Hey guys, let's meet in a pub every so often and discuss what works and what doesn't." Implicitly, if you were interested in sharpening your tools, you were invited, and if you weren't, you weren't.
I still wonder whether I'll ever make the time to do the BCS exams and work towards recognition as a Chartered Engineer. At the current rate, it might be after I retire. Until then, I'll still keep on calling myself an engineer, and keep on reasoning and sharpening my tools.
The most significant SF and Fantasy books of the last 50 years (or thereabouts)
Picking up the meme-let from Nazgul, I've taken the Science Fiction Book Club's list of The Most Significant SF & Fantasy Books of the Last 50 Years, 1953-2002, and filtered out the ones I haven't read.
1. The Lord of the Rings, J.R.R. Tolkien
2. The Foundation Trilogy, Isaac Asimov
3. Dune, Frank Herbert
4. Stranger in a Strange Land, Robert A. Heinlein
5. A Wizard of Earthsea, Ursula K. Le Guin
6. Neuromancer, William Gibson
7. Childhood's End, Arthur C. Clarke
8. Do Androids Dream of Electric Sheep?, Philip K. Dick
10. Fahrenheit 451, Ray Bradbury
13. The Caves of Steel, Isaac Asimov
15. Cities in Flight, James Blish
16. The Colour of Magic, Terry Pratchett
21. Dragonflight, Anne McCaffrey
23. The First Chronicles of Thomas Covenant the Unbeliever, Stephen R. Donaldson
24. The Forever War, Joe Haldeman
26. Harry Potter and the Philosopher's Stone, J.K. Rowling
27. The Hitchhiker's Guide to the Galaxy, Douglas Adams
30. The Left Hand of Darkness, Ursula K. Le Guin
38. Rendezvous with Rama, Arthur C. Clarke
39. Ringworld, Larry Niven
42. Slaughterhouse-5, Kurt Vonnegut
46. Starship Troopers, Robert A. Heinlein
47. Stormbringer, Michael Moorcock
48. The Sword of Shannara, Terry Brooks
To tell the truth, among the remaining items there are several that I might have read, but don't remember as such. That's probably because I haven't actively read science fiction for about 10 years. How you get from being a complete SF&F nut to someone who never reads fiction is another story.
Making rankings like this is always going to be controversial, but I suspect that my own approach would be to choose the authors I wanted to have represented, and then try to figure out for each which was their master work, and decide whether they need to be represented more. So Lord of the Rings represents Tolkien quite adequately, but surely The Silmarillion (which I never read) was only for the people whose appetite wasn't sated by multiple re-reads of LOTR. The Foundation trilogy is a good start, but I, Robot clearly outranks some of the rubbish that's been included. Yes - for prolific core SF authors like Asimov and Clarke you need to have more than one entry. (For Clarke, I'd add The Fountains of Paradise.)
For some of the other authors, the book listed is clearly not their master work. Starship Troopers for Heinlein? Surely not. I agree with Le Guin being represented for both F and SF, although for the latter, I'd have said The Dispossessed was a better choice. Whatever - Earthsea was her master work.
Why is Harry Potter in there at all? Surely you could toss in a few Harry Harrisons and a few extra Nivens to push him off the bottom of the list. I guess somebody thought "significant" had something to do with box office. In that case, let's have Star Wars.
Maybe one of these days I'll start reading again... but I wouldn't know where to start. I think mostly what stops me is the thought that I won't be able to get sufficient momentum to actually get through a book, let alone give it respect.
PWN'ed at Egmond
The 36th Egmond Half Marathon took place last Sunday (13th January). Thirty-six years is an impressive score; they began long before distance running became popular. Anyway - for some bizarre reason, after finishing the Amsterdam half, I signed up for Egmond, despite the fact that the last time I did anything cross-country was at school. The Egmond event begins with 3-4 kilometres through the village, followed by a 7 kilometre stretch along the beach. From there, you come back along peat paths through the dunes until you eventually get onto brick paths that take you back to the village and the finish line.
My official time from the chip was 1:56:53, placing me 3339 out of 6174.
The official timing also gave me a split at the halfway point (10.5 km) of 57:46.
From my own watch I took splits at 10 and 15 km, as follows:
10 km: 53:29.00
15 km: 1:22:47.20
so as you can see, it was slow going. I was pleased just to get round the course in one piece.
The event was sponsored by PWN, a local water company (I think). So anyway - I've been PWN'd. Pure water and Nature. To tell the truth, when you're plodding along that beach, you aren't much thinking about the beauties of nature. At one point in the dunes, I looked up and for want of something positive with which to exercise my mind, I paid particular attention to the aforementioned beauties. You've still got to keep putting one foot in front of the other....
Anyway - as it turns out, the last post to this blog was when I finished the Amsterdam half. Time I started writing some technical posts.
Amsterdam half-marathon - a new personal best
The Amsterdam half-marathon is, like the marathon that's run on the same day, known for fast times - partly because of a generally flat course, and partly because, this late in the year, there's less chance of the weather being too hot. For me, especially when making a comparison with the Great North Run, the fact that it's a comparatively small event made a big difference too. For the last two-thirds of the race I was running in open space most of the time. I was therefore able to follow my plan of getting up to a good rate of work early on and sticking to it. Most of the time I had an eye on my heart rate monitor, and kept it in the high 160s. At the end, the average rate was 166. For me that's good steady work. I don't think I could sustain, say, 170 over that distance. Something to do with being an old git. That means that if I'm to improve on my times, I'll need to raise my general level of fitness. That sounds like lots of unpleasant speed work - if I decide to try to improve on this result.
According to the official results page my net time was 1:45:21 - they also quote a gross time (1:48:13), which is presumably the time from the starting gun to when you cross the line. In this modern world where the progress of your "chip" is monitored round the course, I'm quite happy to accept the net time: start line to finish line, as the result. Back in the 1994 Great North Run, I did 1:48, and that was the time from my own stopwatch: start line to finish. Today's time is therefore my personal best over this distance. Um - OK - it's only my third half-marathon ever, so talking about personal bests might be a bit precious, but I'm really pleased with this, as my previous PB was set 13 years ago.
The official results also include some rather bizarre split times. This comes about because the positions of the mats were relative to the start line of the marathon, which was different to the start for the half-marathon. (I don't have any timings from my own watch, as I pushed the wrong button at the 5km mark. Thank goodness for the chip.)
8.9 km | 43:05
13.9 km | 1:08:27
18.9 km | 1:34:28
My position was 2279 / 8439. That will do nicely. I mean you can take this all too far. The memorable image of the day was Emmanuel Mutai crossing the line after running the marathon in 2:06:27. He'd given his all, and as he rounded the track in the Olympic Stadium, approaching the finish line, you could see he was having trouble. As he crossed the line, he promptly threw up. That shows just how much these top athletes push themselves. I'll settle for less.
Great North Run completed
Yesterday I took part in the Great North Run. According to the official results page, my time was 1:56:49, which I suppose I'm quite pleased with under the circumstances. It was a warm day, and starting back in the green pen, I spent the entire race hunting for a gap to try and overtake people. It really is like running in a football crowd. The organisers have done a lot to help with the consequences of having such a large crowd. If you've got an official time from another race which gives credence to your estimated finish time, then you get to begin closer to the front. (I didn't, so I ended up in the greens, although my estimate was pretty close to the truth as it turned out.) Unfortunately, with 50,000 entries, the start could do with being yet more staggered. Even after walking for the best part of half an hour to the start line, when you get there you're still walking.
Back in 1994 when I last did the GNR, they didn't have chip timing systems, so the only way to get an accurate time was from your own watch. At least with the chip system, you get an official time that's based on when you cross the start line as well as the finish line.
Anyway - if the organisers want to improve things yet further, here are my suggestions:
- Disqualify people who cheat on their start colour. This could easily be done by putting a chip mat at the front of each colour zone and not giving medals to anyone who didn't cross the mat for their colour. (This also accommodates the rule that allows you to go back a colour but not forward.)
- Make it explicit in the guidance to runners that if you are walking, or otherwise slowing down to a pace which forces others to pass you, please get over to the left and let people past. Some people were even walking three abreast after only a mile or two.
So - as stated above, my official time was 1:56:49, which gave me a position of 9690. (I guess at this point I should emphasise the 40,000 people who came in after me!)
They were also kind enough to provide an official 10 mile split, which was 1:29:08.
While running I took 5km splits. Here are the timings off my own watch:
Distance | Split | Total
5 km | 27:08 | 27:08
10 km | 28:49 | 54:57
15 km | 28:05 | 1:23:02
20 km | 29:19 | 1:52:21
Half marathon | 5:30 | 1:56:51
On the 21st October, I'll be doing the Amsterdam half-marathon. That's not really far enough away to get in much extra training, but I still have hopes of coming in with a faster time, if only because there'll be less of a crowd. Then again, I always talked a good race. :-)
Specifying a DNS server under Gentoo linux
I realised this afternoon that trackback pings from this server weren't reaching their targets. A short investigation showed that I couldn't resolve domain names. For name resolution to work, you need to have a "nameserver" directive in /etc/resolv.conf, but there wasn't one there. I remembered that I'd solved the same problem a few months ago by adding such a directive. Obviously I hadn't been thorough enough at the time if it was now broken again.
The obvious candidate to blame was rebooting the system. I'd rebooted for some reason a few weeks ago; presumably it had been broken since then. This was indeed the case as it turned out. On a Gentoo system the /etc/resolv.conf file is created by the init scripts. These init scripts use data from /etc/conf.d/net. The problem I had was that I didn't know the correct syntax for a directive in this file which would cause resolv.conf to get a nameserver directive.
After much digging, it turned out that:
- There is no documentation
- There is a file called /etc/conf.d/net.example which contains sufficient detail to allow you to fix the problem
If you add something like this to /etc/conf.d/net:
dns_servers=( "192.168.0.3" )
you'll end up in your /etc/resolv.conf with a line that looks like this:
nameserver 192.168.0.3
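One more note: the setting only lands in /etc/resolv.conf when the init scripts regenerate it, so after editing /etc/conf.d/net you'll need to restart the interface's init script (assuming here that your interface is eth0 - adjust to taste):

/etc/init.d/net.eth0 restart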
Hope this helps
CSS3 doesn't look so bad to me (so far)
By way of my regular dose of Ajaxian, I came across an article by Alex Russell, describing CSS3 as a "giant serving of FAIL". Alex, I'm afraid I'll have to disagree with you.
Firstly though, thanks Alex, for presenting your views in a sufficiently provocative way to get me to go and look at the current standards work. For CSS (or anything that will have to wait for browser support to become useful) I wouldn't usually bother until after a recommendation is out, and it's interesting to see what's coming.
Alex is unhappy that the proposals' focus on CSS namespaces, CSS Print Profile and CSS Advanced Layout seems to have taken priority over some things that he would like to see. He's keen on being able to have "mix-in" styles, so that you could import the styling of another existing rule into your own. He'd also like to see the ability to define and use variables in a stylesheet, for example for defining named colours.
On the face of it, there's nothing wrong with these suggestions, but to me they are frills, and worse yet - any such frills will reduce the likelihood of coherent and reliable implementations by the various browser vendors. Making sure that all the browsers work properly when multiple classes are applied to an element would solve most of the pain addressed by mix-in styles, and variables are probably more suitable for server work, ideally in a CMS.
Now on to the things he doesn't like in the proposal. He describes CSS Print Profile and CSS namespaces as "turds". Well personally I don't have much of a use-case for Print Profile yet, so I'll sit on the fence and say it's probably good for the people that need it. CSS namespaces turns out not to be a turd at all, but a minor enhancement that will remove various headaches from people who are trying to style non-trivial XML documents.
I've saved the best till last though: Alex describes the CSS Advanced Layout module as a "cluster-fsck". I don't know what one of those is, but if it's anything like a cluster-fuck, I don't get his point. One of the areas of CSS most needing attention is how we do page layouts. The normal Internet web site visitor these days expects to see a couple of navigation areas, along with sidebars, footers etc., etc. Now I'm pretty much a middle-hitter in the world of CSS. I know way more than your typical Frontpage man-on-the-Clapham-omnibus kind of guy, but I'm nobody's guru either. I'll be straight: right now, speaking as a middle-hitter, getting a standard three-column layout is hard. Too hard. The solution as proposed for CSS3 looks like it will make sense to ordinary folks and middle-hitters. I wish Alex hadn't attempted to ridicule this as Ascii-art, because it's worth more serious attention than that. In addition to solving the three-column layout problem, there's all sorts of other goodness, like tabbed layouts, and the potential for things like newspaper layouts that you probably wouldn't have attempted without tables.
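To show why I think the proposal will make sense to ordinary folks, this is roughly what the template syntax looks like - going from a quick read of the Advanced Layout working draft, so treat the details as provisional:

body { display: "aaa"
                "bcd"
                "eee"; }
#header  { position: a; }
#nav     { position: b; }
#main    { position: c; }
#sidebar { position: d; }
#footer  { position: e; }

Each string is a row of the layout, each letter is a slot, and a repeated letter stretches its slot across the row. The b/c/d row is the dreaded three-column layout, solved in half a dozen declarations. Call it ascii-art if you like; I call it readable.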
I have no friends
So, flushed with my success on LinkedIn, I joined Facebook.
er.. yeah ... my success on LinkedIn huh? Right - well I've got 136 connections - cool eh? Full of life-changing goodness! Give me a break!
And then on Facebook, they aren't called connections, they're called friends - so I joined, and the first thing it told me was "You have no friends".
What do we do this for? LinkedIn has been completely bloody useless to me, other than giving me another compulsion to feed. To be honest, all it's good for is finding out when former colleagues change jobs. Hey - even if you get escorted off the premises by a security guard, at least you can still spam your contacts eh?
But perhaps enough youthful trendiness will rub off from Facebook on to me to stave off the inevitable mid-life crisis for another year or two. Ain't holding my breath though.