Dominic Cronin's weblog
Using Ghostscript to reduce the size of a PDF
I had scanned in a document with the intention of emailing it. (For this I usually use PDFCreator, which allows you to aggregate the results of several scans into a single PDF.) On this occasion, I had scanned all four pages of the document before realising that, with my current scanner settings, the resulting document would be about 12MB. So I was faced with the choice of either scanning them all again, or finding a way to reduce the size of the PDF. A quick Google turned up this link, which gave the following command line to use with Ghostscript:
gswin32c -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dPDFSETTINGS=/ebook -sOutputFile=C:\newFile.pdf C:\originalFile.pdf
The reason I had Googled for a Ghostscript solution was that I already knew I had it installed as part of Cygwin. (I always install Cygwin on any Windows machine I need to use regularly - mostly for the SSH client, but I usually do a full install just so that all those useful utilities are just there.) After a bit of poking, I realised that instead of typing "gswin32c" I just needed "gs". The rest of the command worked just fine, and I ended up with a PDF of somewhat less than 2MB.
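For the record, the equivalent incantation from the Cygwin shell would look something like this (the file names here are just placeholders):

gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dPDFSETTINGS=/ebook -sOutputFile=scan-small.pdf scan.pdf

If /ebook squeezes harder than you'd like, Ghostscript also understands /screen, /printer and /prepress as values for -dPDFSETTINGS, trading file size against image quality.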
So here's a hat tip to the Ghostscript contributors over the years. Thanks. Isn't free software great?
Extending the boot partition of Windows 2003 Server
In my work, I quite commonly have servers on which, over time, I end up installing more software, or simply upgrading what's there already - in any case, consuming more disk space than was originally envisaged by the person who set up the server. Today I was upgrading a Tridion development image to Tridion 2009 SP1. This was on a VMWare image whose C: drive was maxed out at 16GB. A while ago, I'd filled up the 16GB, and then gone through the rigmarole of trying to make it larger, and failed. At the time, it was more expedient to just add another disk and move some of the data off the C: drive. This approach only gets you so far, though, and sooner or later you need a bigger C: drive.
So I'd already got as far as using the VMWare utilities to increase the size of the "physical" disk to 20GB, and then I'd booted the image from a GParted "live CD" .iso and increased the size of the partition, also to 20GB. The problem was that although the disk management snap-in could see the full 20GB, as far as Windows Explorer was concerned, there was only 16GB (and a pretty full 16GB at that!)
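To be concrete about that first step: with the guest powered off, VMWare's vmware-vdiskmanager utility can grow a virtual disk in place. A sketch, with an invented disk name:

vmware-vdiskmanager -x 20GB "Windows2003.vmdk"

The extra space then shows up as unallocated at the end of the disk, which is exactly what GParted is for.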
If you've ever been through this, you'll know that it can be a matter of getting the magical incantations just right. The operating system's own tools won't let you expand the boot partition, or a system partition, or the partition holding the page file, or any of a whole bunch of other strangely restricted cases. (Yes - strange - even in 2003!) The reason for using GParted in the first place was that you definitely can't do it while Windows is running off the offending partition, and at some time in the past, I'd followed this approach and it had just worked. Why not now? I don't know.
To cut a long story short, it just sat there staring at me, and wouldn't do what I wanted, when, for no really good reason, I booted the system from GParted again, and this time used the "check and repair" utility. It duly checked and repaired, and I rebooted the system normally again, only to see that Windows had now decided that the disk was suspect and wanted to run CHKDSK. I let it run, and hey presto, when the system came up, Windows Explorer could see the full 20GB. Job's a good'un, eh?
So although I don't have any solid explanation for it all, I'm adding a blog entry in the hope that it helps someone - perhaps myself on some future occasion. I don't know whether it was actually something that the GParted check/repair did that fixed the problem. In terms of probabilities, I'm leaning more in the direction that it was CHKDSK that did the actual fixing. If anyone is in the same boat, I'd be interested to know whether just running CHKDSK is sufficient.
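If you want to try that experiment, the incantation is the standard one; because the volume is in use, Windows will offer to schedule the check for the next reboot:

chkdsk C: /f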
XML Namespaces aren't mandatory, and tools shouldn't assume that they are.
In his recent blog posting on XML Namespaces, James Clark questions the universal goodness of namespaces. Of course, there is plenty of goodness there, but he's right to question it. He says the following:
For XML, what is done is done. As far as I can tell, there is zero interest amongst major vendors in cleaning up or simplifying XML. I have only two small suggestions, one for XML language designers and one for XML tool vendors:
- For XML language designers, think whether it is really necessary to use XML Namespaces. Don’t just mindlessly stick everything in a namespace because everybody else does. Using namespaces is not without cost. There is no inherent virtue in forcing users to stick xmlns=”…” on the document element.
- For XML vendors, make sure your tool has good support for documents that don’t use namespaces. For example, don’t make the namespace URI be the only way to automatically find a schema for a document.
It's the second point that interests me. During a recent Tridion project, there was a requirement to accept data from an external source as an XML document. I wanted to use a Tridion component to store this data, as this would give me the benefits of XML Schema validation and controlled publishing. The document didn't have a namespace, although it had a schema, and Tridion wouldn't allow me to create a schema whose target namespace was empty. In order to get this to work, I had to go to the provider of the document and get them to add a namespace. It seemed a shame that even when hand-editing the schema (so presumably asserting that I knew what I was about) the system wouldn't let me make this choice.
At the time, I just got the other party to make the change, and went back to more important things. Maybe there's some internal constraint in the way Tridion works that prevents them from supporting this, or maybe it's such an edge case that no-one was ever bothered by it. If the former, I can't think what the problem would be; there's no reason to abuse the namespace as a means of locating the schema. Tridion is quite happy to allow several schemas targeting the same namespace, so what's so special about the "no" namespace? In Tridion components, XML attributes are (quite correctly) in no namespace, and as long as the correct schema gets used for validation, so what?
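For the avoidance of doubt, a no-namespace schema is perfectly legal XSD: you simply omit the targetNamespace attribute, and the elements it declares end up in no namespace at all. A minimal sketch (the element names are invented for illustration):

<?xml version="1.0" encoding="UTF-8"?>
<!-- No targetNamespace attribute: the declared elements are in no namespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="document">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="page" type="xsd:string" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

A matching instance document then carries no xmlns declaration whatsoever:

<document>
  <page>some content</page>
</document>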
I suspect it's more likely that this just comes under the "edge case" heading, in which case, perhaps they can improve it in a future release.