We are currently working on a project where we are setting up aerial photos, or orthophotos, in Geoserver. We tried to do this, naively, by adding the original data that we got from our supplier. The original data is 1.4 TB. This might not be big for Google, but it is quite big for us, and definitely for our server. The problem wasn’t just the total size of the data; it was also how the data was composed.
The data was delivered as JPEG images accompanied by world files. The images were around 50-80 MB each, with dimensions of 10 000×10 000 pixels, and we used them as an ImageMosaic. We use our own map client written in JavaScript (not OpenLayers in this case), which requests tiles of 256×256 pixels. To cover a page in the client we make around 30 requests. For Geoserver to give us a single tile, it needs to load the full 10 000×10 000 px image. This is obviously quite silly, since we won’t use most of those pixels. Most of us don’t have a screen resolution higher than 1920×1200 (many don’t have that!), so why load all that data?
OK, so what to do? Well, our legacy in-house map server stores its tiles at 1500×1500 by default. But it occurred to us that most (citation needed) of the world seems to use tile sizes that are powers of 2, so we went with 2048×2048.
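To put a rough number on how wasteful the big images are, here is a small back-of-the-envelope sketch. It assumes a requested tile falls entirely inside one source image and that the whole image is decoded for it, which is how the problem is described above; tiles that straddle a seam would touch more than one image. The function name is made up for illustration.

```python
# Rough estimate: pixels decoded per requested tile, assuming the tile
# falls entirely inside one source image (tiles on a seam touch more).
REQUEST_TILE = 256  # client tile edge in pixels, from the post


def overread_factor(source_edge: int, tile_edge: int = REQUEST_TILE) -> float:
    """Ratio of pixels in one source image to pixels in one requested tile."""
    return (source_edge * source_edge) / (tile_edge * tile_edge)


for edge in (10_000, 2048):
    print(f"{edge}x{edge} source: about {overread_factor(edge):.0f}x the pixels needed")
```

Under those assumptions a 10 000×10 000 image means reading roughly 1500 times more pixels than the tile needs, while a 2048×2048 image brings that down to 64 times.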
Great, we have a plan. Now, how do we transform all our tiles into smaller ones? Also, we might want to create a pyramid if we want to be able to match the scale layers of our map data. The solution is gdal_retile.py. GDAL is an amazing open source project that does a lot of useful stuff, like retiling mosaics and creating pyramids, among plenty of other things.
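As a minimal sketch of what such a call could look like, the helper below builds a gdal_retile.py command line for one folder of images. The folder names and the number of pyramid levels are hypothetical placeholders, not the values we actually used; `-ps`, `-levels` and `-targetDir` are standard gdal_retile.py options.

```python
import subprocess
from pathlib import Path


def build_retile_command(source_dir, target_dir, tile_size=2048, levels=4):
    """Build (but do not run) a gdal_retile.py command line.

    source_dir, target_dir and levels=4 are hypothetical values for illustration.
    """
    sources = sorted(str(p) for p in Path(source_dir).glob("*.jpg"))
    return [
        "gdal_retile.py",
        "-ps", str(tile_size), str(tile_size),  # output tile size in pixels
        "-levels", str(levels),                 # number of pyramid levels to build
        "-targetDir", str(target_dir),          # create this directory beforehand
        *sources,
    ]


if __name__ == "__main__":
    cmd = build_retile_command("orthophotos/area01", "retiled/area01")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment once the paths are real
```

Building the command as a list keeps file names with spaces intact and makes it easy to print and inspect the command before letting it loose on 1.4 TB.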
I did a test on 3.1 GB of data and got 998 MB, roughly thirty percent of the original size. So, the original files are kind of bloated.
The retile took 2 hours. A quick calculation tells us that retiling all of the data would take somewhere around 900 to 1000 h, or roughly 40 days. However, it’s a Python script that runs as a single process, so it only keeps one CPU core busy. So, how can we speed it up?
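The quick calculation is a plain linear extrapolation from the test run. The sketch below redoes it with the figures from this post, assuming decimal terabytes and that throughput stays the same across the whole data set, which is optimistic at best.

```python
# Figures from the post: a 3.1 GB sample took 2 hours; the full set is 1.4 TB.
SAMPLE_GB = 3.1
SAMPLE_HOURS = 2.0
TOTAL_GB = 1.4 * 1000  # assumes decimal terabytes


def estimate_hours(total_gb, sample_gb=SAMPLE_GB, sample_hours=SAMPLE_HOURS):
    """Linear extrapolation, assuming throughput stays constant."""
    return total_gb / sample_gb * sample_hours


serial = estimate_hours(TOTAL_GB)
print(f"serial run: about {serial:.0f} h, or {serial / 24:.0f} days")
```

This lands a bit above 900 h, the same ballpark as the rounded figure above, and either way far too long to wait for a single process.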
The data was split into 10 areas. If we run 10 instances, one per area, each instance can use a different CPU core. This should cut the running time by a factor of 10, so the job would finish in about 4 days.
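One way to start the instances is from a small launcher script rather than ten terminals. The sketch below is an assumption about how that could be done, not what we actually ran: it starts each command as its own operating system process and collects the exit codes. The demo commands are harmless stand-ins so the sketch runs anywhere; in practice each entry would be one gdal_retile.py command line for one area.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_workers=10):
    """Run each command as its own OS process; return exit codes in input order.

    The threads only wait on child processes, so the work itself is spread
    across cores by the operating system.
    """
    def run(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Stand-in commands, one per area; replace with real retile commands.
    demo = [[sys.executable, "-c", f"print('area {i:02d} done')"] for i in range(1, 11)]
    print(run_all(demo))
```

The factor of 10 only holds if the machine has at least 10 free cores and the disks can keep up, so treat it as an upper bound on the speedup.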
After starting this, going to FOSS4G, and coming back (it took longer than 4 days), it was done :)
Now all I have to do is combine all the tile index files somehow. I fired up ogr2ogr to combine them and thought I was all set, but obviously I didn’t think about the paths. When creating the new tiles I was at a different level of the directory structure, and now I have to prepend the superfolders to the location column in the dbf files. So far I haven’t figured out a good way to do this… I tried to do it in LibreOffice, but I became frustrated because that just moved the data into dbt files, and I don’t even know if Geoserver understands that…
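One possible approach, untested against this data set, is to rewrite the location column with the GDAL Python bindings before merging the indexes. The sketch below assumes the attribute is called `location` as in our dbf files; the function names and the example folder name are made up for illustration. String fields in a dbf have a fixed width, so it is worth checking that the longer paths still fit, and working on a copy of the index.

```python
def prefixed(location, superfolder):
    """Prepend superfolder to a tile path, normalising to forward slashes."""
    rel = location.replace("\\", "/").lstrip("/")
    return superfolder.rstrip("/") + "/" + rel


def prepend_superfolder(index_path, superfolder, field="location"):
    """Rewrite the tile path attribute of every feature in a tile index, in place.

    Requires the GDAL Python bindings. Work on a copy: there is no undo.
    """
    from osgeo import ogr  # imported here so prefixed() works without GDAL

    ds = ogr.Open(index_path, 1)  # 1 = open for update
    if ds is None:
        raise IOError(f"could not open {index_path}")
    layer = ds.GetLayer(0)
    for feature in layer:
        feature.SetField(field, prefixed(feature.GetField(field), superfolder))
        layer.SetFeature(feature)
    ds = None  # drop the reference so changes are flushed to disk


if __name__ == "__main__":
    # Hypothetical example path and folder name.
    print(prefixed("1/tile_1_1.tif", "area01"))
```

Running something like `prepend_superfolder` on each per-area index with its own folder name, and only then combining them with ogr2ogr, would avoid having to tell the areas apart afterwards.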
More on the result in a later post.


