Lucene guru Mike McCandless has just published an impressive piece of work on his blog: a series of YouTube videos visualizing how Lucene's MergePolicy really works. He feeds Solr a 10 GB Wikipedia dump, plus a random add/delete data source, and records every segment written and merged along the way.
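To get a feel for what such a recording captures, here is a toy sketch of a log-style merge cascade. It is not Lucene's code: the function name `simulate_flushes`, the parameters, and the rule "merge once `merge_factor` equal-sized segments pile up at the end" are simplifying assumptions made for illustration only.

```python
def simulate_flushes(num_flushes, docs_per_flush=1, merge_factor=10):
    """Toy model of a log-style merge policy (hypothetical, not Lucene's API).

    Every flush writes one new segment. Whenever the newest `merge_factor`
    segments all have the same size, they are merged into a single segment,
    which may in turn trigger another merge one level up.
    """
    segments = []  # segment sizes in documents, oldest first
    events = []    # ("flush" | "merge", resulting segment size)

    for _ in range(num_flushes):
        segments.append(docs_per_flush)
        events.append(("flush", docs_per_flush))

        # Cascade: keep merging while the tail holds merge_factor equal segments.
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
            events.append(("merge", merged))

    return segments, events


if __name__ == "__main__":
    final_segments, log = simulate_flushes(25)
    print(final_segments)
    print(sum(1 for kind, _ in log if kind == "merge"), "merges")
```

Printing the `events` list after each step gives roughly the raw data that an animation like Mike's could be drawn from: one entry per segment written, one per merge.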
Mike also introduces a cool new merge policy called TieredMergePolicy (LUCENE-854), which is much smarter and slightly more efficient than the default one. I hope it becomes the new default merge policy in Solr.