Friday, September 6, 2013

More Memory.... Speed?

My typical advice for most people using an FEA solver is, first things first, to get off the hard drive.  Most of the time the best way to do this is to find a server which supports a LOT of memory, so almost any job is guaranteed to stay off the hard drive.  Unfortunately, I've come to find that with many servers, installing more than a certain amount of memory will cause the memory to run at a slower speed.  A little background:

Multi-socket servers typically have a certain number of memory "channels" allocated to each processor socket to allow the CPU to access the memory installed in the system.  Higher performance servers reach up to four channels per socket, so a high performance server with two sockets will likely have 2x4=8 channels accessing the memory installed in the system.  However, memory with sane prices typically comes in no more than 16GB per DIMM (an individual memory board).  Thus this two-socket system with only one DIMM per channel (DPC) would have only 2x4x16GB=128GB of memory available!  Fortunately, it's extremely rare to find a machine that doesn't run the memory at the highest speed currently available (1600 MT/s, i.e. an 800 MHz clock) when there's only one DIMM per channel.  If each of those sockets were occupied by an 8-core processor, such as a Xeon E5-2690, and if the typical analysis job of an analyst used less than 128GB of total memory even after splitting up the job 2x8=16 ways at the DMP level, then this machine would be ideal for that analyst's needs.
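The capacity arithmetic above is simple enough to put in a throwaway script.  Here's a minimal sketch (the function and parameter names are mine, not any vendor's):

```python
# Total installed memory for a given server population, per the
# sockets x channels x DIMMs-per-channel x GB-per-DIMM arithmetic above.
def total_memory_gb(sockets, channels_per_socket, dimms_per_channel, gb_per_dimm):
    """Total installed memory in GB."""
    return sockets * channels_per_socket * dimms_per_channel * gb_per_dimm

# The dual-socket, four-channel, 16GB-DIMM example at one DIMM per channel:
print(total_memory_gb(2, 4, 1, 16))  # -> 128
```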

It's very common, however, for servers to allow more than one DIMM per channel.  Two DIMMs per channel is fairly common (2x4x2=16 memory slots for 16 DIMMs), and three is not unheard of.  Unfortunately, in my recent shopping experience, populating more than two DIMMs per channel will typically cause the memory controllers to restrict the memory speed to no more than 1066 MT/s, which can be a 15-30% performance hit.  So a dual-socket system with a maximum of three DIMMs per channel could probably be configured with 2x4x2x16=256GB of full speed 1600 MT/s memory, or 2x4x3x16=384GB of lower speed 1066 MT/s memory.
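The capacity-versus-speed tradeoff can be sketched the same way.  This assumes the behavior described above, where the controller drops from 1600 MT/s to 1066 MT/s past two DIMMs per channel; that's a rule of thumb from my shopping experience, not a universal spec:

```python
# Capacity vs. memory speed for the hypothetical dual-socket,
# four-channel, 16GB-per-DIMM server discussed above.
SOCKETS, CHANNELS, GB_PER_DIMM = 2, 4, 16

def config(dimms_per_channel):
    """Return (capacity in GB, memory speed in MT/s) for a given DPC."""
    capacity = SOCKETS * CHANNELS * dimms_per_channel * GB_PER_DIMM
    # Assumed controller behavior: full speed up to 2 DPC, throttled at 3.
    speed = 1600 if dimms_per_channel <= 2 else 1066
    return capacity, speed

for dpc in (1, 2, 3):
    gb, mts = config(dpc)
    print(f"{dpc} DPC: {gb}GB @ {mts} MT/s")
# -> 1 DPC: 128GB @ 1600 MT/s
#    2 DPC: 256GB @ 1600 MT/s
#    3 DPC: 384GB @ 1066 MT/s
```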

So what to do?  It depends on whether a typical analysis job fits into the high speed memory size before DMP'ing, and on the analyst's budget.  If the job is bigger than the high speed memory size even before DMP'ing, then it's worth taking the speed hit on the memory to avoid taking the far larger speed hit of spilling to out-of-core scratch on disk.  If the job DMPs well, and budget is no issue, then each machine should be configured to the maximum size that runs at full speed, and more machines should be purchased.  For a crazy person on an in-between budget, just run the machine with 1/3rd of the memory slots unpopulated, ready to fill if your models get a little too detailed.
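The decision logic above can be boiled down to a small helper.  This is a hypothetical sketch using the 256GB full-speed / 384GB throttled numbers from the example system; the thresholds and wording are mine:

```python
# Hypothetical sizing helper following the reasoning above: prefer
# full-speed memory unless the job won't fit in it even before DMP'ing.
def recommend(job_gb, full_speed_gb=256, max_gb=384):
    """Suggest a memory population for a pre-DMP job footprint in GB."""
    if job_gb <= full_speed_gb:
        return "populate 2 DPC: full-speed memory covers the job"
    if job_gb <= max_gb:
        return "populate 3 DPC: slower memory beats out-of-core scratch"
    return "job exceeds installable memory: DMP across more machines or accept disk scratch"

print(recommend(200))  # fits at full speed
print(recommend(300))  # needs the extra, slower capacity
```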