Wednesday, October 10, 2012

The Need for Speed - Part 4 of 7: Performance tuning parameters: Nastran

Previously I've covered how to find the optimal memory and CPU settings for Abaqus and Optistruct.  While the procedure for those FEA solvers may have seemed complicated, MSC Nastran (and, for the most part, its spinoff NX Nastran) takes this complexity to a whole new level.  The fundamental ideas are the same:
  • Get off the hard drive
  • DMP until you run out of RAM
  • SMP until you run out of processors
While DMP and SMP are no more difficult than in Abaqus or Optistruct, getting off the hard drive is much harder because, in my experience, Nastran does not do a good job of telling you what it needs.  Whereas Abaqus and Optistruct will tell you exactly what they need, so the memory settings are easy to set, Nastran only provides partial guidance, particularly when memory settings exceed 32-bit limits.  Further, Nastran does not dynamically allocate memory, so unlike Abaqus, if you overcommit memory, Nastran will still make it unavailable to the rest of the machine even when Nastran is not using it.  Not only that, Nastran typically reports memory values in 'words.'  A word is a chunk of memory large enough to contain a single number and is either 4 bytes or 8 bytes.  This makes reading diagnostic files more tedious, as you have to multiply a word count by the bytes/word to get a byte count.
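As a quick worked example with a made-up round number: a value reported as 250,000,000 words is 250,000,000 words x 4 bytes/word = 1 GB if words are 4 bytes, or 250,000,000 words x 8 bytes/word = 2 GB if words are 8 bytes.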

Limits of Nastran's memory estimate procedures


If you follow the official recommendations, there are in fact two ways to estimate the memory needed to stay off the hard drive.  The first method is to use the 'estimate' program, which will generate output as seen below:
Reading input file "./.bdf"
...
 Submit with command line arguments:
   memory=532.9mb


 Estimated Resource Requirements on a 32-bit system:
  Memory:                   532.9 MB
  Disk:                    9125.4 MB
  DBALL:                   6099.1 MB
  SCRATCH:                 3049.6 MB
  SCR300:                   845.3 MB
  SMEM:                      25.0 MB


Well, this looks very helpful, except it's very, very wrong.  This is the output for a model that I know required memory=33GB to run at full speed.  Estimate didn't even try to recommend an aggressive smemory setting to reduce the amount of data written to the SCRATCH and SCR300 files.

The other recommended way to find the optimal memory settings is to run the model until the .f04 file displays UIM (User Information Message) 4157.

 *** USER INFORMATION MESSAGE 4157 (DFMSYM)
     PARAMETERS FOR PARALLEL SPARSE DECOMPOSITION OF DATA BLOCK SCRATCH ( TYPE=CSP ) FOLLOW
                      MATRIX SIZE =   1163947 ROWS             NUMBER OF NONZEROES =  29408809 TERMS
           NUMBER OF ZERO COLUMNS =         0        NUMBER OF ZERO DIAGONAL TERMS =         0
                     SYSTEM (107) =     32770                      REQUESTED PROC. =         2 CPUS
           ELIMINATION TREE DEPTH =      8372
                CPU TIME ESTIMATE =       915 SEC                I/O TIME ESTIMATE =         3 SEC
       MINIMUM MEMORY REQUIREMENT =     36169 K WORDS             MEMORY AVAILABLE =   2807326 K WORDS
     MEMORY REQR'D TO AVOID SPILL =     36432 K WORDS         MEMORY USED BY BEND  =     36170 K WORDS
     EST. INTEGER WORDS IN FACTOR =    143016 K WORDS           EST. NONZERO TERMS =    299087 K TERMS
     ESTIMATED MAXIMUM FRONT SIZE =      2429 TERMS                 RANK OF UPDATE =        64


If we examine the .log file for this .f04, we see that mode=i8 was chosen, meaning each word is 8 bytes.  Converting the Memory Reqr'd to Avoid Spill value, which is the estimate of the in-core requirement, we find that UIM 4157 estimates 36,432 K words x 8 bytes/word = roughly 291 MB will be required, out of the 2,807,326 K words x 8 bytes/word = roughly 22.46 GB made available to hicore by the difference between the mem and smem settings.  So what did this model actually use?  If we read the Total Memory and Disk Usage Statistics section in the .f04 file, we see the following useful output data:

*** TOTAL MEMORY AND DISK USAGE STATISTICS ***

 +---------- SPARSE SOLUTION MODULES -----------+
    HIWATER               SUB_DMAP        DMAP  
    (WORDS)   DAY_TIME      NAME         MODULE 
 2185800160   16:44:56    FREQRS    655  FRRD1  

...


+------------------------------ DBSET FILES -----------------------------+
 FILE      ALLOCATED  ALLOCATED    HIWATER       HIWATER  I/O TRANSFERRED
            (BLOCKS)       (GB)   (BLOCKS)          (GB)             (GB)

 MASTER         5000       2.44        155         0.076            0.093
 DBALL       2000000     976.56          3         0.001            0.004
 OBJSCR         5000       0.31        387         0.024            0.032
(MEMFILE       32768      16.00      31291        15.279            0.000)
 SCRATCH     2000000     976.56          1         0.000            0.000
 SCR300      2000000     976.56          1         0.000            0.000
                                                           ==============
                                                    TOTAL:          0.130

This model actually reached a hiwater of 17.48 GB, which means that UIM 4157 was off by a factor of roughly 60.  Therefore I usually find little value in UIM 4157.
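Spelling that comparison out with the i8 word size (all numbers taken from the output above):

  UIM 4157 estimate:  36,432 K words x 8 bytes/word = roughly 0.29 GB
  Actual solver HIWATER:  2,185,800,160 words x 8 bytes/word = roughly 17.5 GB
  17.5 GB / 0.29 GB = roughly 60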

A practical procedure for finding optimal memory settings in Nastran

I therefore propose a two-step process for finding what Nastran actually needs:
  • Give Nastran as much memory as your system has
  • Find out what Nastran actually used

How much memory you can set with the memory keyword depends on the machine and operating system.  If you have a 32-bit machine or operating system, such as standard Windows XP, the maximum is 2 GB.  If you are on a 64-bit server, you can set the mode=i8 flag and use as much memory as is installed in the machine.  The following assumes you know how much memory is physically available in the system: the installed memory minus the operating system overhead and whatever is being used by other users.

The memory settings in Nastran are:
  • memory=XGB, where X is the memory in gigabytes.  This is the total memory allocated.
  • smemory=XGB, where X is the scratch memory in gigabytes.  Smemory is short for Scratch Memory.  Remember that the memory available to hicore, the main solver, is mem - smem.  Therefore smem must always be smaller than mem, and increasing smem without increasing mem will reduce the amount of memory available to the main solver.
  • mode=i4 (default) or i8.  This sets 4 bytes/word or 8 bytes/word.  Set it to i8 on any 64-bit machine with more than 16GB of memory so that Nastran can take full advantage of the installed memory.  Note that if a model turns out to require less than 16GB of memory with mode=i8, you can take that memory requirement, cut it in half, and rerun with mode=i4.
So to begin, set memory as high as the machine allows.  Then set smemory to about half of memory, as a Nastran analysis typically needs a large amount of scratch data.  Try the following settings, depending on the machine; an example submit line follows the list.  The smallest machine I'm considering has at least 2GB of physical memory available.
  • 32-bit machine: memory=2GB smemory=1GB
  • 64-bit machine with up to 8GB of memory: memory=6GB smemory=3GB mode=i4   (try not to use more than ~80% of system memory)
  • 64-bit machine with 8GB to 16GB of memory: memory=8GB smemory=4GB mode=i4
  • 64-bit machine with more than 16GB of memory: memory=(0.8 x total installed memory) smemory=(memory/2) mode=i8
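For example, on a hypothetical 64-bit machine with 64GB of installed memory (the job name and exact submit command are placeholders; the keywords are the ones described above), the initial "give it everything" run might be submitted as:

 nastran model.bdf memory=51gb smemory=25gb mode=i8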
Run your job until it completes, then check the Total Memory and Disk Usage Statistics in the .f04 file.  The MEMFILE line will tell you how much smemory you actually need.  If the HIWATER on SCRATCH and SCR300 is zero, then congratulations: your machine had enough memory to run this job without the hard drive slowing it down.

 +------------------------------ DBSET FILES -----------------------------+
 FILE      ALLOCATED  ALLOCATED    HIWATER       HIWATER  I/O TRANSFERRED
            (BLOCKS)       (GB)   (BLOCKS)          (GB)             (GB)

 MASTER         5000       2.44        155         0.076            0.093
 DBALL       2000000     976.56          3         0.001            0.004
 OBJSCR         5000       0.31        387         0.024            0.032
(MEMFILE       32768      16.00      31291        15.279            0.000)
 SCRATCH     2000000     976.56          1         0.000            0.000
 SCR300      2000000     976.56          1         0.000            0.000
                                                           ==============
                                                    TOTAL:          0.130

For instance, this job required 15.28 GB of the 16GB allocated to smemory.

Then check the HIWATER under Sparse Solution Modules.  This will tell you how much hicore (memory - smemory) you need.
*** TOTAL MEMORY AND DISK USAGE STATISTICS ***

 +---------- SPARSE SOLUTION MODULES -----------+
    HIWATER               SUB_DMAP        DMAP  
    (WORDS)   DAY_TIME      NAME         MODULE 
 2185800160   16:44:56    FREQRS    655  FRRD1


This particular job was run with mode=i8, and needed 2,185,800,160 words x 8 bytes/word = 17.48GB of hicore.  Therefore this job, rerun, would use memory = 17.48GB + 15.28GB = 32.76GB and smemory = 15.28GB, with mode=i8.  Note that if the HIWATER comes in at or near the available hicore (memory - smemory), the job was likely memory-limited and spilling to disk, and memory - smemory should be increased if it can be.
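Rounding those measured values up slightly, a sketch of the rerun submit line (the job name is a placeholder) would look like:

 nastran model.bdf memory=33gb smemory=15.5gb mode=i8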

Setting SMP and DMP
Nastran has a long history of SMP-type parallelization.  Use either the 'SMP' or 'parallel' keyword to enable parallel execution; this works on nearly any modern installation and uses no additional RAM.
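As a sketch, an SMP run on four cores using the keywords above (the job name is a placeholder) would look like:

 nastran model.bdf memory=33gb smemory=15.5gb mode=i8 parallel=4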

DMP can also be used in solutions 101, 103, 108, 110, 111, and 112.  Check the DOMAINSOLVER executive control statement for more details on how to divide up the problem.  When executing, use the 'DMP' keyword to set the number of domains.  If there are other machines on the network that have the same version of Nastran installed and can accept the rsh command, they can also be used to run Nastran in DMP; use the hosts=node1:node2:node3:node4 keyword.  More details on how to configure this can be found in the MSC Nastran Installation and Operations Guide, 'Running Distributed Memory Parallel (DMP) Jobs'.  Similar to Optistruct, each DMP process requires its own memory allocation, so the total memory required is memory x DMP.
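As a rough sketch (the hostnames and job name are placeholders; the keywords are those described above), a four-domain DMP run spread across four machines might be submitted as:

 nastran model.bdf memory=8gb smemory=4gb mode=i4 dmp=4 hosts=node1:node2:node3:node4

Remember that each of the four processes will claim its own memory=8gb allocation on whichever host it runs on.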

GPU acceleration is a very new development for Nastran and, unfortunately, is only supported in SOL 101.

1 comment:

  1. Another great post. Thanks!

    I tried to come up with a good equation to make this more digestible to us engineers:

    X=(MEMORY REQR'D TO AVOID SPILL - SPARSE DECOMP MEMORY USED) + TOTAL MSC NASTRAN MEMORY LIMIT

    http://mscnastrannovice.blogspot.com/2012/11/2-easy-steps-to-run-nastran-jobs-faster.html

    http://mscnastrannovice.blogspot.com/2012/11/2-more-easy-steps-to-run-nastran-jobs.html

    Also, there is a report utility that includes performance improvement suggestions: http://www.youtube.com/watch?v=PE-vrZZXDEA
