- Get off the hard drive
- DMP until you run out of RAM
- SMP until you run out of processors
Limits of Nastran estimate Procedures
If you follow the official recommendations, there are in fact two ways to estimate the memory needed to not use the hard drive. The first method is to use the 'estimate' program. It will generate output as seen below:
Reading input file "./.bdf"
...
Submit with command line arguments:
memory=532.9mb
Estimated Resource Requirements on a 32-bit system:
Memory: 532.9 MB
Disk: 9125.4 MB
DBALL: 6099.1 MB
SCRATCH: 3049.6 MB
SCR300: 845.3 MB
SMEM: 25.0 MB
Well, this looks very helpful, except that it's very, very wrong. This is the output for a model that I know needed memory=33GB to run at full speed. Estimate didn't even try to recommend an aggressive smemory setting to reduce the amount of data written to the SCRATCH and SCR300 files.
The other recommended way to find the optimal memory settings is to run the model until the .f04 file displays UIM (User Information Message) 4157.
*** USER INFORMATION MESSAGE 4157 (DFMSYM)
PARAMETERS FOR PARALLEL SPARSE DECOMPOSITION OF DATA BLOCK SCRATCH ( TYPE=CSP ) FOLLOW
MATRIX SIZE = 1163947 ROWS NUMBER OF NONZEROES = 29408809 TERMS
NUMBER OF ZERO COLUMNS = 0 NUMBER OF ZERO DIAGONAL TERMS = 0
SYSTEM (107) = 32770 REQUESTED PROC. = 2 CPUS
ELIMINATION TREE DEPTH = 8372
CPU TIME ESTIMATE = 915 SEC I/O TIME ESTIMATE = 3 SEC
MINIMUM MEMORY REQUIREMENT = 36169 K WORDS MEMORY AVAILABLE = 2807326 K WORDS
MEMORY REQR'D TO AVOID SPILL = 36432 K WORDS MEMORY USED BY BEND = 36170 K WORDS
EST. INTEGER WORDS IN FACTOR = 143016 K WORDS EST. NONZERO TERMS = 299087 K TERMS
ESTIMATED MAXIMUM FRONT SIZE = 2429 TERMS RANK OF UPDATE = 64
If we examine the .log file for this .f04, we see that mode=i8 was chosen, meaning each word is 8 bytes. Multiplying the MEMORY REQR'D TO AVOID SPILL value (the estimate for in-core memory) by the word size, we find that UIM 4157 estimates that 36.432 megawords x 8 bytes/word = 291 megabytes will be required. That is out of the 2.807 gigawords x 8 bytes/word = 22.46 gigabytes made available to hicore by the difference between the mem and smem settings. So what did this model actually use? If we read the Total Memory and Disk Usage Statistics section in the .f04 file, we see the following useful output data:
*** TOTAL MEMORY AND DISK USAGE STATISTICS ***
+---------- SPARSE SOLUTION MODULES -----------+
HIWATER SUB_DMAP DMAP
(WORDS) DAY_TIME NAME MODULE
2185800160 16:44:56 FREQRS 655 FRRD1
...
+------------------------------ DBSET FILES -----------------------------+
FILE ALLOCATED ALLOCATED HIWATER HIWATER I/O TRANSFERRED
(BLOCKS) (GB) (BLOCKS) (GB) (GB)
MASTER 5000 2.44 155 0.076 0.093
DBALL 2000000 976.56 3 0.001 0.004
OBJSCR 5000 0.31 387 0.024 0.032
(MEMFILE 32768 16.00 31291 15.279 0.000)
SCRATCH 2000000 976.56 1 0.000 0.000
SCR300 2000000 976.56 1 0.000 0.000
==============
TOTAL: 0.130
This model actually reached a hiwater of 17.48 GB, which means that UIM 4157 was off by a factor of roughly 60. Therefore I usually find little value in UIM 4157.
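The arithmetic above can be checked with a short sketch. It assumes, as the calculation in this post does, that K WORDS means 1000 words and that 1 GB is 1e9 bytes; the function names are made up for illustration and are not part of any Nastran tool.

```python
# Word size in bytes for each value of the mode keyword.
BYTES_PER_WORD = {"i4": 4, "i8": 8}


def words_to_gb(words: float, mode: str = "i8") -> float:
    """Convert a word count to gigabytes (assumes 1 GB = 1e9 bytes)."""
    return words * BYTES_PER_WORD[mode] / 1e9


def kwords_to_gb(kwords: float, mode: str = "i8") -> float:
    """Convert a K WORDS figure from UIM 4157 (assumes K = 1000 words)."""
    return words_to_gb(kwords * 1e3, mode)


# Figures copied from the UIM 4157 message and the hiwater table above.
estimate_gb = kwords_to_gb(36432)      # MEMORY REQR'D TO AVOID SPILL
available_gb = kwords_to_gb(2807326)   # MEMORY AVAILABLE
hiwater_gb = words_to_gb(2185800160)   # sparse solution module HIWATER

ratio = hiwater_gb / estimate_gb
print(f"UIM 4157 estimate: {estimate_gb:.3f} GB")
print(f"hicore available:  {available_gb:.2f} GB")
print(f"actual hiwater:    {hiwater_gb:.2f} GB")
print(f"hiwater / estimate: {ratio:.0f}x")
```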
A practical procedure for finding optimal memory settings in Nastran
I therefore propose a two-step process for finding what Nastran actually needs:
- Give Nastran as much memory as your system has
- Find out what Nastran actually used
How much memory you can set with the memory keyword depends on the machine and operating system. If you have a 32-bit machine or operating system, such as standard Windows XP, then the maximum will be 2 GB. If you are on a server, you can set the mode=i8 flag and use as much memory as is installed in the machine. The following assumes that you know how much memory is physically available on the system: the installed memory minus the operating system overhead and whatever other users are consuming.
The memory settings in Nastran are:
- memory=XGB, where X is the memory in gigabytes. This is the total memory allocated to the job.
- smemory=XGB, where X is the smemory in gigabytes. Smemory is short for Scratch Memory. Remember that the memory available to hicore, or the main solver, is mem-smem. Therefore smem must always be smaller than mem, and increasing smem without increasing mem will reduce the amount of memory available to the main solver.
- mode=i4 (default) or i8. This sets 4 bytes/word or 8 bytes/word. Set it to 8 bytes/word on any 64-bit machine with more than 16GB of memory so that Nastran can take full advantage of the memory in the machine. Note that if a model turns out to require memory<16GB when mode was set to i8, you can take that memory requirement, halve it, and set mode=i4.
Suggested starting settings, by machine type:
- 32-bit machines: memory=2GB smemory=1GB
- 64-bit machine with 8GB of memory: memory=6GB smemory=3GB mode=i4 (try not to use more than ~80% of system memory)
- 64-bit machine with 8GB to 16GB of memory: memory=8GB smemory=4GB mode=i4
- 64-bit machine with >16GB of memory: memory=(0.8 x total installed memory) smemory=(memory/2) mode=i8
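The starting values in the list above can be written down as a small lookup, sketched here in Python. The function name and return shape are invented for illustration. Two things are assumptions of mine rather than recommendations from the list: 64-bit machines with less than 8GB raise an error because the list does not cover them, and 32-bit machines are reported as mode i4 because that is the default.

```python
def starting_settings(installed_gb: float, is_64bit: bool = True):
    """Return (memory_gb, smemory_gb, mode) as a first guess for a run."""
    if not is_64bit:
        # 32-bit machines are capped at 2GB; i4 is the default mode.
        return 2.0, 1.0, "i4"
    if installed_gb > 16:
        # Use about 80% of installed memory, half of it as smemory.
        memory_gb = 0.8 * installed_gb
        return memory_gb, memory_gb / 2, "i8"
    if installed_gb > 8:
        return 8.0, 4.0, "i4"
    if installed_gb == 8:
        return 6.0, 3.0, "i4"
    # Assumption: the list gives no guidance below 8GB on 64-bit machines.
    raise ValueError("no suggested settings for 64-bit machines under 8GB")


def as_keywords(memory_gb: float, smemory_gb: float, mode: str) -> str:
    """Format the settings as keyword=value pairs for the command line."""
    return f"memory={memory_gb:g}GB smemory={smemory_gb:g}GB mode={mode}"


print(as_keywords(*starting_settings(64)))
print(as_keywords(*starting_settings(12)))
```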
+------------------------------ DBSET FILES -----------------------------+
FILE ALLOCATED ALLOCATED HIWATER HIWATER I/O TRANSFERRED
(BLOCKS) (GB) (BLOCKS) (GB) (GB)
MASTER 5000 2.44 155 0.076 0.093
DBALL 2000000 976.56 3 0.001 0.004
OBJSCR 5000 0.31 387 0.024 0.032
(MEMFILE 32768 16.00 31291 15.279 0.000)
SCRATCH 2000000 976.56 1 0.000 0.000
SCR300 2000000 976.56 1 0.000 0.000
==============
TOTAL: 0.130
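After the run, the MEMFILE row of the DBSET FILES table in the .f04 shows how much of the smemory was actually used. For instance, this job required 15.28GB of the 16GB allocated to smemory.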
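If you check many jobs, the MEMFILE row can be pulled out with a script. The sketch below assumes the row looks like the one shown in this post (parenthesised, five numeric columns); the column spacing in a real .f04 may differ, and the function name is hypothetical.

```python
import re

# Matches a row such as "(MEMFILE 32768 16.00 31291 15.279 0.000)".
# Columns: allocated blocks, allocated GB, hiwater blocks, hiwater GB, I/O GB.
MEMFILE_ROW = re.compile(
    r"\(\s*MEMFILE\s+(\d+)\s+([\d.]+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s*\)"
)


def memfile_usage(f04_text: str):
    """Return (allocated_gb, hiwater_gb) for MEMFILE, or None if not found."""
    match = MEMFILE_ROW.search(f04_text)
    if match is None:
        return None
    return float(match.group(2)), float(match.group(4))


sample = "(MEMFILE 32768 16.00 31291 15.279 0.000)"
allocated_gb, used_gb = memfile_usage(sample)
print(f"smemory used: {used_gb} GB of {allocated_gb} GB allocated")
```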
Then check the hiwater. This will tell you how much hicore (memory-smemory) you need.
*** TOTAL MEMORY AND DISK USAGE STATISTICS ***
+---------- SPARSE SOLUTION MODULES -----------+
HIWATER SUB_DMAP DMAP
(WORDS) DAY_TIME NAME MODULE
2185800160 16:44:56 FREQRS 655 FRRD1
This particular job was run with mode=i8, and needs 2,185,800,160 words x 8 bytes/word=17.48GB of hicore. Therefore this job, rerun, would use memory=17.48GB + 15.28GB=32.76GB and smemory=15.28GB, with mode=i8. Note that if the hiwater comes out equal to (or very close to) the hicore that was available, the job was probably limited by the memory-smemory it was given, and that amount should be increased if it can.
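The rerun calculation can be sketched as a small function. It assumes 1 GB is 1e9 bytes and takes the MEMFILE hiwater in GB exactly as printed in the .f04; the function name is made up for illustration. The results agree with the figures above to within rounding.

```python
def rerun_settings(hiwater_words: int, memfile_hiwater_gb: float,
                   mode: str = "i8"):
    """Return (memory_gb, smemory_gb) sized to what the last run used."""
    bytes_per_word = {"i4": 4, "i8": 8}[mode]
    # hicore needed by the solver, from the sparse solution module hiwater.
    hicore_gb = hiwater_words * bytes_per_word / 1e9
    # smemory needed, from the MEMFILE hiwater in the DBSET FILES table.
    smemory_gb = memfile_hiwater_gb
    # memory is the total, because hicore = memory - smemory.
    return hicore_gb + smemory_gb, smemory_gb


memory_gb, smemory_gb = rerun_settings(2185800160, 15.279, mode="i8")
print(f"memory={memory_gb:.2f}GB smemory={smemory_gb:.2f}GB mode=i8")
```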
Setting SMP and DMP
Nastran has a long history of SMP-type parallelization. Use either the 'SMP' or 'parallel' keyword to set parallel execution. This will work for nearly any modern installation and requires no additional RAM.
DMP can also be used in solutions 101, 103, 108, 110, 111, and 112. Check the DOMAINSOLVER executive control statement for more details on how to divide up the problem. When executing, use the 'DMP' keyword to set the number of domains. If there are other machines on the network that have the same version of Nastran installed and can accept the rsh command, then they can also be used to run Nastran in DMP. Use the hosts=node1:node2:node3:node4 keyword. More details on how to configure this can be found in the MSC Nastran Installation and Operations Guide, 'Running Distributed Memory Parallel (DMP) Jobs'. As with Optistruct, each DMP domain requires its own memory allocation, so the total memory required is memory x DMP.
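As a rough illustration of "DMP until you run out of RAM", the memory x DMP rule can be turned into a quick check. This sketch assumes all DMP domains run on a single machine and that every domain gets the same memory setting; both function names are hypothetical.

```python
def dmp_total_memory_gb(memory_gb: float, dmp: int) -> float:
    """Total RAM needed when each of `dmp` domains gets `memory_gb`."""
    if dmp < 1:
        raise ValueError("dmp must be at least 1")
    return memory_gb * dmp


def max_domains(memory_gb: float, available_gb: float) -> int:
    """Largest DMP count whose total memory fits in `available_gb`."""
    return int(available_gb // memory_gb)


# The example job above needs about 32.76GB per domain.
print(dmp_total_memory_gb(32.76, 4))   # RAM needed for 4 domains
print(max_domains(32.76, 128))         # domains that fit in 128GB
```

Any processors left over once RAM limits the DMP count can then be given to SMP, which does not need additional memory.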
GPU support is a very new development for Nastran, and unfortunately is only available in SOL 101.
I tried to come up with a good equation to make this more digestible to us engineers:
X=(MEMORY REQR'D TO AVOID SPILL - SPARSE DECOMP MEMORY USED) + TOTAL MSC NASTRAN MEMORY LIMIT
http://mscnastrannovice.blogspot.com/2012/11/2-easy-steps-to-run-nastran-jobs-faster.html
http://mscnastrannovice.blogspot.com/2012/11/2-more-easy-steps-to-run-nastran-jobs.html
Also, there is a report utility that includes performance improvement suggestions: http://www.youtube.com/watch?v=PE-vrZZXDEA