Monday, December 3, 2012

More on DMP vs SMP

To paraphrase one of my professors at UC Davis, once you understand a problem well enough to formulate an elegant question about what it is you don't understand, you're halfway to figuring it out.  It was with this in mind that one of my readers contacted me with some questions about the differences between SMP and DMP.  Paraphrasing and obfuscating a bit, our conversation went something like this:

Reader

Could you explain a bit more about the difference between DMP and SMP?  I understand your card sorting analogy, but what does that mean for hardware?  Multi-cores?  Dual CPUS?  Both?

AppliedFEA

The answer depends a bit on your solver and solution type. The executive summary is that SMP makes most things much faster, but DMP makes some things much, much faster, so long as you have enough memory to spare. What solver (Abaqus/Nastran/Optistruct) and solution type (statics/frequency/transient, etc.) are you targeting? Depending on the answers, DMP may not be available at all, such as if you're running Nastran SOL 106.

Reader

We run a lot of static stress analysis and modes.  We use implicit solvers such as ABAQUS/Standard, Optistruct, or MSC/MARC.  We run some ABAQUS/Explicit.

AppliedFEA


To break outside the card analogy, an FEA solution requires the solution of a sparse stiffness matrix system. Depending on the solution type, there are several levels at which the solution can be broken up and solved on more than one processor at a time, provided the machine solving the problem has more than one CPU core. At the lowest level, SMP/parallel, only the vector operations are parallelized. Essentially every solution type in Nastran and Optistruct can make use of this low level parallelization.

You're in luck, however, as some of your solution types have the ability to be parallelized at a higher level.
For statics:
In Nastran, the DOMAINSOLVER executive control statement allows a statics (SOL 101) job to be split with the STAT method, which can assign each processor a certain number of grids or degrees of freedom. Abaqus does its DMP split similarly, albeit solely at the element level.
For modes (eigensolver):
In Nastran SOL 103 you can use the MODES DMP method, which can split the job up at the grid, DOF, or frequency level (i.e., each process finds modes in a portion of the total frequency range). Abaqus can, again, split things up at the element level.

If you're finding modes of models with large numbers of degrees of freedom, consider using an automated multi-level substructuring eigensolver instead of the more typical Lanczos. Both Optistruct and MSC Nastran can use the University of Texas-developed AMLS eigensolver, which can be purchased separately. All three vendors have also developed their own similar solvers: Optistruct calls it AMSES, Nastran calls it ACMS, and Abaqus AMS. Speed increases of 10x or more are not unknown.

So to tie back to my typical advice: first make sure you are off the hard drive, then use DMP (which splits the problem up at a high level, with each split requiring more memory) until you run out of memory or CPUs, and if there are any CPUs left after you run out of memory, use parallel/SMP to split the work up at a low level.
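As a rough sketch, that heuristic can be expressed in a few lines of Python. The function name and the assumption of a fixed memory cost per DMP domain are mine, purely for illustration; real solvers don't split memory this evenly.

```python
def pick_parallel_settings(n_cpus, ram_gb, gb_per_dmp_domain):
    """Sketch of 'DMP until you run out of RAM or CPUs, then SMP'.

    Assumes each DMP domain costs a roughly fixed amount of memory,
    which is a simplification of real solver behavior.
    """
    # DMP is limited by both available memory and available CPUs
    dmp = min(n_cpus, max(1, int(ram_gb // gb_per_dmp_domain)))
    # Any leftover CPUs become SMP threads within each DMP process
    smp = max(1, n_cpus // dmp)
    return dmp, smp

# 8 CPUs, 32 GB of RAM, ~12 GB per domain -> 2 DMP domains, 4 SMP threads each
print(pick_parallel_settings(8, 32, 12))
```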

Friday, November 30, 2012

Friday Night Music: Normal Modes of an acoustic enclosure

At about the 1:10 mark in the below video, a low frequency waveform begins which is very, very close to the first resonant frequency of the passenger compartment of my first generation Mazda 3 with the windows rolled down.



Needless to say, the effect was equal parts obnoxious and enjoyable.

One of the characteristics of sound reproduction is that any combination of loudspeakers and acoustic cavities, such as a pair of headphones or an amphitheater, will have a number of frequencies with much higher responses than others.  A sound engineer attempting to accurately reproduce recorded sound will try to counteract these effects, but the process is very time consuming and difficult to accomplish for either every seat in an auditorium or every pair of headphones that could be plugged into an mp3 player.  As a result, much of what a typical person hears in their lifetime is colored by the particular speakers and headphones they happen to be listening to.  It's not implausible to think that this has a significant effect on the music we gravitate to, and it could go a long way toward explaining why your friends might not enjoy a song you love on your headphones.  You may have just found a song with particularly nice resonances for the speakers in your car.

One of the unique joys of live music, particularly live vocal performances, is that singers and musicians will naturally tend to gravitate towards the resonant frequencies of the spaces they're performing in, thus adapting their songs to the place they happen to be in.  Even as a fan of EDM, I can see plenty of work ahead for the genre as it tries to use technology to create a more unique performance at each appearance.

Wednesday, October 31, 2012

The need for speed - Part 7 of 7: Optimizing model detail and solution type

This entry will be less specific than the others, as the techniques for reducing the amount of work required of an FEA solver are extraordinarily numerous.

There are roughly three ways to change a model such that it runs faster:
  • reduce the detail level of the model
  • make use of modeling abstractions
  • use aggressive techniques to arrive more quickly at nearly the correct solution

Reduce Detail Level

The typical guidance for how much detail a model needs is to run a convergence study, increasing detail until the result that an analyst cares about is no longer changing as more detail is introduced.  I generally agree, although a smart analyst should still build up a feel for about how much is necessary for certain features that are commonly modeled in their work.  A few other tips:
  • If an analyst is comfortable working with second order elements, they will typically achieve more accurate results with less run time than simply adding more elements
  • If only a portion of the model is of interest, then Abaqus Surface-Surface ties can allow for a faster transition to the less detailed global model

Many situations will never converge to a solution, and will continue to change their result no matter how fine the level of detail, because the elasticity solution is undefined at the point of interest.  These situations include:
  • A perfectly sharp corner
  • A point in space that can see three material models, with empty space counting as one material model.  These are commonly encountered in soldered joints, composite layups, and other situations where bonding unites two dissimilar materials.  For these situations correlation studies using physical test specimens are usually performed, and they will indicate the level of detail one should use in a corresponding FEA model
  • A point constraint or point load
Frequency based solutions should aim to have at least 4 elements per half wavelength for standing waves in the structure or in the air.  The Abaqus documentation provides good guidance on how to do this for acoustic FEA
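That rule of thumb translates directly into a maximum element size for a given frequency. A minimal sketch (the function name and defaults are mine, for illustration only):

```python
def max_element_size(freq_hz, wave_speed_m_s, elems_per_half_wave=4):
    """Largest element edge length satisfying the '4 elements per
    half wavelength' guidance from the text."""
    wavelength = wave_speed_m_s / freq_hz           # lambda = c / f
    return (wavelength / 2.0) / elems_per_half_wave

# Air (c ~ 343 m/s) at 1 kHz -> elements no larger than about 43 mm
print(round(max_element_size(1000.0, 343.0), 4))
```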

Modeling Abstractions

Superelements
If a portion of the model is fixed and will not vary throughout the design process, you can create a superelement (or DMIG, as they are sometimes referred to in Nastran) to find the structural response at the interface to that portion of the model.
   
Submodeling
If the portion of interest in an Abaqus model is much smaller than the global model which determines the loads that go into it, and design changes in the submodel will not affect the global model, a submodel can provide insight into the behavior of that small portion without having to recompute the global solution.

Modal Techniques
If the modal density of a model is not too high, modal transients or dynamics can save significant run time. This is particularly true with the advanced eigensolvers that are becoming more common in solvers such as Abaqus, Nastran, and Optistruct.  Note that modal density refers to how many modes lie within a frequency range of interest, relative to how many degrees of freedom there are in the model.

Aggressive Techniques

Some of these are a little wild and could lead to inaccurate results.  Be careful.
  • Use the unsymmetric solver in Abaqus when dealing with models which have strong nonlinear behavior.  Certain nonlinearities absolutely require it; others just converge faster with an unsymmetric solver, needing fewer iterations even though each iteration requires slightly more work
  • Use an iterative solver for large compact models, such as engine blocks
  • Reduce convergence criteria for nonlinear analyses.  If a model is nearly converged but not quite, you can just call it good enough and move on with these tricks
  • For models where a body is positioned only by contact with friction, try lightly constraining the body with a spring or constraint while converging the contact solution, then removing the constraint in the last step.  Be aware that this can change the final result, as contact is path dependent
  • Use variable mass scaling in Abaqus/Explicit to set a minimum stable time increment when a few small elements are controlling it.  The size of the scaling and the number of small elements may affect the accuracy of this technique

Wednesday, October 24, 2012

The need for speed: Part 6 of 7 - Reducing output requests

In order to find a solution to a structural FEA model, an FEA solver must find the displacements that result from the applied loads and displacements.  What an analyst requests from this solution, which is now held in memory, will affect how much extra effort the FEA solver must put into finding other solution variables and into saving all the requested output to disk.  The typical guidance is that excess output requests have a bigger impact on hard drive space than on solution times, but certain solution sequences can generate output so frequently that it severely slows the solution.  Explicit is the classic example: its time steps are typically so small (maybe a millionth of a second) that even for short duration events an analyst may only care about the solution result every thousandth step.

There are two types of output from an FEA solution: primary field variables and calculated variables.  Primary field variables are the actual solution, such as displacements for a structural simulation or temperatures in a heat transfer simulation.  As such, when they are requested they only require writing data to the disk and have a small impact on simulation speed.  The other type of variable, calculated variables, are found by using the solution variables and the properties of elements to back calculate other variables, such as stress and strain in a structural simulation or heat flux in a thermal simulation.  These require actual computation, and as such will have a much bigger impact on simulation speed.

Some practical tips for particular solvers:

Abaqus
Abaqus will only output the output variables that you request.  The most important thing in Abaqus is typically to reduce the frequency of output in nonlinear and explicit dynamic analyses.  Both of these solution types will find a converged result over and over again, and unless a model is new and being diagnosed these intermediate results typically do not need to be output very frequently.  Make use of the '*output, TIME INTERVAL=' or '*output, FREQUENCY=' keywords to limit the frequency of these outputs.
  
There are two typical types of output in Abaqus: Field and History.  Field is usually used less frequently, to output the complete solution, and History is more typically used for global measures of model condition, such as total kinetic or elastic energy.

While using "*output, variable=preselect" will provide the most typically used variables in most cases, being more specific about which variables, and even which elements or nodes you wish to find solutions at, will provide even more savings
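As an illustrative sketch of what that trimming might look like in an input deck (the element set name here is made up, and the variables and frequencies should be chosen per your own needs):

```
** Field output: full solution, but only every 1 ms and only where needed
*OUTPUT, FIELD, TIME INTERVAL=0.001
*ELEMENT OUTPUT, ELSET=REGION_OF_INTEREST
S, E
** History output: cheap global measures, written more often
*OUTPUT, HISTORY, FREQUENCY=10
*ENERGY OUTPUT
ALLKE, ALLSE
```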

Do not use *restart unless you are fairly certain you need it, as it will generate an ENORMOUS restart file.

Optistruct
Optistruct has an interesting behavior: if no output is requested, it will generate the default outputs for you, similar to Abaqus' preselect.  Beyond not requesting things you don't care about, the most important thing to do in Optistruct is to be careful how frequently a full result from an optimization is output.  This can be set on the card 'OUTPUT,(OUTPUT TYPE),(FREQUENCY)'.  The best choice is usually FL for the frequency, as this will generate the full results for only the initial and final designs.


Nastran
Similar to Optistruct, take care to request only the results you care about, using both selective output requests in the case control and sets to request results only for the regions you care about.

Thursday, October 18, 2012

The Need for Speed - Part 5 of 7: A faster computer: performance metrics

So we've covered how to make the most of the particular hardware we might have.  But what if we were to start fresh and try to decide on what hardware would make the most of our software licenses?  I'll try and cover three typical scenarios that seem to come up and how to make the best of them.

I work at a company with Millions and Billions of Dollars in Revenue and purchasing managers who are either very reasonable and accommodating, or maybe even drunk.  What sort of hardware should I purchase for my department?
So there's a few pieces of good news here.

  1. Your purchasing managers are in fact very reasonable.  FEA software licenses are very expensive, running into the high 5 figures per year for enough to service a small department.  Compute servers capable of fully exploiting these licenses, on the other hand, are typically in the low 5 figures every few years.   Spending the same amount on software and half as much on hardware could easily halve your analysis throughput for a very small dollar savings.
  2. There are fewer options at the high end, which makes decisions about underlying architectures easier.
So on to specifics.  The basic decisions are:

  • Hardware Vendor
  • Host Operating System
  • Processor architecture
  • GPUs
  • Memory type and quantity
  • Hard drive types and RAID type
  • Network Interconnects

Hardware Vendor: If you're the sort of company that doesn't build its own multi-acre server farms, just purchase from a company you've heard of before, one that offers long term support contracts.

Host Operating System: Some people will fight the urge to use Red Hat Enterprise Linux 64-bit.  Don't.  Use Red Hat like everyone else.  Spend your time configuring the load monitoring and dispatch software.

Processor Architecture: The most important things are single core floating-point performance, number of cores, and memory throughput.  AMD's Bulldozer architecture and Itanium are out, as both have lost sight of floating point performance as their main performance metric in recent versions.  Nehalem/Westmere are also out, as their memory bandwidth is somewhat lacking at the higher end of CPU count.  This leaves Sandy/Ivy Bridge.  Buy these.  By all means buy the newest and fastest, as many as will fit in one server board.  They'll even come with Intel's Advanced Vector Extensions, which is better supported by Nastran than GPUs.

GPUs: If you use Abaqus, maybe.  If you primarily use Explicit or the unsymmetric solver, then it will be of no benefit.  Otherwise, go ahead.

Memory Type and Quantity: RDIMMs, until no more can be installed; 300GB to 400GB per server.  Special high speed memory modules seem to be losing favor in the market.

Hard drive types and RAID type:  The host operating system drive doesn't even really need to be RAIDed for speed.  Long term storage should be offloaded to whatever your organization typically uses.  The speed and capacity of the high speed RAID array will matter more if the output is large, less if it is not.  Order something that seems large enough and that is standard for your vendor.  In contrast to years past, it just won't matter that much.  SSD or 15K RPM server drive, it's not as big a deal as it once was.

Network Interconnects: Consider Infiniband if you run a large amount of Abaqus/Explicit or very large, iteration-heavy nonlinear Abaqus/Standard models.  Otherwise Gbit Ethernet should suffice. Some Nastran DMP solutions won't see any difference at all, due to how the domains can be separated.

So spend away!  And laugh in two years at what an outdated piece of junk you can't believe is still around!

I work in a lab at a University and we just received a grant for THOUSANDS OF DOLLARS of server hardware. 

The advice is fairly similar to the above.  The most important thing to do is to fill up one server with memory before buying the next, and if you can only partially fill one, stop there.  Whereas tens of GB is oftentimes sufficient for CFD cluster nodes, hundreds of GB are more desirable for FEA work.  You may very well find yourself with a single server holding only half as much memory as the maximum installable, when you could have had half a dozen boxes on a shelf with less memory each.  Don't worry about it: when your models are big there is absolutely no substitute for a big enough chunk of RAM.  And when they're small, several processes on one machine will have much less trouble talking to each other than over a network.

So in short, follow the below list in order until you run out of money:
  • Buy a dual socket Sandy Bridge server motherboard, case, power supply, and at least one reasonably large hard drive.  Consider whether you could use a GPU.
  • Buy a fast 8-core processor for one of the sockets
  • Buy RDIMMS until you've filled half the memory
  • Maybe buy a GPU?
  • Buy a second 8-core processor
  • Buy more RDIMMS until the memory is full
  • Maybe buy another GPU? 
  • Maybe RAID the hard drive?
  • Buy an identical Server and a Gigabit switch.
  • Buy Two more servers
  • With four servers, start to consider Infiniband, depending on FEA software
That should fairly easily cover the range from a few thousand to 100K or so.

I work for a boss whose bonus is determined by his department's IT spending.  Our computers are so old we're getting calls from a computer museum in California.


You have two options:

Option 1: Coffee and Crashed Drives
Any magnetic hard drive more than two years old is on borrowed time.  This time can be significantly reduced if one were to, say, run jobs with far too little memory allotted to them, whilst kicking the case in an attempt to incite a head crash.  When the inevitable happens, tell your boss tales of how, if you'd only had enough memory, the hard drive wouldn't have worn out, plus your computer could run everyone else's jobs and not have to be replaced for a long time.  Your coworkers will now be beholden to you to run their jobs for them, and while they're running you can tell lies about how much it slows down your computer, which is why you're going to get more coffee.

Option 2: Take up a collection (of RAM)
There's a fair chance that some computers in the office are faster than others, and that the computer with the fastest processor has room for more RAM to be installed.  Take as much as can be spared from all the other computers, and load up The Chosen One.  If you're lucky your computer cases will be unlocked so you can actually get at the hardware, the Chosen One will have an x86-64 processor, which has been available since 2003, and its motherboard will support more than 4GB of RAM.  If you're very lucky you'll have a BIOS that will let you boot from a CD-ROM or USB drive, so you can install a free 64-bit Linux operating system alongside Windows XP 32-bit.  If you're very clever you'll then run the old Windows XP inside a virtual machine so a boss passing by will be none the wiser.

And if you get caught, it's always better to ask forgiveness than permission.

Friday, October 12, 2012

Friday Night Podcast - Sounds of the Artificial World

I occasionally listen to the 99% Invisible podcast.  It covers a number of interesting topics, mostly focused on civil engineering, but one of the earliest episodes focused on something different: the sounds that create the feel of computer programs.


Designing the interface of a computer program so that it creates an intuitive, physical connection with users is very difficult.  Unlike old stereo equipment, a computer program cannot give real tactile feedback when a dial has been turned all the way or when a switch has been turned on or off.  A display can do a very slick job of trying to appear real, or skeuomorphic, but it is still a picture under glass.  Most cell phones and video games can give some shaking to create crude feedback, which can tell you that something happened, but it can't tell you much about what that something specifically was.  The only remaining way for a computer to give intuitive feedback is the sounds that a program makes.  Think about the Windows Critical Stop sound or the early Mac equivalent, the "uh-oh" sound; if you have a strong emotional reaction to something your computer does, it's probably to these sorts of sounds.  So it's not surprising that smart interface designers will spend a significant amount of time honing the sound effects in their programs.

Something I thought was especially interesting, when listening to the podcast, was that the sort of sounds that users gravitated to most were the recorded sounds of something mechanical happening, such as a vise grip being released, rather than something synthesized.  Even when using a device that would only make a big mechanical sound if it was being destroyed, people still enjoyed the unsynthesized sounds of real things happening.

Of course, things do have a way of coming full circle.

Wednesday, October 10, 2012

The Need for Speed - Part 4 of 7: Performance tuning parameters: Nastran

Previously I've covered how to find the optimal memory and CPU settings for Abaqus and Optistruct.  While the procedure for those FEA solvers may have seemed complicated, MSC Nastran (and, for the most part, its spinoff NX Nastran) takes this complexity to a whole new level.  The fundamental ideas are the same:
  • Get off the hard drive
  • DMP until you run out of RAM
  • SMP until you run out of processors
While DMP and SMP are no more difficult than in Abaqus or Optistruct, getting off the hard drive is much harder because, in my experience, Nastran does not do a good job of telling you what it needs. Whereas Abaqus and Optistruct will tell you exactly what they need, so the memory settings are easy to set, Nastran will only provide partial guidance, particularly when memory settings exceed 32-bit limits. Further, Nastran does not dynamically allocate memory, so unlike Abaqus, if one overcommits memory Nastran will still make it unavailable to the rest of the computer even when Nastran is not using it. Not only that, Nastran will typically output memory values in terms of 'words.' A word is a chunk of memory large enough to contain a single number, and can be either 4 bytes or 8 bytes large. This makes reading diagnostic files more tedious, as one has to multiply a word value by the bytes/word to find the byte value.
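Since that multiplication comes up constantly when reading diagnostic files, a tiny helper keeps it straight (the function is mine, for illustration; Nastran itself reports only words):

```python
def words_to_gb(words, bytes_per_word=8):
    """Convert a Nastran word count to (decimal) gigabytes.

    mode=i8 means 8 bytes/word; mode=i4 means 4 bytes/word.
    """
    return words * bytes_per_word / 1e9

# A HIWATER of 2,185,800,160 words under mode=i8 is about 17.49 GB
print(round(words_to_gb(2_185_800_160), 2))
```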

Limits of Nastran Estimate Procedures


If you follow the official recommendations, there are in fact two ways to estimate the memory needed to not use the hard drive.  The first method is to use the 'estimate' program.  It will generate output as seen below:
Reading input file "./.bdf"
...
 Submit with command line arguments:
   memory=532.9mb


 Estimated Resource Requirements on a 32-bit system:
  Memory:                   532.9 MB
  Disk:                    9125.4 MB
  DBALL:                   6099.1 MB
  SCRATCH:                 3049.6 MB
  SCR300:                   845.3 MB
  SMEM:                      25.0 MB


Well, this looks very helpful, except it's very, very wrong.  This is the output for a model that I know needed memory=33GB to run at full speed.  Estimate didn't even try to recommend an aggressive smemory setting to reduce the amount of data written to the SCRATCH and SCR300 files.

The other recommended way to find the optimal memory settings is to run the model until the .f04 file displays UIM (User Information Message) 4157.

 *** USER INFORMATION MESSAGE 4157 (DFMSYM)
     PARAMETERS FOR PARALLEL SPARSE DECOMPOSITION OF DATA BLOCK SCRATCH ( TYPE=CSP ) FOLLOW
                      MATRIX SIZE =   1163947 ROWS             NUMBER OF NONZEROES =  29408809 TERMS
           NUMBER OF ZERO COLUMNS =         0        NUMBER OF ZERO DIAGONAL TERMS =         0
                     SYSTEM (107) =     32770                      REQUESTED PROC. =         2 CPUS
           ELIMINATION TREE DEPTH =      8372
                CPU TIME ESTIMATE =       915 SEC                I/O TIME ESTIMATE =         3 SEC
       MINIMUM MEMORY REQUIREMENT =     36169 K WORDS             MEMORY AVAILABLE =   2807326 K WORDS
     MEMORY REQR'D TO AVOID SPILL =     36432 K WORDS         MEMORY USED BY BEND  =     36170 K WORDS
     EST. INTEGER WORDS IN FACTOR =    143016 K WORDS           EST. NONZERO TERMS =    299087 K TERMS
     ESTIMATED MAXIMUM FRONT SIZE =      2429 TERMS                 RANK OF UPDATE =        64


If we examine the .log file for this .f04, we see that mode=i8 was chosen, meaning each word is 8 bytes.  Multiplying out the "MEMORY REQR'D TO AVOID SPILL" value, aka the estimate for in-core memory, we find that UIM 4157 estimates that 36.432 megawords x 8 bytes/word = 291 megabytes of memory will be required of the 2.807 gigawords x 8 bytes/word = 22.46 gigabytes made available to hicore by the difference between the mem and smem settings.  So what did this model actually use?  If we read the Total Memory and Disk Usage Statistics section of the .f04 file, we see the following useful output data:

*** TOTAL MEMORY AND DISK USAGE STATISTICS ***

 +---------- SPARSE SOLUTION MODULES -----------+
    HIWATER               SUB_DMAP        DMAP  
    (WORDS)   DAY_TIME      NAME         MODULE 
 2185800160   16:44:56    FREQRS    655  FRRD1  

...


+------------------------------ DBSET FILES -----------------------------+
 FILE      ALLOCATED  ALLOCATED    HIWATER       HIWATER  I/O TRANSFERRED
            (BLOCKS)       (GB)   (BLOCKS)          (GB)             (GB)

 MASTER         5000       2.44        155         0.076            0.093
 DBALL       2000000     976.56          3         0.001            0.004
 OBJSCR         5000       0.31        387         0.024            0.032
(MEMFILE       32768      16.00      31291        15.279            0.000)
 SCRATCH     2000000     976.56          1         0.000            0.000
 SCR300      2000000     976.56          1         0.000            0.000
                                                           ==============
                                                    TOTAL:          0.130

This model actually reached a HIWATER of 17.48 GB, which means that UIM 4157 was off by a factor of roughly 60.  I therefore usually find little value in UIM 4157.

A practical procedure for finding optimal memory settings in Nastran

I therefore propose a two-step process for finding what Nastran actually needs:
  • Give Nastran as much memory as your system has
  • Find out what Nastran actually used

How much memory you can set on the memory command depends on the machine and operating system.  If you have a 32-bit machine or operating system, such as standard Windows XP, the maximum will be 2 GB.  If you are on a 64-bit server you can set the mode=i8 flag and use as much memory as is installed in the machine.  The following assumes that a user knows how much memory is actually available in a system: the installed memory minus operating system overhead and whatever other users are consuming.

The memory settings in Nastran are:
  • memory=XGB, where X is the memory in Gigabytes.  The total memory allocated
  • smemory=XGB, where X is the smemory in Gigabytes.  Smemory is short for Scratch Memory.  Remember that the memory available to hicore, or the main solver, is mem-smem.  Therefore smem must always be smaller than mem, and increasing smem without increasing mem will reduce the amount of memory available to the main solver
  • mode=i4 (default) or i8.  This sets 4 bytes/word or 8 bytes/word.  Set it to 8 bytes/word on any 64-bit machine with more than 16GB of memory so that Nastran can take full advantage of the memory in the machine.  Note that if a model turns out to require less than 16GB when mode was set to i8, you can cut that memory requirement roughly in half by rerunning with mode=i4.
So to begin, set memory as high as it can go on the machine.  Then set smemory to about half of memory, as a large amount of scratch data is typically needed by a Nastran analysis.  Try the following settings, depending on the machine.  The smallest machine I'm considering has at least 2GB of physical memory available.
  •  32-bit machines :  memory=2GB smemory=1GB
  • 64-bit machine with up to 8GB of memory: memory=6GB smemory=3GB mode=i4   (Try not to use more than ~80% of system memory)
  • 64-bit machine with 8GB to 16GB of memory:  memory=8GB smemory=4GB mode=i4
  • 64-bit machine with >16GB of memory: memory=(.8xtotal installed memory) smemory=(memory/2) mode=i8
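As a concrete sketch, the first-pass command line for a 64-bit box with 64GB installed might look something like this (the input file name is just an example, and your site's wrapper script may use slightly different syntax):

```
nastran model.bdf memory=51gb smemory=25gb mode=i8
```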
Run your job until it completes, then check the total memory and disk usage statistics.  The MEMFILE line will tell you how much smemory you actually needed.  If the HIWATER on SCRATCH and SCR300 is zero then congratulations: your machine had enough memory to run this job without the hard drive slowing it down.

 +------------------------------ DBSET FILES -----------------------------+
 FILE      ALLOCATED  ALLOCATED    HIWATER       HIWATER  I/O TRANSFERRED
            (BLOCKS)       (GB)   (BLOCKS)          (GB)             (GB)

 MASTER         5000       2.44        155         0.076            0.093
 DBALL       2000000     976.56          3         0.001            0.004
 OBJSCR         5000       0.31        387         0.024            0.032
(MEMFILE       32768      16.00      31291        15.279            0.000)
 SCRATCH     2000000     976.56          1         0.000            0.000
 SCR300      2000000     976.56          1         0.000            0.000
                                                           ==============
                                                    TOTAL:          0.130

For instance, this job required 15.28GB of the 16GB allocated to smemory.

Then check the hiwater.  This will tell you how much hicore (memory-smemory) you need.
*** TOTAL MEMORY AND DISK USAGE STATISTICS ***

 +---------- SPARSE SOLUTION MODULES -----------+
    HIWATER               SUB_DMAP        DMAP  
    (WORDS)   DAY_TIME      NAME         MODULE 
 2185800160   16:44:56    FREQRS    655  FRRD1


This particular job was run with mode=i8, and needed 2,185,800,160 words x 8 bytes/word = 17.48GB of hicore.  Therefore this job, rerun, would use memory=17.48GB + 15.28GB = 32.76GB and smemory=15.28GB, with mode=i8.  Note that if the hiwater reaches the available hicore (memory-smemory), not enough memory was set, and the amount should be increased if it can be.
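Putting the two readings together, the rerun settings fall out of a little arithmetic (a sketch; the function name is mine, and rounding at the end rather than per-term gives 32.77GB instead of the 32.76GB above):

```python
def rerun_memory_settings(hiwater_words, memfile_hiwater_gb, bytes_per_word=8):
    """Two-step procedure: hicore from the sparse-solver HIWATER,
    smemory from the MEMFILE HIWATER, total memory = hicore + smemory."""
    hicore_gb = hiwater_words * bytes_per_word / 1e9
    memory_gb = hicore_gb + memfile_hiwater_gb
    return round(memory_gb, 2), round(memfile_hiwater_gb, 2)

# The worked example from the text: ~32.77GB total, 15.28GB smemory
print(rerun_memory_settings(2_185_800_160, 15.28))
```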

Setting SMP and DMP
Nastran has a long history of SMP-type parallelization.  Use either the 'smp' or 'parallel' keyword to set parallel execution; this will work for nearly any modern installation and will use no more RAM.

DMP can also be used in solutions 101, 103, 108, 110, 111, and 112.  Check the DOMAINSOLVER executive control statement for more details on how to divide up the problem.  When executing, use the 'dmp' keyword to set the number of domains.  If there are other machines on the network that have the same version of Nastran and can accept the rsh command, they can also be used to run Nastran in DMP; use the hosts=node1:node2:node3:node4 keyword.  More details on configuration can be found in the MSC Nastran Installation and Operations Guide, 'Running Distributed Memory Parallel (DMP) Jobs'.  Similar to Optistruct, each DMP process requires its own memory allocation, so the total memory required is memory x DMP.

GPU acceleration is a very new development for Nastran, and unfortunately it is only supported in SOL 101.

Friday, October 5, 2012

Friday Video Mash-Up: BeamNG

As I mentioned earlier, BeamNG does an amazing job of simulating physics in video games.  Here is a video of it in action.



BeamNG offers some caveats in their description of their video: "If things seem too bouncy, please remember we're using real life spring and damper rates, and it probably looks wrong because nobody ever does anything this extreme with a real car".  For comparison, here is a real crash test:



So what do we notice when we compare the two?  I see two things:
  • In the real crash, the metal quickly deforms into a complex shape, and doesn't bounce around much after that, similar to the process of smashing an aluminum can.
  • In the real crash, there is some bouncing around of the thin metal on the outside of the vehicle.  The bouncing of these 'body panels' is visible immediately after impact but quickly stops.  By comparison, BeamNG does indeed seem too bouncy.
Let's look at a higher fidelity 'real' computer model and see if it can provide us any insight into what's driving the limitations of BeamNG's physics.  Below is the Abaqus benchmark model e1, representative of a passenger car impacting a wall at 25 mph.  It has tens of thousands of points at which the vehicle crash is simulated.  The simulation covers half a second of impact, and the video below plays at 1/5 speed.


Same model, but with a cross section cut through it



Looking at this model, it appears that the rapid permanent set of the material happens in a much more realistic fashion.  This can easily be explained by the much higher level of detail allowed by a model that does not have to run in real time.  But the body panels are still bouncing around!  Why is this?  For insight, I first looked at the Abaqus input deck to see what was being used to try to stop the body panels from bouncing around.  Specifically, what I'm looking for is 'damping', the name for what keeps a bell from ringing forever.  Unsurprisingly, I found nothing: this model has no damping in it outside of the suspension.  Why would this model not include realistic damping to create a more realistic simulation?

The answer will be very familiar to FEA analysts who do significant amounts of work with explicit FEA solvers.  Damping of the kind typically found in structures is proportional to the deformation of the metal, and as can be seen in the Abaqus Analysis User's Manual ver 6.10, section 23.1.1, small amounts of this 'stiffness proportional' damping can reduce the stable time increment, i.e. how much time can pass in one simulation step without the FEA model exploding.  This reduces the speed of the simulation: a relatively minor amount of structural damping can slow it by a factor of ten or more.  So for an FEA analyst simulating a car crash or cell phone impact, not only will damping not have a very large effect over the very short time scales of an impact, it will significantly increase the time it takes to arrive at a solution.  This is why damping was left out of this model and why it behaves strangely.  Damping can be so hard to add to explicit FEA solvers that a typical approach is to simply filter the results after the fact.
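To put a rough number on it: the Abaqus manual gives the damped stable increment as dt <= (2/w_max)(sqrt(1 + xi^2) - xi), where xi is the damping ratio at the highest element frequency w_max.  The sketch below assumes round illustrative numbers (a 5% damping target at a 100 Hz mode and a 100 kHz highest element frequency), not anything from the benchmark model:

```python
import math

def stable_dt_factor(xi):
    """Fraction of the undamped stable time increment that remains when
    the damping ratio at the highest element frequency is xi, from
    dt <= (2/w_max) * (sqrt(1 + xi**2) - xi)."""
    return math.sqrt(1.0 + xi * xi) - xi

# Assumed example: 5% of critical damping targeted at a 100 Hz structural
# mode, applied as stiffness-proportional (beta) damping.
beta = 2 * 0.05 / (2 * math.pi * 100.0)

# The same beta evaluated at an assumed highest element frequency of
# 100 kHz gives an enormous damping ratio at that frequency...
xi_max = beta * (2 * math.pi * 1.0e5) / 2   # = 50.0

# ...which collapses the stable time increment by roughly 100x.
print(f"dt factor = {stable_dt_factor(xi_max):.4f}")   # dt factor = 0.0100
```

A 100x smaller stable increment means roughly 100x more increments, which is why even "small" stiffness-proportional damping is so expensive in explicit analyses.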

Which brings us back to BeamNG. In a recent interview, the developers spoke at length of the challenges faced by video game developers trying to do structural simulation in real time, stating that "Mass-spring systems have very bad stability, they tend to explode, and are very CPU intensive."  My guess is that BeamNG may have experimented with more realistic damping rates for more of the vehicle, but after noticing the massive changes in stability in their simulation removed most of it, leaving damping only in the suspension.

Of course, all this is nitpicking, and I wish them the best of luck in their ambitious endeavor.  If I may offer some unsolicited advice, I'd suggest that they be the ones to make the first game that uses their technology.  All the successful video game engines and rendering software that I can think of were noticed after the developers made an original product that captured the interest of the public.  Consider Pixar's RenderMan, the Doom engine, the Unreal engine, the Crytek Cryengine, and many others.  These may never have been used as widely had the developers of these engines not used their intimate knowledge of what their  technology can do to make something great.  Fortunately, it looks like BeamNG is already heading down this path, and I look forward to a new way to use my computer to simulate smashing things.

Wednesday, October 3, 2012

The need for speed: Part 3 of 7 - Performance Tuning Parameters: Abaqus and Optistruct Linear

Previously we've discussed the three main priorities when trying to use the optimal settings for an FEA solver.
  • Get off the hard drive
  • DMP until you run out of RAM
  • SMP until you run out of processors
We also discussed an analogy that we will continue to make use of.  Essentially, solving an FEA solution is analogous to sorting a deck of cards on a table.  In this analogy, SMP is equivalent to using the same amount of table space (table space being equivalent to memory) but more people to help sort (people being equivalent to processors).  DMP, on the other hand, involves breaking up the table space into separate parts, either on one table or on separate tables in different rooms.

So given what we now know, how do we accomplish the optimal use of computer resources using specific commercial FEA software?

A Note on system limitations
The following blog post assumes a user is already knowledgeable about the limits of the system they are running their FEA simulations on.  Techniques for checking available system resources vary considerably between Windows and Unix/Linux based environments, and identical hardware may have different limitations based on 32-bit vs. 64-bit operating system limits, among other things.  I plan to cover this topic in more detail in a later post.

ABAQUS
Of the three codes I typically cover, Abaqus is the easiest to properly configure for speed.  All Abaqus needs to know is how much memory is free on a computer, how many processors are free on a computer, and what machines it can use.  There is almost nothing else to set.

To begin, start by doing a datacheck.  Assuming you know how much memory and how many cpus are in your computer or server, and that you know how to execute using the command line, start with the command:
abq6101.bat memory="1 gb" datacheck job=jobfile.inp
Note that abq6101.bat is what I use on my local machine.  On Linux servers it will typically be /opt/abaqus/Commands/abq61..., depending on version.

Hopefully your job has been carefully prepared and the Abaqus pre-processor will finish without any errors.  On linux machines, or windows machines running cygwin, you can monitor the progress of the datacheck run with the tail command.  Ex:
tail -n 100 -f jobfile.dat
The important data will be held in a section that reads:

 PROCESS      FLOATING PT       MINIMUM MEMORY        MEMORY TO
              OPERATIONS           REQUIRED          MINIMIZE I/O
             PER ITERATION         (MBYTES)           (MBYTES)
 
     1         1.86E+012              691               5623


For this run, at least 691 MB must be available to the Abaqus solver, and no more than 5623 MB will be used by it.
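In other words, a sensible memory= setting is just the available RAM clamped between those two numbers.  Here is a toy helper expressing that logic (my own sketch, not anything Abaqus provides):

```python
def pick_abaqus_memory(min_required_mb, minimize_io_mb, free_ram_mb):
    """Clamp a memory= request between a datacheck's two numbers:
    below the minimum the job cannot run; above the minimize-I/O
    figure, extra memory buys nothing."""
    if free_ram_mb < min_required_mb:
        raise ValueError("not enough free RAM for this job")
    return min(free_ram_mb, minimize_io_mb)

print(pick_abaqus_memory(691, 5623, 8000))   # plenty of RAM -> 5623
print(pick_abaqus_memory(691, 5623, 2000))   # limited RAM   -> 2000
```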

As an aside, Abaqus handles DMP in a very different way than Nastran or Optistruct.  Whereas Nastran and Optistruct multiply their memory usage by a factor of memory requested x DMP, Abaqus will report the total memory needed by all the DMP processes.  This is because Abaqus has put significant development effort into a style of DMP that's not quite full DMP: one that is not akin to a full box of cards being sorted independently on different tables or in different rooms, but is more like several people at one table, with significant effort put into sorting out who gets what chunk of cards ahead of time.  In this way, with some effort by the first person to grab the deck of cards, many people can be kept busy on a smaller table, and a single deck of cards can even be spread across a great number of tables.  When there are several decks of cards to sort this becomes less efficient than the Nastran/Optistruct style, but Abaqus' approach is the best for attacking one deck.

Returning to the memory settings, because the memory total is the true memory total, one need only set it to whatever is the maximum actually available on a computer and Abaqus will handle the rest.  Abaqus is even smart enough to only use as much memory as it needs, as long as that is less than what you allocate it.

So what happens if we have a desktop computer with 8GB of RAM and a 64-bit operating system with virtual memory, and we try running the job with memory settings ranging from the bare minimum to the maximum?

So what happened?  Wasn't adding memory the most important thing?  Well here's the big surprise: a modern operating system helps a great deal when you do one of two suboptimal things:
  • Using less memory than you actually have
  • Using more memory than you actually have
If you ask for less than you have, the operating system will be smart enough to use your computer's virtual memory system, where hard drive space can substitute for RAM and, via file caching, RAM can substitute for the hard drive, to keep your FEA program off the hard drive as much as it can.  If you try to use more memory than your computer has, it will still use the virtual memory, only this time in the other direction.  As I've mentioned before, you're still better off using the FEA software's own out-of-core scratch memory management, but even if you do something wrong your operating system will usually be there to minimize the damage.

So let's say you now have enough memory; how do you use more CPUs?  Again, the syntax is simple:
abq6101.bat memory="6 gb" cpus=2 job=jobfile.inp
In this case we are requesting 6 GB of memory and 2 CPUs.  What are the performance benefits of additional CPUs?  Typically, the second CPU nearly doubles performance, 4 CPUs give a little more than triple speed, and so on as the returns diminish.  More detail can be found here.
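Those diminishing returns follow the familiar shape of Amdahl's law.  As a sketch, assume 90% of the solve parallelizes (an illustrative number, not a measured one):

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Amdahl's law: overall speedup when only part of the work can be
    spread across n CPUs."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

# With 90% of the work parallel, the pattern in the text falls out:
# a second CPU nearly doubles speed, four CPUs are a bit over triple.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.90, n), 2))   # -> 1.0, 1.82, 3.08, 4.71
```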

How about Gpus?  Although I don't have a system that can take advantage of it, in the latest abaqus 6.12 release it is selected with the syntax
abq6121 memory="6 gb" cpus=2 gpus=1 job=jobfile.inp
The above will use two CPUs, 6GB of memory, and 1 GPU (1 video card).  Note that there are significant limits on the solvers that GPUs can be used with: specifically, only the sparse symmetric solver in implicit solutions.  This rules out explicit and extremely nonlinear problems, such as those with coefficients of friction higher than 0.2.

Finally, the use of more than one host is enabled in the abaqus_v6.env file.  HP-MPI or some other MPI layer must be enabled, and Infiniband will yield measurably higher performance than 1GBit Ethernet.  When using extra hosts, check to make sure that remote commands (rsh, or ssh with key-based logins) are enabled between the hosts.  To enable additional hosts, include in the abaqus_v6.env file the following line:
mp_host_list=[['host1',4],['host2',4]]
The first entry is the name of the host on the network, the second the number of cpus that can be used on that host.  Any run that uses more than one cpu and has extra hosts available in the abaqus_v6.env file will take advantage of them. 

One final thing to remember is that many of the previous memory and cpu settings were originally set only in the abaqus_v6.env file.  It may be worthwhile to check what parameters remain in the file, whether from yourself or your administrator, as many of them have changed over the last few versions.

Optistruct
Optistruct can be optimized in a similar manner, by first running a datacheck and then using the correct memory parameters.
If one is in a hurry, one can simply run optistruct with the below settings
radioss -core in -cpu 2 jobfile.fem
If sufficient memory is available to run in-core with minimum disk use, it will use it, and in this example it will use two cpus in SMP style.  Note that if insufficient memory is available for in-core it will error out.

If we want to see what a job will need, then we can instead run
radioss -check jobfile.fem
Assuming your model is built well, the .out file will eventually read like this:

MEMORY ESTIMATION INFORMATION :
-------------------------------
 Solver Type is:  Sparse-Matrix Solver
                  Direct Method

 Current Memory (RAM)                                    :     116 MB
 Estimated Minimum Memory (RAM) for Minimum Core Solution:     154 MB
 Recommended Memory (RAM) for Minimum Core Solution      :     154 MB
 Estimated Minimum Memory (RAM) for Out of Core Solution :     189 MB
 Recommended Memory (RAM) for Out of Core Solution       :     211 MB
 Recommended Memory (RAM) for In-Core Solution           :    1515 MB
 Recommended Number of Nodes for OS SPMD Parallel Run    :       1
 (Note: Minimum Core Solution Process is Activated.)
 (Note: The Minimum Memory Requirement is limited by Assembly Module.)      
 (Note: Use param,HASHASSM,yes to avoid assembly module memory bottleneck)

DISK SPACE ESTIMATION INFORMATION :
-----------------------------------
 Estimated Disk Space for Output Data Files              :      18 MB
 Estimated Scratch Disk Space for In-Core Solution       :     208 MB
 Estimated Scratch Disk Space for Out of Core Solution   :    1895 MB
 Estimated Scratch Disk Space for Minimum Core Solution  :    1952 MB


In the above example, we would need a total of 1515 + 208 MB of RAM for a maximum reduction in hard drive use.  The syntax for this would be
radioss -len 1723 -ramdisk 208 -cpu 2 jobfile.fem
As noted before, additional CPUs requested with -cpu add some measure of speed but do not require more RAM.

If a job is of a type that can also DMP, then they can be requested using the syntax:
radioss -len 1723 -ramdisk 208 -cpu 2 -mpi -np 4 jobfile.fem
A few things I've found: HP-MPI sometimes has issues, so take care with the documentation to show it the path to the HP-MPI files if necessary.  Also note that the total memory demand is now 4 x 1723 = 6892 MB, and the total number of CPUs needed is 4 x 2 = 8.  Solution sequences that make good use of DMP tend to be direct frequency and Lanczos eigensolutions, although eigensolutions will nearly always be faster when using a modern multilevel eigensolver.
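The bookkeeping for an MPI run is worth writing down, since both memory and CPU counts multiply.  Using the figures from the run above:

```python
# Total resources for the Optistruct/Radioss MPI (DMP) run above:
# each of the -np processes gets its own -len allocation and its
# own set of -cpu SMP threads.
len_mb = 1723      # -len per MPI process
smp_cpus = 2       # -cpu per MPI process
mpi_np = 4         # -np MPI processes

total_memory_mb = len_mb * mpi_np    # memory multiplies per process
total_cpus = smp_cpus * mpi_np       # so do the SMP threads
print(total_memory_mb, total_cpus)   # 6892 8
```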

Closing Remarks
As we've seen, different solvers can be very different in how they understand the tuning parameters given to them by users.  Additionally, there is typically a significant amount to be learned before even solving a model by simply invoking a checkout run.  As we will see in the upcoming Nastran entry, checkout runs are unfortunately not as capable in MSC/MD Nastran.

Thursday, September 27, 2012

The need for speed: Part 2 of 7 - Performance tuning parameters: Common Topics

What are the most important things to keep in mind when changing the settings for an FEA solver?
  • Get off the hard drive
  • DMP until you run out of RAM
  • Parallel whatever cores you have left

An FEA solver must first create a system of linear equations which describe the FEA problem to be solved.  This is sometimes called assembling the stiffness matrix, although other matrices may be assembled, such as mass or damping.  These equations are stored in matrix form, and must then be operated on by a mathematical solver in order to arrive at the solution to the FEA problem.  In approximate order of increasing memory requirements, the typical solvers seen in structural FEA are:
  • Forward Integration time step: explicit transient dynamics, typically short duration transients, such as vehicle impact
  • Iterative: statics, linear and nonlinear, typically dense and blocky models, such as an engine block with mild contact nonlinearity
  • Gaussian Direct Elimination, non complex: statics, linear and nonlinear, steady state or implicit transient.  One of the most common solver types
  • Gaussian Direct Elimination, complex : frequency response, the steady state response of a structure when loaded by loads which all have the same forcing frequency, such as a speaker generating a single tone or vehicle NVH analyses across a frequency range
  • Eigenvalue solvers, modern methods (automated multilevel substructuring): natural frequency and buckling loads, also used for modal type solutions to direct frequency and transient responses
  • Eigenvalue solvers, older methods (Lanczos, etc.)
Already we can see something interesting.  Whereas one might think of how 'big' their model is in terms of the number of nodes or elements, the computer sees the model as big as the solution determines it will have to be.  A model that runs quickly with particular memory settings on a particular computer when solving a linear statics problem may run much, much slower when one wants to find an associated buckling load.

So what sort of memory should we use to store the system of equations?   Given the layout of a typical computer's memory hierarchy, the most practical, high speed place to store the system of equations so that they can be quickly solved is the random access memory or memory of the computer.

At this point those with a background in computers may be wondering: what if the system of equations is too large to fit in the free memory of the computer?  A typical computer program would have the operating system store whatever didn't fit in virtual memory, or hard drive space that behaves as memory.  For most programs, this is the best thing to do if there is insufficient memory.  FEA solvers, however, are different.  FEA solvers predate operating systems with virtual memory, and have developed their own specialized methods of making do with less than the ideal amount of memory.  If the memory allocated to an FEA solver is less than what the system of equations needs, it will revert from what is known as an in-core solver to an out-of-core solver, which loads a portion of the system of equations, solves it, puts it away in a scratch file, and loads a new chunk to work on.
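The idea can be sketched with a toy example: a 'solver' that never holds more than a fixed number of bytes of its data in memory at once, paging the rest through a scratch file the way an out-of-core solver pages the stiffness matrix.  This illustrates only the concept, not any real solver's algorithm:

```python
import os
import struct
import tempfile

def chunked_sum(path, chunk_bytes):
    """Toy out-of-core pass: sum a file of doubles while never holding
    more than chunk_bytes of it in memory at once."""
    total = 0.0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_bytes):
            total += sum(struct.unpack(f"{len(chunk) // 8}d", chunk))
    return total

# Write 1000 doubles to a scratch file, then sum them ten values
# (80 bytes) at a time -- same answer as an in-core sum, just slower.
values = [float(i) for i in range(1000)]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("1000d", *values))

print(chunked_sum(tmp.name, 80))   # 499500.0
os.remove(tmp.name)
```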

So now we have two interesting things to remember:
  1. The same level of detail may take different amounts of memory to solve depending on solution type
  2. The solver will need to know how much memory is actually available so it can intelligently decide how to store the system of equations, either in-core or out-of-core
In later posts, I will cover in more detail how to find how much memory is available in a system, find out how much memory a program estimates it will need to solve a problem, how much memory is required to quickly solve the problem, and what settings to use given the available memory and estimated memory requirements.

So now we've given a general answer to the first question, how to get off the hard drive: allocate enough memory.  Now what?

Now we DMP (distributed memory process).

Solving a large system of equations is analogous to sitting at a table and ordering a shuffled deck of cards.  If we have only one deck of cards to sort, only one desk, and only one person sorting, then there are no decisions to be made about how to allocate resources.  Now let's say you have a second person that can help you, and a completely separate deck of cards to sort.  If you have enough space on the desk, then they can spread out the second deck and begin sorting without slowing you down.  This is equivalent to a DMP run on a single computer: enough memory (desk space), enough people (CPUs), and a problem that can be divided up (separate decks of cards).  This is confusing for many at first, including myself, because a computer with more than one CPU that shares the same pool of memory is technically an SMP computer system.  So from a computer science perspective, a desk with a piece of tape down the middle is an SMP system doing DMP-style work, whereas two separate desks in two separate rooms is like true DMP.

So what sort of FEA solutions lend themselves well to DMP-style splitting of workload?  Many, in fact, although the speedup can be very solver and solution type dependent.  DMP can be used by Lanczos eigensolvers, and very easily by direct frequency analyses, where a solution must be found at each forcing frequency and each frequency's solution can be found independently.  It can also be used by ABAQUS for Gaussian elimination, iterative, and explicit solvers, although the overhead of 'passing cards' from room to room can begin to dominate the solution times.

So what if you have only one computer, and you don't have enough memory to split the solution up any further?  This is where SMP can be of use; it is in fact one of the oldest ways that more than one CPU has been used to accelerate FEA solution times, dating to Nastran running on CRAY supercomputers.  SMP is equivalent to having only enough room on the desk for the one deck of cards, but more than one person reaching in and working on ordering the cards.  As you can imagine, returns quickly diminish for this type of parallelization, but if the CPUs are available there's no reason not to take advantage of it.  Nearly any solver and nearly any solution type can use this style of parallelization, with the notable exception of Abaqus, which has instead focused on parallelizing in a manner that divvies up the cards intelligently ahead of time.

There is one final, more exotic, method of parallelization that is sometimes available.  With the growing general abilities of the vector processors inside the graphics cards of computers and video game systems, programmers have begun to use them as a new type of person working to sort the deck of cards.  Only in this instance, it's more like a trained bag of mice: mice which still need a person to bring the deck of cards to the table and spread them out, but which can handle much of the hard work from there.  This type of acceleration, oftentimes referred to as GPU acceleration, will only become more prevalent as the limits of using more and more transistors to make a single CPU faster become apparent.  As of now, though, support is very limited: only the most basic static solution in Nastran and the non-complex, symmetric Gaussian solver in Abaqus are supported, and only on post-Windows XP operating systems.  The latest Intel chips, Sandy Bridge and later, have similar vector functionality that is much better supported in Nastran than GPU acceleration.

We've now covered, in very general terms, how to get off the hard drive and make the most of our ram, and having done that make the most of our CPUs and even our graphics cards.  Next we will cover solver specific tuning parameters that take advantage of these concepts.


Tuesday, September 25, 2012

The need for speed: Part 1 of 7 - Intro

How can I make my FEA simulation run faster?
Ask less of your computer, use more of your computer, or use a faster computer(s).

Once you've meshed the geometry, applied the material properties, applied the loads, assembled the load cases, and the model runs sometime today, doing more work to make the model run faster is probably the last thing on your mind.  Tuning parameters and measures of the solution difficulty of a model can seem dauntingly complex, and the payoff from a model that runs faster may not seem worth it.  However, a working knowledge of how to manage the detail and run times of a model can have some major payoffs:
  • knowing ahead of time if a model has surpassed the ability of a computer to quickly solve
  • minimizing storage and archiving requirements
  • ability to run the same model several times to assess design sensitivities and optimize the performance of a design
This final advantage is oftentimes the most important reason to put effort into minimizing run times. The ability to analyze a design more than once in the time available can make the difference between a high performing design and an inadequate design.

In a series of posts, I'll outline the practical steps an analyst can take to speed up their run times. This is obviously an extremely complex topic to fully cover.  Fortunately, a little knowledge can go a long way, and I will focus on the topics that typically have the largest impact.  Upcoming posts will cover:

Use more of your computer:
  • Performance tuning parameters: Common Topics
  • Performance tuning parameters: Abaqus and Optistruct/Radioss
  • Performance tuning parameters: Nastran
  • A faster computer: performance metrics
Asking less of your computer
  • Reducing output requests
  • Optimizing model detail and solution type



Thursday, September 20, 2012

MPCs: a quick summary

Many years ago one of my coworkers built a model in Optistruct and was disappointed to see the RBE that he'd built failed to provide the restraint he expected.  He called up the technical support at Altair, and after some back and forth about the RBE element the support representative finally lost his patience: "It is not an element, it is a system of equations!"

So what are these equations, and what do they do?
First of all, they are indeed different from an element.  Although FEA software may refer to them as a type of element, they have a fundamental difference from a normal element. A normal element adds to the global stiffness, mass, and damping matrices. An MPC (multi point constraint) is more like an SPC (single point constraint), in that it modifies instead of adds to the matrices that describe the model in order to describe the special behavior of these MPCs.

So all that sounds interesting, but it hasn't really told you much about what they do. I've thought of different ways to describe them to my coworkers, and the best I've come up with is this analogy. Basically, you're including a group of grids in a type of government, and you're letting them know what kind of leader they have. So what's an RBE2? An RBE2 is a monarchy, where the control grid is king, and tells the other grids what to do. An RBE3, on the other hand, is more like a democracy, where the President does what the people tell it to do. So what works and what doesn't work when we try to make use of these MPCs?

What Works

The above works because the king can tell the dependent nodes what to do.  There are constraints on grids, but not on the same ones as the RBE2.

The above also works because the dependent grid knows what to do because the nodes that vote for it at least know what they want; if the elements beneath them were unconstrained then the whole set of elements would not know what to do.

This one works because the voters still know what they want: the constraints tell them what they want!  The RBE2 above the RBE3 also knows what to do because the president is telling it what to do.  OK, my analogy is getting old; I'll switch to talking about independent/dependent grids.

As you can see above, I lied, the deputies are out!  Through the use of UM grids the RBE3 is not the dependent element, and can now be told what to do.

The above also works, as the UM grids allow the top RBE3 to not have its dependent grid be the same as the bottom RBE3 dependent grid.

Finally, the above works because the top RBE2 is only connected to independent grids.
You can download the above examples at RBEThingsYouCanDo.bdf

What Doesn't Work
Well this won't work, the RBE2 is trying to tell the dependent grids what to do, and the constraints are doing the same thing.

The above also doesn't work.  There are nodes which are dependent nodes in two separate RBE2's.  

The above also has problems.  The UM grids have been set, making them no longer independent, but those grids also have constraints, so again we have grids where MPCs and constraints are both trying to tell  a grid what to do.

Finally, the above is also bad.  You can't make a grid the dependent grid of more than one RBE3.
You can download the examples of things that don't work at RBEThingsYouCantDo.BDF

There are fun tricks you can do once you're feeling comfortable with RBE3s.  The two most interesting are:
-Applying load without adding stiffness.  If you have a certain total load that needs to be applied across a cross section of a model, you can use an RBE3 to apply it.  This is particularly useful at the ends of a beam.
-Constraining the overall motion of a larger body.  If it's known that something is sitting on a very soft support, such as rubber or water, an RBE3 can be created across the supported region, and through the setting of the UM card you can constrain the former dependent grid to control the overall average motion of the model.
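To make that second trick concrete, here is a hedged sketch of what such an entry might look like in Nastran small-field bulk data.  The grid and element IDs are made up, and you should check the RBE3 entry in your solver's reference guide before relying on the field layout:

```
$ Hypothetical RBE3: grid 100 carries the weighted average motion of
$ grids 1-4.  The UM continuation moves the six dependent DOF off of
$ grid 100 and distributes them among grids 1-3, so grid 100 can now
$ carry an SPC that restrains the overall average motion.
RBE3         200             100  123456     1.0     123       1       2
+              3       4
+             UM       1     123       2      12       3       3
```

Note that the component DOFs listed after UM must total the number of reference components (six here) and must be among the listed independent components.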

There is, of course, something that analysts commonly do when they need to make a part of their model very stiff but don't have the time or experience to construct MPCs without errors: create elements with very, very stiff properties that will behave similarly to MPCs.  The upside to this approach is that it will typically yield a very similar answer; the downside is that it may take a few iterations to find a stiffness that is stiff enough, yet not so stiff that the stiffness matrix becomes ill-conditioned.

So wouldn't it be great if there was an MPC that automatically made a very stiff element that was just stiff enough?  Something that automatically created a perfectly rigid body but then had stiffness that connected it to the dependent grids?  Well there is in fact an element that does this.  This type of element is typically called a Lagrangian element, and it can be selected in certain load cases in Nastran and is the behavior for Kinematic (RBE2 style) and distributing (RBE3 style) elements in Abaqus.  I'll cover this topic in more detail in a later post.

Update 28 Sep 2021: Due to a Google security update the example file links expired, should be working again.

Monday, September 17, 2012

Words that are used interchangeably

FEA has a number of words and phrases that mean exactly the same thing.

Grid/Node: A point in space that defines the shape of elements.  It is also where the solution is either found or set by an SPC/Constraint/Boundary Condition.

SPC/Constraint/Boundary Condition: A point at which the solution is set to a certain value.  Examples include setting the displacement to zero at the end of a beam or setting the temperature to a certain value.  Some solvers allow some panache when setting the solution to a certain value, such as in Abaqus, where one can use the uncommon architectural term 'encastre' to describe something that is fixed in place.

Degree of Freedom, DOF, detail level: how many numbers it takes to describe a complete solution.  This is also a measure of how long it will take the model to solve.

Job/Model/Run/Input Deck:  A data file, typically ASCII plain text, which fully describes the FEA problem which the solver will find a solution to.  Sometimes a run can refer to the specific time a model was solved, in which case an analyst might describe the runs they did with different settings to find the fastest solution times based on things like memory and cpu settings.

Node/System: A single computer, one that may be a part of a networked computer cluster.  This computer will typically have more than one CPU and a single large bank of memory that can be accessed by any of the CPUs

Stiffness Matrix/System of Equations: the series of equations which describe the behavior of the FEA system, stored in matrix form, typically in a sparse storage format, most commonly one that assumes symmetric matrices.  A solver will work best if these matrices can fit into memory without using the hard disks of the computer.

CPU/core: A single independent central processing unit.  Due to advances in shrinking chip sizes, a typical computer will have more than one core, packaged together, which appears to the computer no different than several independent CPUs.  High performance server computers may have more than one CPU socket, each with more than one core, for total CPU/core counts of 2x2 or more.  CPU counts of 16 are not uncommon.

RAM/Memory/Random Access Memory: Memory in a computer which is fast but loses its state when the system is turned off.  Typically measured in single digit to hundreds of gigabytes(GB).

Hard Disk/Hard Drive/SSD: Slow but permanent storage which does not lose its state when a computer is turned off.  Although SSDs have narrowed the gap between traditional magnetic hard drives and RAM, SSDs and hard drives remain about 1,000 times slower than RAM.

SMP/Parallel: A style of solving a system of equations that applies more CPUs, but little extra memory, to the same problem.  An analogy would be sorting a shuffled deck of cards on a table.  An SMP/parallel approach would be to spread the cards out over one table but have more than one person working to sort them.

DMP: A style of solving a system of equations that applies more CPUs and more memory, either all on one node/system or across several systems.  If done on a single system, it is analogous to sorting a shuffled deck of cards on a large table, where each person sorting the cards works on their own assigned portion of the table.  If done on more than one system, it is analogous to more than one table in more than one room, where cards can be passed back and forth between rooms, but always more slowly than across a single table.
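The card-sorting analogy can be sketched in a few lines of Python.  This is a serial toy, not a real DMP implementation: each "table" (worker) sorts only its own partition of the deck, a step that could run independently in parallel, and the final merge stands in for the inter-process communication of a real distributed solve:

```python
# Toy sketch of the DMP card-sorting analogy: partition the deck,
# sort each partition independently, then merge the results.
import heapq
import random

deck = list(range(52))
random.shuffle(deck)

# Deal the shuffled deck out to 4 workers ("tables").
n_workers = 4
partitions = [deck[i::n_workers] for i in range(n_workers)]

# Each worker sorts only its own partition; in a real DMP run these
# steps happen simultaneously in separate memory spaces.
sorted_parts = [sorted(p) for p in partitions]

# Merge the per-worker results into the final ordering.  In a real
# DMP solve this is the expensive communication step between nodes.
result = list(heapq.merge(*sorted_parts))
print(result == list(range(52)))  # True
```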

GPU/GPGPU/OPENCL: Using the simple but numerous programmable cores of a graphics card to do general purpose vectorized computing.



Friday, September 14, 2012

Friday Night Music: Madeon and his secondary input device

I can do quality work in a number of FEA GUIs, but I am very fast in Hypermesh.  Sometimes when I'm in the zone, headphones on and focused on the model, favorite secondary input device in hand, I can feel pretty slick.  Then I see things like this and feel humbled.

Thursday, September 13, 2012

What is FEA?

A hypothetical conversation with a family member unfamiliar with FEA

Your blog is... interesting?  I have no idea what you're talking about!  Why'd you make it?

A few reasons:
-I have former co-workers who still use many of the same tools that I still use, and I wanted a place where I could document some of the FEA knowledge that I've picked up in the years since we worked together
-I also wanted a place to have more informal discussions of competing commercial software packages, similar to how many consumer products are reviewed
-Finally, I wanted to play around with some fun stuff

What is FEA?  Am I saying that right, feeya?

FEA (pronounced Eff-ee-A) is a numeric technique used to simulate the structural, thermal, acoustic, and sometimes fluid behavior of a volume of material by dividing it into a series of individual elements, each defined by the grids it connects to.  The grids are the locations where the primary field variables (such as displacement or temperature) are calculated, and those field variables can then be used to estimate other quantities within the elements, such as stress or heat flux.
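To make that concrete, here is a minimal sketch of the whole idea in Python: a bar fixed at one end and pulled at the other, split into two equal elements.  The material values are made-up round numbers chosen for illustration, not taken from any real problem:

```python
# Minimal 1D FEA: a bar with 3 grids and 2 axial elements, fixed at
# grid 0, pulled with force F at grid 2.  Each element contributes a
# 2x2 stiffness block k*[[1,-1],[-1,1]] to the global matrix.
E, A, L = 200e9, 1e-4, 1.0   # assumed steel-like modulus, area, element length
k = E * A / L                 # axial stiffness of one element
F = 1000.0                    # applied load at the free end (N)

# Assemble the 3x3 global stiffness matrix for grids 0-1-2.
K = [[0.0] * 3 for _ in range(3)]
for e in (0, 1):              # element e connects grids e and e+1
    for i, j, val in ((e, e, k), (e, e + 1, -k),
                      (e + 1, e, -k), (e + 1, e + 1, k)):
        K[i][j] += val

# Apply the boundary condition u0 = 0 by reducing to grids 1 and 2,
# then solve the remaining 2x2 system with Cramer's rule.
a, b, c, d = K[1][1], K[1][2], K[2][1], K[2][2]
f1, f2 = 0.0, F
det = a * d - b * c
u1 = (f1 * d - b * f2) / det
u2 = (a * f2 - c * f1) / det

print(u1, u2)  # the tip displacement u2 is twice the midpoint's u1
```

Real models do exactly this, just with millions of grids and a sparse solver in place of the 2x2 algebra.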

Explain all that again, but like I'm five. Not that I don't know what you're talking about!  Ha-ha..

FEA is basically a type of computer program that lets an engineer find out if something is strong enough, stiff enough, or quiet enough.  That something could be anything, like an airplane, car, smart phone, or bell.

What's a computer?

A calculating machine made up of input and output devices which contains both working memory and a processor which operates on it.

I kid, I know what a computer is.

Ok, uhh good.

So who uses this stuff?

Lots of people!  There are dozens and dozens of FEA programs one could buy or use, and nearly any high volume consumer product has been analyzed in FEA.  In just the last few years its use has exploded from mostly aerospace (airplanes etc.) to almost everything else, from cars to cell phones to medical implants to diapers.

Angry Birds too right? And like movies?

Not too many games or movies have done structural physics very well.  The big exceptions have been Bridge Builder and World of Goo, both good simulations of a bridge model in FEA.  Angry Birds has a collision detection engine, which tracks when things touch each other, but the behavior of a brick or piece of wood once it's been hit is actually very basic; it has a certain amount of bounce reaction and is either damaged to failure or not.  FEA is more associated with a single piece bending and changing shape; BeamNG is the closest I've seen.

Movies are starting to become a little more sophisticated.  Pixar has been very open about their efforts to add more physics to their animation software, particularly for hair, such as the work they put into making the heroine's hair in Brave more realistic.

That was more than I really cared about there.

Yeah I know.  Fortunately I'm just imagining this conversation!

AWESOME!

Wednesday, September 12, 2012

Ringing in the Blog

Deformed structural models can usually be animated, depending on the post-processor.  While this can be very useful, I thought it might be interesting to hear how a model sounds.  I found a dimensional reference recently that allowed me to craft a very basic Abaqus/Explicit model which should sound like the real thing.

Using symmetry, I was able to build the model with less than 1000 nodes.


The sound propagation is still very approximate, as sound only emanates from a single location on the surface of the bell.  I'll follow up on this model by adding an acoustic mesh to allow pressure waves to propagate toward the receiver location properly.  Still, it sounds kind of neat, and not too far off from the real thing for a first cut.