How to approach hardware selection in 2025 for Ansys CFD solvers (all budgets considered – from laptops to clusters!)

    Note: This was written in October 2025, but this is an ever-changing space, so please contact your local LEAP support engineers if you would like to discuss your specific needs!

    Working at the bleeding edge of engineering simulation, LEAP’s engineers are frequently asked by clients across many different industries for advice on navigating the balance between cost and optimal solver performance when purchasing new hardware. The following advice is primarily based on LEAP’s accumulated recent experience, with a particular focus on CFD workloads (Ansys Fluent), as FVM codes are capable of scaling incredibly well if the system is well designed. Other tools such as Ansys Mechanical or Electromagnetics have very similar requirements, with a few notable differences/exceptions:

    • FEA codes generally do not scale as well as FVM, so do not expect to see the same degree of speedup as shown in some of the graphs below.
    • Mech and EM both perform a large amount of I/O during the solve, so it is imperative that the system has a very fast storage device (e.g. NVMe SSD) used as the scratch/project directory – a quick throughput check is sketched after this list.
    • Ansys EM (HFSS and Maxwell 3D) often require significantly more memory capacity than CFD.
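
    As a quick sanity check that a candidate scratch location is NVMe-class, the Python sketch below times a 1 GB sequential write. The path and test size are placeholders – adjust them to your system. A modern NVMe SSD should sustain well over 1000 MB/s here; results in the low hundreds suggest a SATA drive or network share.

        import os
        import time

        SCRATCH_DIR = "/scratch"             # placeholder - point this at your intended scratch location
        TEST_SIZE_MB = 1024                  # write 1 GB in 4 MB blocks
        BLOCK = os.urandom(4 * 1024 * 1024)  # incompressible data avoids flattering compression/caching

        test_file = os.path.join(SCRATCH_DIR, "throughput_test.bin")
        start = time.perf_counter()
        with open(test_file, "wb") as f:
            for _ in range(TEST_SIZE_MB // 4):
                f.write(BLOCK)
            f.flush()
            os.fsync(f.fileno())             # make sure data reaches the disk before timing stops
        elapsed = time.perf_counter() - start
        os.remove(test_file)

        print(f"Sequential write: {TEST_SIZE_MB / elapsed:.0f} MB/s")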

    General high-level advice to optimise your solver performance:

    • Memory bandwidth is the key performance specification in most situations. Populate every memory channel available in your system, and try not to exceed a ratio of ~4:1 cores per memory channel (or ~2:1 if looking for absolute maximum performance) – a worked check is sketched after this list.
    • DDR5 memory is imperative – aim for high clock speeds (5600 MT/s or higher).
    • Look for CPUs with larger cache pools – this increases the effective memory bandwidth.
    • High base CPU clock-speeds are important – do not buy low wattage CPUs with high core-counts.
    • Ansys solvers should always be run on physical cores rather than individual threads (we often advise turning off multi-threading on dedicated workstations / servers).
    • Ensure your working directories and scratch locations are on high-speed NVMe storage devices.
    • Ensure you have a discrete GPU for pre/post processing (except for headless solver nodes).
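
    To make the core-to-channel advice concrete, here is a minimal Python sketch (assuming the third-party psutil package) that reports physical vs logical cores and flags solver core counts that would exceed the ~4:1 ratio. Memory channel count cannot be queried portably, so MEMORY_CHANNELS is an assumption you must set for your platform (2 for consumer desktops/laptops, 4-12 per CPU for workstations/servers).

        import psutil  # third-party: pip install psutil

        MEMORY_CHANNELS = 2  # assumption - set this for your system

        physical = psutil.cpu_count(logical=False)
        logical = psutil.cpu_count(logical=True)
        if logical > physical:
            print(f"SMT/Hyper-Threading is on: {logical} threads on {physical} cores.")
            print("Run Ansys solvers on physical cores only (or disable SMT in the BIOS).")

        for cores in (4, 8, 12, 16, 24):
            ratio = cores / MEMORY_CHANNELS
            verdict = "OK" if ratio <= 4 else "likely bandwidth-limited"
            print(f"{cores:>3} solver cores -> {ratio:.1f} cores/channel ({verdict})")
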
    Small systems – laptop/desktop for solving on 4 cores:

    Here our goal is to obtain the maximum possible performance on 4 cores.

    Primary considerations for a laptop:

    • Choose a chassis with ample cooling (sorry, no thin/light ultrabook style laptops here…).
    • Look for CPUs with a suffix beginning with “H” – these are the high-power versions, typically with TDPs of 35 W or higher. Note that this is no longer a blanket rule, as some newer AMD laptop CPUs are branded “AI” or “AI Max” and no longer carry the H or U suffix.
    • Ensure that the CPU has at least 4 “performance” cores – we do not want to run simulations on “efficiency” cores.
    • There is no benefit to super high core-counts. We do not recommend using HPC packs on laptops due to memory bandwidth constraints (most only have 2 memory channels). However, high-end CPUs often have increased clock-speeds and cache in addition to high core-counts, so you may find that the “best” laptop CPUs have significantly more cores than you’ll end up solving on.
    • AMD parts from 6000 series onwards support DDR5 – prefer Zen 5 cores, e.g. 9000 series or “Ryzen AI Max” (which also has quad channel memory).
    • Intel parts from 12th generation onwards support DDR5 – prefer “Series 2”, e.g. 265HX or 285HX (https://www.intel.com/content/dam/support/us/en/documents/processors/core/Intel-Core-Comparsion.xlsx).
    • We recommend a bare minimum of 32 GB of RAM, preferably 64 GB for the ability to launch larger jobs – a rough sizing helper is sketched after this list.
    • NVMe storage – at least 1 TB to fit OS, software, and project data.
    • Discrete GPU recommended for display/visualisation purposes (Nvidia preferred – wider support amongst Ansys tools). However, some modern high-end laptop CPUs have suitable built-in graphics – such as the “AMD Ryzen AI Max” range.
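
    For RAM sizing beyond the 32/64 GB guideline, a commonly quoted rule of thumb (an assumption on our part, not a benchmark) is that Fluent needs very roughly 1-2 GB per million cells, more with complex physics. A trivial helper:

        # Rule-of-thumb RAM estimate - treat the per-cell figure as a rough assumption.
        GB_PER_MILLION_CELLS = 2.0   # conservative double-precision figure (assumption)
        OS_AND_APPS_GB = 8           # headroom for OS, GUI and background applications

        for cells_m in (5, 10, 20):
            total = cells_m * GB_PER_MILLION_CELLS + OS_AND_APPS_GB
            print(f"{cells_m} M cells -> ~{total:.0f} GB total RAM recommended")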

    For a gaming/consumer-class 4+ core build:

    • ‘Gaming’ type systems are often quite suitable.
    • High clock speeds (base > 4 GHz, turbo > 5 GHz).
    • There is no benefit to super high core-counts. We do not recommend using HPC packs on gaming systems due to memory bandwidth constraints (only 2 memory channels available). However, high-end CPUs often have increased clock-speeds and cache in addition to high core-counts, so you may find that the “best” gaming CPUs have significantly more cores than you’ll end up solving on.
    • DDR5 memory (faster is better – e.g. 6000 MT/s); generally 64 GB is suitable, depending on the size of your simulation models.
    • For AMD systems, do not use more than 2 memory modules – the memory controller is not capable of running multiple modules per channel at maximum speed. This will likely limit you to 128 GB (2x 64 GB) when using consumer-grade CPUs.
    • NVMe storage – at least 1 TB to fit OS, software, and project data.
    • Discrete GPU for display/visualisation purposes (Nvidia preferred – wider support amongst Ansys tools), e.g. Nvidia RTX 5060 Ti 16 GB, or Nvidia RTX 4000 Ada.

    Both AMD and Intel have suitable options in this category:

    • AMD Ryzen 9000 series (Zen 5).
    • AMD 3D V-cache CPUs (9800X3D, 9950X3D) are particularly suitable as CFD workloads benefit greatly from large cache pools.
    • Intel Core Ultra 7 or 9.
    Small-medium systems – desktop/workstation for ~12 solver cores:

    While it can be tempting to use a ‘gaming’ type system as above, there are a couple of additional considerations here:

    • It is best to avoid hybrid processor architectures with “performance” and “efficiency” cores – we want our simulations to be running on 12 identical high-performance cores.
    • Memory bandwidth needs to be considered. Gaming/consumer CPUs only have 2 memory channels, which will cause a bottleneck when distributing a simulation across 12 cores (6 cores per channel). We prefer to limit the ratio of cores/channel to ~4:1 (or preferably even ~2:1) as mentioned above.

    For a dedicated workstation-class 12+ core build:

    • AMD Threadripper:
      • Non-“Pro” versions are suitable, as 4 memory channels are sufficient for these relatively low core counts.
      • Prefer 9000X series for Zen 5 cores and higher memory speeds.
    • Intel Xeon:
      • Xeon-W (Sapphire Rapids).
      • Many options, prioritise higher base clock speeds.
    • Make sure to use at least 4 memory channels:
      • i.e. populate the motherboard with 4 individual memory modules.
      • For example, if you want 128 GB of RAM, make sure to use 4x 32 GB DIMMs.
    • NVMe storage – at least 1 TB to fit OS, software, and project data.
    • Consider a second storage device to handle a large quantity of project files.
    • Discrete GPU for display/visualisation purposes (Nvidia preferred – wider support amongst Ansys tools), e.g. Nvidia RTX 5060 Ti 16 GB, or Nvidia RTX 4000 Ada.
    Medium systems – workstations for ~36 solver cores:

    Similar requirements to the workstation-class build in the previous section, with a few modifications.

    For a dedicated workstation-class 36+ core build:

    • AMD Threadripper Pro:
      • Do not use the non “Pro” versions – these only have 4 memory channels.
      • Prefer 9000WX series for Zen 5 cores and higher memory speeds.
    • Intel Xeon:
      • Xeon-W (Sapphire Rapids).
      • Many options, prioritise higher base clock speeds.
    • Can also use server-grade parts (e.g. Xeon 6, AMD Epyc).
    • Make sure to use at least 8 memory channels:
      • i.e. populate the motherboard with 8 individual memory modules.
      • For example, if you want 256 GB of RAM, make sure to use 8x 32 GB modules.
    • NVMe storage – at least 1 TB to fit OS, software, and project data.
    • Consider a second storage device to handle a large quantity of project files.
    • Discrete GPU for display/visualisation purposes (Nvidia preferred – wider support amongst Ansys tools), e.g. Nvidia RTX 5060 Ti 16 GB, or Nvidia RTX 4000 Ada.
    Large systems – workstations / servers / clusters for ~128+ cores:

    These are multi-CPU systems requiring server-grade parts; larger core counts will require clustered nodes with high-speed interconnects. Our goal is to ensure that total system memory bandwidth is sufficient to support the increasing number of cores.

    • General advice:
      • With 12 channels of DDR5, we recommend using up to 48-core parts to maintain good scaling (~4:1 ratio of cores to channels).
      • To maximise price-to-performance, you may wish to use 2x 64-core CPUs to squeeze the most out of 3 HPC packs without distributing your job across multiple nodes, but please note that scaling is likely to drop off significantly as the core-to-channel ratio exceeds ~4:1.
      • To maximise outright performance, ideal scaling can be obtained by using many smaller CPUs and distributing jobs across multiple nodes – a core-to-channel ratio of ~2:1 can maintain near-linear scaling from 1 to 100s (or 1000s) of cores. However, this can get very expensive very quickly – a sizing sketch follows this list.
      • Prioritise models with high base clock speeds and large cache.
      • I’m sure you get the point by now, but make sure to populate every single memory channel available (8 per CPU for mid-range Intel, 12 per CPU for high-end Intel and AMD).
      • In multi-node configurations, make sure to use high-speed interconnects such as InfiniBand.
    • AMD Epyc:
      • 9005 series (5th gen) preferred, although 9004 (4th gen) is still suitable.
      • Notable performance gains can be obtained with 3D V-Cache “X” series parts, which have additional L3 cache – see dedicated section below.
    • Intel Xeon:
      • Xeon 6 (Granite Rapids) preferred, although Sapphire Rapids still suitable.
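
    To illustrate the sizing trade-off above, here is a hypothetical sketch that works out how many dual-socket nodes a target core count implies at a chosen core-to-channel ratio. Every figure is a parameter, not a vendor spec – substitute your actual CPU options.

        import math

        TARGET_CORES = 512
        CHANNELS_PER_CPU = 12   # e.g. high-end Intel Xeon 6 or AMD Epyc
        CPUS_PER_NODE = 2
        TARGET_RATIO = 2.0      # ~2:1 for near-linear scaling, ~4:1 for better value

        cores_per_node = int(TARGET_RATIO * CHANNELS_PER_CPU * CPUS_PER_NODE)
        nodes = math.ceil(TARGET_CORES / cores_per_node)
        print(f"Cap each node at {cores_per_node} cores (e.g. 2x {cores_per_node // 2}-core CPUs)")
        print(f"{TARGET_CORES} cores at ~{TARGET_RATIO:.0f}:1 -> {nodes} nodes")
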
    Notes on AMD 3D V-Cache:

    Single-node performance can be significantly improved – many benchmarks (note: older-gen 7003X series shown) indicate speedups of 10 to 30%, with one extreme case reaching 80%:

    (Figure: Ansys Fluent single-node performance benchmarks)

    Multi-node performance can exhibit super-linear scaling – i.e. the speedup factor can exceed the number of nodes. As more nodes are added to the system, the total cache capacity increases accordingly, so a higher proportion of the simulation is able to reside in cache rather than DRAM – one benchmark example shows a speedup of 11x on a system consisting of 8 nodes.
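
    The mechanism is easy to see with a toy model: each node’s share of the mesh shrinks as nodes are added, while its L3 capacity stays fixed, so the cached fraction grows. All numbers below are illustrative assumptions, not benchmark data.

        WORKING_SET_GB = 8.0   # assumed solver working set for the whole job
        L3_PER_NODE_GB = 1.5   # e.g. 2x 768 MB 3D V-Cache CPUs per node (assumption)

        for nodes in (1, 2, 4, 8):
            per_node_share = WORKING_SET_GB / nodes
            cached = min(1.0, L3_PER_NODE_GB / per_node_share)
            print(f"{nodes} node(s): {cached:.0%} of each node's mesh partition fits in L3")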

    GPU Solver Considerations:

    GPUs can offer dramatic speedups for codes that are optimised to run in massively parallel environments. Particle-type solvers such as LBM, DEM, SPH etc. or raytracing solvers such as SBR+ are well suited to GPUs as parallelisation of the code is trivial; however, this is considerably more difficult to achieve with traditional CFD and FEA methods such as FVM and FEM.

    Ansys have recently released a native multi-GPU (FVM) solver for Fluent, which currently supports features such as sliding mesh, scale-resolved turbulence, conjugate heat transfer, non-stiff reacting flows, and mild compressibility. Benchmarks reveal that one high-end GPU (e.g. Nvidia RTX 6000 Pro or A100/H100) can offer similar performance to ~500-1000 CPU cores – offering significant hardware cost savings and enabling users to run large simulations in-house rather than relying on external clusters / cloud compute.
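
    Taking the equivalence claim above at face value, the back-of-envelope arithmetic below shows how many dual-socket CPU nodes a single GPU could displace. The node size is an assumption – substitute your own quotes before drawing conclusions.

        CPU_CORES_PER_NODE = 96           # assumed 2x 48-core node

        for equiv_cores in (500, 1000):   # the benchmark range quoted above
            nodes = equiv_cores / CPU_CORES_PER_NODE
            print(f"1 GPU ~ {equiv_cores} CPU cores ~ {nodes:.1f} dual-socket nodes")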

    Ansys Rocky is another tool which is suited to GPUs. Rocky is primarily a DEM solver, capable of handling complex solid motion involving millions of particles, but it also has an SPH solver for modelling the fluid motion of free-surface flows.

    Please see our separate blog for more information on GPU selection for Fluent and Rocky!

    Additional notes on Operating Systems:

    The vast majority of Ansys tools can be run on either Windows or Linux (for a full list of supported platforms, please check the Ansys website: Platform Support and Recommendations | Ansys). We generally recommend performing pre- and post-processing on local Windows machines for a number of reasons:

    • Ansys SpaceClaim and Discovery are currently only available on Windows.
    • Display drivers tend to be more mature/robust on Windows.
    • Input latency can be an issue if trying to manipulate large models over a remote connection to a workstation/server.
    • Pre/post can be a waste of compute resources on a server – it is often desirable to keep as close to 100% solver uptime as possible.

    Windows is also suitable for solving on small to moderately large machines (e.g. ~32-core workstations); however, we generally recommend dedicated Linux servers for larger solver machines and clusters – particularly if a multi-user environment is desired (e.g. queuing systems etc.).

    Additional notes on Cloud Computing:

    Stay tuned for our separate blog on Cloud Computing coming online in November 2025!
