CPU Compute

One side I like to exploit on CPUs is raw compute, and whether a variety of mathematical loads can stress the system in a way that real-world usage might not.  The benchmarks used here are ones developed for testing MP servers and workstation systems back in early 2013, such as grid solvers and Brownian motion code.  Please head over to the first of those reviews, where the mathematics and small snippets of code are available.

3D Movement Algorithm Test

The algorithms in 3DPM employ either uniform or normal distribution random number generation, and vary in the number of trigonometric operations, conditional statements, generation/rejection steps, fused operations, and so on.  The benchmark runs through six algorithms for a specified number of particles and steps, calculates the speed of each algorithm, then sums them all for a final score.  This is an example of a real world situation that a computational scientist may find themselves in, rather than a pure synthetic benchmark.  The benchmark is parallel across the particles simulated, and we test both single thread and multi-threaded performance.  Results are expressed in millions of particles moved per second, and a higher number is better.
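As a rough guide to the kind of work each algorithm does, below is a minimal sketch of a particle-movement kernel: every particle takes uniformly random steps on the unit sphere, parallelised over particles with OpenMP.  The structure and names are illustrative only and are not taken from the 3DPM source.

    // Minimal sketch of a 3DPM-style kernel: each particle takes random steps
    // on the unit sphere. Illustrative only - not the actual benchmark code.
    #include <cmath>
    #include <random>
    #include <vector>

    struct Particle { double x, y, z; };

    void move_particles(std::vector<Particle>& p, int steps) {
        const double two_pi = 6.283185307179586;
        #pragma omp parallel
        {
            // Each thread keeps its own generator so particles stay independent.
            std::mt19937_64 rng(std::random_device{}());
            std::uniform_real_distribution<double> u(0.0, 1.0);

            #pragma omp for
            for (long long i = 0; i < (long long)p.size(); ++i) {
                for (int s = 0; s < steps; ++s) {
                    // Uniform random direction on a sphere via two trig calls.
                    double phi   = two_pi * u(rng);
                    double cos_t = 2.0 * u(rng) - 1.0;
                    double sin_t = std::sqrt(1.0 - cos_t * cos_t);
                    p[i].x += sin_t * std::cos(phi);
                    p[i].y += sin_t * std::sin(phi);
                    p[i].z += cos_t;
                }
            }
        }
    }

The per-particle loops are completely independent, which is what the multi-threaded version of the test exploits.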

3D Particle Movement: Single Threaded

3D Particle Movement: Multi-Threaded

N-Body Simulation

When a series of heavy mass elements are in space, they interact with each other through the force of gravity.  Thus when a star cluster forms, the interaction of every large mass with every other large mass defines the speed at which these elements approach each other.  When dealing with millions and billions of stars on such a large scale, the movement of each of these stars can be simulated through the physical theorems that describe the interactions.  The benchmark detects whether the processor is SSE2 or SSE4 capable and implements the relevant code path.  We run a simulation of 10240 particles of equal mass - the output for this code is in terms of GFLOPs, and the result recorded is the peak GFLOPs value.
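At the heart of any direct n-body code is the O(n²) pairwise force accumulation, sketched below in plain C++ with OpenMP.  This is only an illustration of the computation - equal masses and a small softening term to avoid the self-interaction singularity are assumed - and it does not reflect the benchmark's SSE2/SSE4 code paths.

    // Sketch of the O(n^2) force accumulation in a direct n-body step.
    // Equal masses and a softening term 'soft2' are assumed; this is an
    // illustration, not the benchmark's SSE2/SSE4 implementation.
    #include <cmath>
    #include <vector>

    struct Body { float x, y, z, ax, ay, az; };

    void accumulate_forces(std::vector<Body>& b, float soft2) {
        #pragma omp parallel for
        for (long long i = 0; i < (long long)b.size(); ++i) {
            float ax = 0.f, ay = 0.f, az = 0.f;
            for (std::size_t j = 0; j < b.size(); ++j) {
                float dx = b[j].x - b[i].x;
                float dy = b[j].y - b[i].y;
                float dz = b[j].z - b[i].z;
                // Softening keeps the i == j term finite and avoids a branch.
                float r2 = dx * dx + dy * dy + dz * dz + soft2;
                float inv_r  = 1.0f / std::sqrt(r2);
                float inv_r3 = inv_r * inv_r * inv_r;   // 1/r^3 for the force law
                ax += dx * inv_r3;
                ay += dy * inv_r3;
                az += dz * inv_r3;
            }
            b[i].ax = ax; b[i].ay = ay; b[i].az = az;
        }
    }

In a kernel like this the GFLOPs figure comes from counting the roughly 20 floating point operations per particle pair and dividing by the time taken.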

N-Body Simulation

Grid Solvers - Explicit Finite Difference

For any regular grid of nodes, the simplest way to calculate the next time step is to use the values of the nodes around it.  This makes for easy mathematics and parallel simulation, as each node calculated depends only on the previous time step, not on the nodes around it in the current time step.  By choosing a regular grid, we also reduce the level of memory access required compared to irregular grids.  We test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, using OpenMP for threading, in single precision.  The grid is isotropic and the boundary conditions are sinks.  We iterate through a series of grid sizes, and results are shown in terms of ‘million nodes per second’, where the peak value is given in the results – higher is better.
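A minimal version of the 2D explicit update is sketched below: each new node is its old value plus a weighted sum of its four neighbours from the previous time step.  The constant alpha bundles the diffusivity with the time step and grid spacing and is assumed to sit within the explicit stability limit; the code is illustrative rather than the benchmark source.

    // Sketch of one explicit finite-difference time step on an n x n grid,
    // single precision, threaded with OpenMP. Boundary rows/columns are left
    // untouched, acting as sinks. Illustrative only.
    #include <vector>

    void explicit_step(const std::vector<float>& cur, std::vector<float>& next,
                       int n, float alpha) {
        #pragma omp parallel for
        for (int i = 1; i < n - 1; ++i) {
            for (int j = 1; j < n - 1; ++j) {
                float c = cur[i * n + j];
                // Each new node depends only on the previous time step, so
                // every (i, j) can be updated independently in parallel.
                next[i * n + j] = c + alpha * (cur[(i - 1) * n + j] + cur[(i + 1) * n + j]
                                             + cur[i * n + (j - 1)] + cur[i * n + (j + 1)]
                                             - 4.0f * c);
            }
        }
    }

The 3D version adds the two out-of-plane neighbours and the corresponding -6c term, but is otherwise the same independent per-node update.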

Explicit Finite Difference Solver (2D)
Explicit Finite Difference Solver (3D)

Grid Solvers - Implicit Finite Difference + Alternating Direction Implicit Method

The implicit method takes a different approach to the explicit method – instead of calculating one unknown in the new time step from known values in the previous time step, we consider that an old point can influence several new points by way of simultaneous equations.  This adds to the complexity of the simulation – the grid of nodes is solved as a series of rows and columns rather than individual points, reducing the parallel nature of the simulation by one dimension and drastically increasing the memory requirements of each thread.  The upside is that the stability rules relating to time steps and grid spacing are less stringent.  For this we simulate a 2D grid of 2^n nodes in each dimension, using OpenMP, in single precision.  Again our grid is isotropic with the boundaries acting as sinks.  We iterate through a series of grid sizes, and results are shown in terms of ‘million nodes per second’, where the peak value is given in the results – higher is better.
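To show where the extra memory per thread comes from, the sketch below runs one ADI half-sweep: each row becomes a tridiagonal system solved serially with the Thomas algorithm, so the parallelism is across rows only.  The coefficients a, b and c stand in for whatever the implicit discretisation produces, and the code is an illustration rather than the benchmark's own.

    // Sketch of one ADI half-sweep: each row is a tridiagonal system solved
    // with the Thomas algorithm. Coefficients a (sub-diagonal), b (diagonal)
    // and c (super-diagonal) are placeholders for the discretisation.
    #include <vector>

    void adi_row_sweep(std::vector<float>& grid, int n, float a, float b, float c) {
        #pragma omp parallel for
        for (int i = 1; i < n - 1; ++i) {
            // Per-thread scratch arrays: the solve along a row is serial,
            // which is where the extra memory per thread comes from.
            std::vector<float> cp(n), dp(n);
            float* row = &grid[i * n];

            cp[1] = c / b;
            dp[1] = row[1] / b;
            for (int j = 2; j < n - 1; ++j) {           // forward elimination
                float m = b - a * cp[j - 1];
                cp[j] = c / m;
                dp[j] = (row[j] - a * dp[j - 1]) / m;
            }
            for (int j = n - 3; j >= 1; --j)            // back substitution
                dp[j] -= cp[j] * dp[j + 1];

            for (int j = 1; j < n - 1; ++j)             // write back the new row
                row[j] = dp[j];
        }
    }

The column half-sweep is the same operation transposed, which on a row-major grid is where the strided, cache-unfriendly memory access of the ADI method comes in.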

Implicit Finite Difference Solver (2D)

Comments

  • jabber - Tuesday, November 19, 2013 - link

    Indeed, or for those of us that found girls, moved out, got older, changed hobbies, just realised that running benchmarks all day is a waste of life or found that actually the world doesn't end if you don't upgrade your PC every 6-12 months.

    There is a need for some sites that analyse how the current $60-$200 GPUs compare with those of 5 years ago, same for CPUs etc. Big market for that kind of info but unfortunately all we get is "this sites for enthusiasts noob!" well thanks but I'm still an enthusiast but now I have a mortgage or I'm only earning half what I was 5 years ago.

    The info I get from Anandtech I can get anywhere......
  • Shadowmaster625 - Tuesday, November 19, 2013 - link

    In the conclusion you should add one bar to one of your charts... a bar where the RAM is at 1600 but the cpu is clocked just 100MHz higher, to really highlight how little impact memory speeds have on performance compared to even a tiny cpu clock speed boost.
  • jabber - Tuesday, November 19, 2013 - link

    Buy whatever matches best with your motherboard and GPU colour scheme I say.
  • ShieTar - Wednesday, November 20, 2013 - link

    Of course. And never overclock your memory when there is a full moon.
  • D1RTYD1Z619 - Tuesday, November 26, 2013 - link

    or if you have pickles in your fridge.
  • rmh26 - Tuesday, November 19, 2013 - link

Ian, can you give a little more information about the size of your CPU compute benchmarks, specifically the grid size on the finite difference problems? In my experience memory bandwidth plays a large role in the speed of the computation. There are many HPC applications that have memory as the bottleneck, and I'm wondering if your problem size is small enough that it is being handled efficiently by the cache, so the RAM speed isn't making much of a difference. I know in my own CFD code going from 1600 to 1866 showed an almost linear speed up.
  • UltraWide - Tuesday, November 19, 2013 - link

    Did you have a chance to remove the heat-spreaders to see which ICs are in these? I am assuming Hynix MFR?
  • Gen-An - Tuesday, November 19, 2013 - link

    Hynix DFR actually. 2Gbit ICs (256MB each) so the same size as CFR but with nowhere near the overclockability.
