Monday, March 10, 2008

Project Plan

A Multi-threaded Wave Simulation for Oceans Using the PlayStation 3’s Cell Processor
by M. Stowell

Supervised by A. Williams

March 2008

Aim

This project is an investigation into an optimal technique for simulating ocean waves using IBM’s Cell processor on the PlayStation 3. It will analyse the technique’s performance under a range of simulated workloads and compare the results with those of Intel’s Core 2 Duo processor range.

By doing this I hope to achieve a relatively fair comparison of how the two architectures perform under this type of simulation. The project will also serve as a study of methods for carrying out work efficiently on the Cell processor.

Background

With ever-increasing advances in technology and computer hardware, and much greater demand for aesthetically pleasing, fully immersive environments in computer graphics and virtual reality, it is becoming both necessary and, with current hardware, increasingly possible for these applications to use accurate simulations of real-world phenomena.

One approach to simulating water, described in Game Programming Gems - Gomez 2000, simulates water in three dimensions using only the data for a two-dimensional simulation. The water is represented as a 2D plane, thought of as a stretched elastic membrane, and the height of each vertex is calculated using a partial differential equation. This is considered a fairly accurate way to represent waves in computer graphics, but it has a number of downsides with regard to performance and memory. First, it uses both the previous and the current wave positions to step forward to the next position, which uses a lot of memory because it essentially stores two meshes. Second, each vertex needs access to information about its neighbouring vertices in order to calculate its new position. I believe this will make multi-threading the simulation considerably harder: any thread working on one section of the grid also needs the vertices along the edges of the neighbouring sections, which seriously restricts how the program can be split into multiple threads of execution.
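To make the memory and neighbour-access costs concrete, here is a minimal sketch of one time step of a height-field surface. It assumes a standard explicit finite-difference discretisation of the 2D wave equation. I assume the Gomez method uses a scheme of this general kind, but this is not his exact formulation. The names `HeightField`, `step` and the constant `k` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// An n x n height field stored row-major. Both `prev` and `cur` must be kept,
// which is the "two meshes" memory cost of this method.
struct HeightField {
    std::size_t n;
    std::vector<float> prev;
    std::vector<float> cur;
};

// Advances the surface by one time step of the 2D wave equation:
//   next = 2*cur - prev + k * (left + right + up + down - 4*cur),
// where k = (c*dt/h)^2 (wave speed c, time step dt, grid spacing h).
// k <= 0.5 is assumed for stability. Border vertices are held at height 0.
void step(HeightField& f, float k) {
    const std::size_t n = f.n;
    assert(f.prev.size() == n * n && f.cur.size() == n * n);
    std::vector<float> next(n * n, 0.0f);
    for (std::size_t y = 1; y + 1 < n; ++y) {
        for (std::size_t x = 1; x + 1 < n; ++x) {
            const std::size_t i = y * n + x;
            // Every vertex reads its four neighbours, so threads working on
            // adjacent strips of the grid must see each other's edge rows.
            const float laplacian = f.cur[i - 1] + f.cur[i + 1] +
                                    f.cur[i - n] + f.cur[i + n] - 4.0f * f.cur[i];
            next[i] = 2.0f * f.cur[i] - f.prev[i] + k * laplacian;
        }
    }
    f.prev.swap(f.cur);  // the old current heights become the previous heights
    f.cur.swap(next);    // the newly computed heights become current
}
```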

Another approach, detailed in GPU Gems - Finch 2004, uses the shading capabilities of modern GPUs (not so dissimilar to the SPEs in the Cell processor) to apply a “sum of sines” to the vertices in order to approximate their positions, which easily allows for multiple waves / ripples. This method appears much more flexible than the previous one. Each position is computed from wave parameters rather than from forces exerted by neighbouring vertices, so the workload can easily be divided between threads. The method also gives precise control over the geometry and scales easily. The work described there is carried out on GPU-specific hardware, and I plan to rework it into an efficient CPU-based algorithm that is especially optimised for the Cell processor.
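For reference, a sum of sines can be written as follows. This uses common notation rather than Finch’s exact symbols. $A_i$ is the amplitude, $L_i$ the wavelength, $S_i$ the speed and $\mathbf{D}_i$ the unit direction of wave $i$, and $\mathbf{x} = (x, z)$ is a vertex’s position on the grid. Each vertex’s height depends only on its own position and the time, which is what makes the workload easy to divide.

```latex
h(\mathbf{x}, t) = \sum_{i=1}^{n} A_i \sin\!\left( k_i \, (\mathbf{D}_i \cdot \mathbf{x}) - \omega_i t \right),
\qquad k_i = \frac{2\pi}{L_i}, \qquad \omega_i = S_i \, k_i
```

The minus sign in the phase makes each wave travel in the direction $\mathbf{D}_i$.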

Unfortunately, because the Cell processor is a relatively new architecture, little work has been publicly released, particularly on rendering and physics simulation. Insomniac Games recently released a document from one of their presentations at this year’s (2008) Game Developers Conference, SPU Shaders - Acton 2008. It describes a number of methods for using the SPUs to do the work of a GPU’s shaders. The key points I took from studying it were twofold. First, it is crucially important to concentrate on the layout of data so that it is completely modular and straightforward and uses as much of each SPE as possible. Second, because of bandwidth and memory constraints, the SPEs can apparently process data much faster than they can receive it. Following these rules when parallelising should help me to develop a very efficient algorithm for this study.
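One way to act on this advice is to store vertices as a structure of arrays (all x values, then all y, then all z) in chunks that fit a single SPE DMA transfer. A single transfer is limited to 16 KB. Each chunk is also aligned to 128 bytes, which IBM’s documentation recommends for the best DMA performance. The sketch below is portable C++ that only computes the layout. It assumes three floats per vertex and does not use libspe2 or any SPU intrinsics. The names `VertexChunk`, `chunksNeeded` and `verticesInChunk` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxDmaBytes = 16 * 1024;  // largest single SPE DMA transfer
constexpr std::size_t kFloatsPerQuadword = 4;    // one 128-bit register holds 4 floats

// Vertices per chunk: as many as fit in one transfer with x, y and z streams,
// rounded down to a multiple of 4 so each stream is a whole number of quadwords.
constexpr std::size_t kVertsPerChunk =
    (kMaxDmaBytes / (3 * sizeof(float))) / kFloatsPerQuadword * kFloatsPerQuadword;

// Structure-of-arrays layout: four consecutive x (or y, or z) values can be
// loaded into one SIMD register and processed together.
struct alignas(128) VertexChunk {
    float x[kVertsPerChunk];
    float y[kVertsPerChunk];
    float z[kVertsPerChunk];
};

static_assert(sizeof(VertexChunk) <= kMaxDmaBytes, "a chunk must fit in one DMA transfer");
static_assert(kVertsPerChunk * sizeof(float) % 16 == 0, "each stream must be whole quadwords");

// Number of chunks needed to hold `vertexCount` vertices.
constexpr std::size_t chunksNeeded(std::size_t vertexCount) {
    return (vertexCount + kVertsPerChunk - 1) / kVertsPerChunk;
}

// Number of vertices actually stored in chunk `index` (the last one may be partial).
std::size_t verticesInChunk(std::size_t index, std::size_t vertexCount) {
    assert(index < chunksNeeded(vertexCount));
    const std::size_t remaining = vertexCount - index * kVertsPerChunk;
    return remaining < kVertsPerChunk ? remaining : kVertsPerChunk;
}
```

Each chunk therefore holds 1,364 vertices. A 161 x 161 vertex grid (25,921 vertices) would need 20 chunks, with the last holding only 5 vertices.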

Specification

The end product is to be two applications, built from the same code base / framework, that run on the x86 and Cell architectures. Each should display the wave simulation graphically. Each should also include some method of adjusting the scale of the simulation and all relevant parameters, so that performance can be benchmarked accurately. The x86 version should be multi-threaded so that the number of cores used can be scaled. On a single-core machine it should be possible to run just one thread, but this must scale up to at least 4 threads in order to benchmark the increase in performance. The Cell version should perform the simulation using only the SPEs (starting with 1 SPE, scalable up to 6). The PPE will be in charge of rendering the output of the simulation and controlling the flow of data / programs between memory and each of the SPEs. To keep the results fair, each x86 test should be benchmarked twice: once with the CPU performing the drawing routines and once with the GPU.
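Since the number of worker threads (1 to 4 on x86) and SPEs (1 to 6) must be adjustable, both versions need a way to divide the grid between a variable number of workers. Below is a minimal sketch. It assumes the work is divided by whole grid rows into contiguous bands that are as even as possible. `RowRange` and `splitRows` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range of grid rows [begin, end) assigned to one worker.
struct RowRange {
    std::size_t begin;
    std::size_t end;
};

// Splits `rows` rows between `workers` workers (threads or SPEs) so that band
// sizes differ by at most one row. The first `rows % workers` bands get the extra row.
std::vector<RowRange> splitRows(std::size_t rows, std::size_t workers) {
    assert(workers > 0);
    std::vector<RowRange> ranges;
    ranges.reserve(workers);
    const std::size_t base = rows / workers;
    const std::size_t extra = rows % workers;
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t size = base + (w < extra ? 1 : 0);
        ranges.push_back({begin, begin + size});
        begin += size;
    }
    return ranges;
}
```

For example, with 161 rows and six SPEs, the first five bands get 27 rows each and the last gets 26.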

Because I represent the wave surface as a grid, the main scaling factor is the number of segments in the grid. Increasing it raises both the level of detail and the processing power required. Starting values for my initial tests will be 80 for low, 120 for medium and 160 for high. Depending on the performance of the machines, I may increase or decrease these in order to get usable results.
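To give a feel for how these settings scale, the sketch below makes two assumptions of my own, neither of which comes from the specification. First, “segments” means the number of quads along each side of a square grid, so n segments give (n + 1)² vertices and 2n² triangles. Second, each vertex position is stored as three 4-byte floats.

```cpp
#include <cstddef>

// Vertices in a square grid with `segments` quads along each side.
constexpr std::size_t vertexCount(std::size_t segments) {
    return (segments + 1) * (segments + 1);
}

// Each quad is drawn as two triangles.
constexpr std::size_t triangleCount(std::size_t segments) {
    return 2 * segments * segments;
}

// Bytes needed for positions only, assuming x, y, z as 32-bit floats.
constexpr std::size_t positionBytes(std::size_t segments) {
    return vertexCount(segments) * 3 * sizeof(float);
}

static_assert(vertexCount(80) == 6561, "low setting");
static_assert(vertexCount(120) == 14641, "medium setting");
static_assert(vertexCount(160) == 25921, "high setting");
static_assert(positionBytes(160) > 256u * 1024u, "high setting exceeds one SPE's 256 KB local store");
```

At the high setting the positions alone come to 311,052 bytes, which is more than the 256 KB local store of a single SPE. The data would therefore have to be streamed through the SPEs rather than loaded all at once.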

Another aspect of GPU Gems - Finch 2004 that I liked is the “sum of sines” approach he uses. It translates easily into having n waves active at any time, all combining as they would in reality. The number of waves should also be scalable from 0 to 4, which should provide a decent amount of data for testing purposes.

The parametrised aspects of the wave data should be fully represented and editable, either at run time through a GUI or at least easily modifiable via a text file, to allow quick testing.
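As an illustration of the text-file option, here is a minimal loader for a hypothetical plain-text format. In this format the word `wave` starts a new wave, and each following line sets one parameter as `name value`. The format, the parameter names and the default values are all assumptions made for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parameters for one wave; the defaults are arbitrary placeholders.
struct WaveParams {
    float wavelength = 10.0f;
    float amplitude = 1.0f;
    float speed = 1.0f;
    float dirX = 1.0f;
    float dirZ = 0.0f;
    float steepness = 0.0f;
};

// Reads whitespace-separated tokens. "wave" begins a new wave; any other token is a
// parameter name followed by a number. Unknown names are skipped, parameters that
// appear before the first "wave" are ignored, and reading stops at a malformed number.
std::vector<WaveParams> loadWaves(std::istream& in) {
    std::vector<WaveParams> waves;
    std::string name;
    while (in >> name) {
        if (name == "wave") {
            waves.emplace_back();
            continue;
        }
        float value = 0.0f;
        if (!(in >> value)) break;
        if (waves.empty()) continue;
        WaveParams& w = waves.back();
        if (name == "wavelength") w.wavelength = value;
        else if (name == "amplitude") w.amplitude = value;
        else if (name == "speed") w.speed = value;
        else if (name == "dirx") w.dirX = value;
        else if (name == "dirz") w.dirZ = value;
        else if (name == "steepness") w.steepness = value;
    }
    return waves;
}

// Convenience overload, e.g. for parameters passed on the command line.
std::vector<WaveParams> loadWavesFromString(const std::string& text) {
    std::istringstream in(text);
    return loadWaves(in);
}
```

Because the format is just whitespace-separated tokens, the same loader can read a preset file with `std::ifstream` or a short string typed on the command line.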

Strategy

To implement everything the specification outlines, I will break the work down into individual stages.

First, it is necessary to create a generic framework for rendering. This will require a vertex buffer held in main memory so that the x86 / Cell code has easy access to the data. The framework should be built on the Irrlicht rendering engine and should be able to render a grid of triangles with scalable dimensions. This is all that is required from the renderer, as the wave simulation code will handle all of the vertex transforms.
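The grid itself is simple to generate. The sketch below builds vertex positions on the XZ plane and a triangle index list in plain C++. Copying these into an Irrlicht mesh buffer, or any other vertex buffer, is not shown because it depends on the engine set-up. `Grid` and `makeGrid` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Grid {
    std::vector<float> positions;        // x, y, z per vertex, row-major
    std::vector<std::uint32_t> indices;  // three indices per triangle
};

// Builds a flat square grid of `segments` x `segments` quads covering `size` units,
// centred on the origin. The simulation later overwrites the heights (y) in place.
Grid makeGrid(std::size_t segments, float size) {
    assert(segments > 0);
    const std::size_t side = segments + 1;
    const float spacing = size / static_cast<float>(segments);
    const float half = size * 0.5f;
    Grid g;
    g.positions.reserve(side * side * 3);
    for (std::size_t row = 0; row < side; ++row) {
        for (std::size_t col = 0; col < side; ++col) {
            g.positions.push_back(static_cast<float>(col) * spacing - half);  // x
            g.positions.push_back(0.0f);                                      // y (height)
            g.positions.push_back(static_cast<float>(row) * spacing - half);  // z
        }
    }
    g.indices.reserve(segments * segments * 6);
    for (std::size_t row = 0; row < segments; ++row) {
        for (std::size_t col = 0; col < segments; ++col) {
            const auto i0 = static_cast<std::uint32_t>(row * side + col);  // this corner
            const std::uint32_t i1 = i0 + 1;                                // next along the row
            const auto i2 = static_cast<std::uint32_t>(i0 + side);         // same column, next row
            const std::uint32_t i3 = i2 + 1;                                // diagonal corner
            // Two triangles per quad; the winding may need flipping to match
            // the engine's back-face culling convention.
            g.indices.insert(g.indices.end(), {i0, i2, i1, i1, i2, i3});
        }
    }
    return g;
}
```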

Next, using only the Cell’s PPE and a single thread on the x86, I will develop the methods required to carry out the simulation. This will obviously be slow and of little use for obtaining results, but it will ensure that the starting framework is the same for both platforms.

Once this is working correctly, all necessary parameters should be exposed through some kind of interface, whether a graphical interface or simply data loaded from a text file or entered on the command line.

After this, the application must be multi-threaded for the Core 2 Duo by splitting the workload into chunks sized to make it execute as fast as possible. The same must be done for the Cell processor, splitting the data workload as effectively as possible in order to find the fastest way of processing the simulation on the SPEs.
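For the x86 version, OpenMP can do this splitting with a single directive. The sketch below evaluates a sum of sines over the grid heights, written as A sin(k (D · p) - ωt) with k = 2π / wavelength and ω = speed × k. It divides the rows statically between a chosen number of threads. The `SineWave` struct and the function names are hypothetical. If the code is compiled without OpenMP support, the pragma is ignored and the loop runs on a single thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct SineWave {
    float amplitude;
    float wavelength;
    float speed;
    float dirX, dirZ;  // direction of travel, assumed to be unit length
};

// Height at grid position (x, z) and time t: sum of A * sin(k * (D . p) - w * t).
float sumOfSines(const std::vector<SineWave>& waves, float x, float z, float t) {
    const float kTwoPi = 6.28318530718f;
    float height = 0.0f;
    for (const SineWave& w : waves) {
        const float k = kTwoPi / w.wavelength;  // wavenumber
        const float omega = w.speed * k;        // angular frequency
        height += w.amplitude * std::sin(k * (w.dirX * x + w.dirZ * z) - omega * t);
    }
    return height;
}

// Recomputes every height of a (segments + 1)^2 grid with the given spacing,
// dividing the rows evenly between `threads` OpenMP threads.
void updateHeights(std::vector<float>& heights, int segments, float spacing,
                   const std::vector<SineWave>& waves, float t, int threads) {
    assert(threads > 0);
    const int side = segments + 1;
    assert(heights.size() == static_cast<std::size_t>(side) * static_cast<std::size_t>(side));
#pragma omp parallel for num_threads(threads) schedule(static)
    for (int row = 0; row < side; ++row) {
        for (int col = 0; col < side; ++col) {
            heights[static_cast<std::size_t>(row) * side + col] =
                sumOfSines(waves, col * spacing, row * spacing, t);
        }
    }
}
```

A static schedule gives each thread one contiguous band of rows. That suits this workload because every row costs roughly the same, and it keeps each thread’s writes in a separate part of the array. Larger or smaller chunk sizes can be tried by adding a chunk argument to `schedule`.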

Schedule

Experiment with the Irrlicht rendering engine, looking for the optimum set-up across the two platforms, and try to define the best possible way to represent the data.

Test out different methods of manipulating the vertices in software, in order to determine an efficient way of directly modifying the vertex buffer.

Use the previously chosen method to attempt the first basic calculations, manipulating a single wave moving along a preset sine wave. This will test the kind of scale / values I’m working with and give me an idea of how well the buffer manipulation works.

Begin creating a basic wave class containing the main required parameters, such as wavelength, amplitude, speed and direction.

Concentrate on using the appropriate mathematical terms to correctly create a single wave that supports changes to its parameters.

Concentrate on using the appropriate mathematical terms to calculate the surface normal and tangents of the waves at any point. These will be useful for texturing / lighting and also for interacting with objects outside the simulation (e.g. a boat object).
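For a sum of sines, the normal and tangents can be found analytically from the partial derivatives of the height, without sampling neighbouring vertices. The sketch below assumes the height is a sum of terms A sin(k (D · p) - ωt) with unit directions D and y up. Write h_x and h_z for the partial derivatives of the height in x and z. The tangent along x is then (1, h_x, 0), the tangent along z is (0, h_z, 1), and their cross product gives the normal (-h_x, 1, -h_z). The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct SineWave {
    float amplitude;
    float wavelength;
    float speed;
    float dirX, dirZ;  // unit direction of travel
};

struct Vec3 {
    float x, y, z;
};

struct SurfaceFrame {
    Vec3 tangent;   // along +x
    Vec3 binormal;  // along +z
    Vec3 normal;    // unit length, pointing up (+y)
};

// Analytic frame of the surface y = sum of A * sin(k * (D . p) - w * t) at (x, z).
SurfaceFrame surfaceFrame(const std::vector<SineWave>& waves, float x, float z, float t) {
    const float kTwoPi = 6.28318530718f;
    float dhdx = 0.0f;
    float dhdz = 0.0f;
    for (const SineWave& w : waves) {
        assert(w.wavelength > 0.0f);
        const float k = kTwoPi / w.wavelength;
        const float c = std::cos(k * (w.dirX * x + w.dirZ * z) - w.speed * k * t);
        dhdx += w.amplitude * k * w.dirX * c;  // partial derivative of height in x
        dhdz += w.amplitude * k * w.dirZ * c;  // partial derivative of height in z
    }
    const float len = std::sqrt(dhdx * dhdx + 1.0f + dhdz * dhdz);
    // normal = binormal x tangent = (-dh/dx, 1, -dh/dz), then normalised
    return SurfaceFrame{{1.0f, dhdx, 0.0f},
                        {0.0f, dhdz, 1.0f},
                        {-dhdx / len, 1.0f / len, -dhdz / len}};
}
```

The same derivatives also give the slope under a floating object such as the boat, without reading back the vertex buffer.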

Implement the Gerstner wave function in order to add steepness to waves.
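A Gerstner wave keeps the same sine for the height but also moves each vertex horizontally towards the crests, which sharpens them. The sketch below assumes a per-wave steepness q between 0 and 1, applied as a horizontal displacement of (q / k) D cos(θ). For a single wave, q = 1 gives the sharpest crest before the surface starts to loop over itself. With several waves, the steepness values should be scaled down (for example divided by the number of waves) to avoid loops. This scaling convention and the names are assumptions, not necessarily the exact formulation in GPU Gems - Finch 2004.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct GerstnerWave {
    float amplitude;
    float wavelength;
    float speed;
    float dirX, dirZ;  // unit direction of travel
    float steepness;   // q in [0, 1]; 0 gives a plain sine wave
};

struct Vec3 {
    float x, y, z;
};

// Displaced position of the grid vertex that rests at (x0, 0, z0).
// Height is sum(A sin(theta)); each vertex also moves sum((q / k) D cos(theta))
// horizontally, where theta = k (D . p0) - w t, so vertices bunch up at the crests.
Vec3 gerstnerPoint(const std::vector<GerstnerWave>& waves, float x0, float z0, float t) {
    const float kTwoPi = 6.28318530718f;
    Vec3 p{x0, 0.0f, z0};
    for (const GerstnerWave& w : waves) {
        assert(w.wavelength > 0.0f && w.steepness >= 0.0f && w.steepness <= 1.0f);
        const float k = kTwoPi / w.wavelength;
        const float theta = k * (w.dirX * x0 + w.dirZ * z0) - w.speed * k * t;
        const float horizontal = (w.steepness / k) * std::cos(theta);
        p.x += horizontal * w.dirX;
        p.z += horizontal * w.dirZ;
        p.y += w.amplitude * std::sin(theta);
    }
    return p;
}
```

Unlike the plain sum of sines, this moves x and z as well as y. The whole vertex position has to be written back to the buffer each frame, not just the height.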

Work on the ability to include multiple waves (up to 4).

Create some test scenes using different pre-sets for a number of waves / situations in order to get an idea of what performance to expect and catch any bugs that may occur.

Spend time studying libspe2 in order to gain a good grasp of its functionality before attempting optimisation.

Use OpenMP to split the x86 workload into threads.

Use libspe2 to create a number of SPE programs that handle different aspects of the wave simulation, and also an SPE program capable of carrying out the whole simulation process on a set of data.

Use the PPE to split the Cell workload into threads and distribute it amongst the SPE programs, to see which arrangement works well.

Research currently published resources on methods of optimising for the Cell, and spend time speaking to other developers to gain insight.

Optimise the SPE routines, using a number of documents published by IBM and various other sources as a guide.

Complete prototype report.

Implement all tests on both platforms carefully recording results.
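Recording comparable results is easier if every test is timed the same way on both platforms. Below is a minimal sketch of a timing helper using `std::chrono::steady_clock`, a monotonic clock that is unaffected by changes to the system time. It assumes a C++11 compiler is available on both platforms, and `meanFrameMs` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Calls `update` `frames` times and returns the mean wall-clock time per call in milliseconds.
double meanFrameMs(const std::function<void()>& update, int frames) {
    assert(frames > 0);
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < frames; ++i) {
        update();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / frames;
}
```

One simulation update can be wrapped in a lambda and timed over, say, 1,000 frames for each combination of grid size, wave count and thread / SPE count. Running a few untimed warm-up frames first may also make the figures more stable.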

Compile all the data and try to determine whether the tests were successful.

If the data seems to contain inaccuracies, spend time determining whether any of the methods used can be modified to produce better results.

Re-test and compile data.

Integrate data into Prototype Report.

Finish report.

Posted by Mike on 03/10 at 07:31 PM
PS3 / Cell