Cell BE assignment

Last update: 29-11-2008

Table of contents

  1. Introduction
  2. Getting started
  3. Programming the Cell BE
  4. The assignment
  5. Alternative installation
  6. Contact

1. Introduction

Welcome to the homepage of the Cell BE (Cell Broadband Engine) assignment for the course 5KK70 / Platform-based design. The purpose of this assignment is to get familiar with programming the Cell architecture, a highly parallel architecture that supports both task-level and data-level parallelism. It contains one PowerPC RISC core and 8 SPEs (Synergistic Processing Elements). Both the PowerPC and the SPEs can exploit sub-word parallelism (SIMD). The SPEs have a 128-bit wide data path, which can be divided into, for example, four 32-bit words. All processors are connected by a high-speed multi-ring Network-on-Chip. Programming this architecture is extremely challenging, but if you do it right you can get extreme performance: more than 100 GFlop/s.
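
To make the SIMD idea concrete, here is a minimal plain-C model of a 128-bit register split into four 32-bit lanes. It is only a sketch: the names vec4_i32, vec4_add and add_arrays are made up for illustration, and on a real SPE you would use the vector types and intrinsics from the SDK, where the lane-wise add is a single instruction rather than a loop.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C model of one 128-bit register holding four 32-bit lanes.
   The type name vec4_i32 is hypothetical, not an SDK type. */
typedef struct {
    int32_t lane[4];
} vec4_i32;

/* Model of a single SIMD add: conceptually all four lanes are added
   at once. The loop here only emulates that behaviour. */
vec4_i32 vec4_add(vec4_i32 a, vec4_i32 b)
{
    vec4_i32 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Add two arrays four elements at a time; n must be a multiple of 4. */
void add_arrays(const int32_t *a, const int32_t *b, int32_t *out, int n)
{
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        vec4_i32 va, vb, vr;
        for (int k = 0; k < 4; ++k) {   /* "load" four words per operand */
            va.lane[k] = a[i + k];
            vb.lane[k] = b[i + k];
        }
        vr = vec4_add(va, vb);          /* one vector operation */
        for (int k = 0; k < 4; ++k)     /* "store" four results */
            out[i + k] = vr.lane[k];
    }
}
```

The point of the model is the loop structure: the outer loop advances by four elements, so a real SIMD version executes a quarter of the arithmetic instructions of the scalar loop.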

This page contains instructions on how to get started, the description of the assignment and links to pages with information about developing for the Cell BE.


2. Getting started


The development toolkit for the Cell BE is only available for Linux, so you either have to set up a dual-boot system on your PC or use the VMware image provided on this site. The easiest way to develop for the Cell BE under Windows is to use VMware. The image provided has version 2.1 of the SDK already installed, so you can get started straight away. If you don't want to use the method described here, you can find alternative instructions in section 5 (Alternative installation).

Getting the files

Download the following two files:

Unzipping and installing

Install VMware using the provided installer and unzip the zipfile into a directory. Unzipping can take quite some time, so be patient and grab a cup of coffee.

Some people seem to have trouble unzipping the file. I don't know exactly what the problem is. If you run into trouble and don't want to download the file again, first check its md5 sum using this program: http://etree.org/cgi-bin/counter.cgi/software/md5sum.exe. If the sum is not ffa7abfc56be1155e09a5b564fa2ebb3, your file is corrupted and there is no other solution than to download it again. If the checksum is correct, try to extract the file with a different program. I've had good experiences with the latest version of WinRar, which can be found here: http://www.rarlabs.com.


Start VMware Player from the Start Menu and browse to the directory where you unzipped the zipfile. Open the VMware configuration file (fedora-fc6-i386.vmx) and wait for the system to boot.

When asked for a username, type root. The password is inn0vate.


If you prefer working in an IDE (like Borland C++ Builder or Microsoft Visual Studio) you should install Eclipse, as this environment has a plugin for the Cell BE available. Installing Eclipse is very easy. Just go to Applications -> Add/Remove Software. Select Development on the left and check Eclipse. Then just hit Apply and the software is downloaded and installed for you.

After the installation has finished fire up Eclipse by going to Applications -> Programming -> Eclipse. The default workspace is fine so check the box saying not to ask again and click ok.

To install the plugin for the Cell BE, go to Help -> Software Updates -> Find and Install. Select 'Search for new features to install' and click Next. Click 'New Local Site...'. Go to /opt/ide/com.ibm.celldt.update and click OK. Click OK once more. Click Finish. Check the feature (ide/com.ibm.celldt.update) and click Next. Accept the license agreement. Click Finish. When Eclipse says the plugin is unsigned, just choose 'Install All'. Answer Yes when you're asked to restart the Eclipse Platform.


3. Programming the Cell BE

The article found here is mandatory for the exam and gives a lot of information about the Cell BE.

To get started, try out the 'euler' example on your local disc at /opt/ibm/cell-sdk/prototype/src/samples/tutorial/euler. Information about how to do the example can be found in this presentation.

There are a few good articles about programming high-performance applications for the Cell BE on IBM's website. You can find them here. You can skip the article about installing Linux on the Playstation 3 as we've already done that for you. Pay special attention to the techniques used to speed up your code, like branch hints and loop unrolling. This might just give you the advantage to write the fastest program!

On IBM's website you can also find a list with 25 tips for optimal application performance.

There is also an online course about the Cell BE processor here.
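
As a rough illustration of the two techniques mentioned above, the sketch below unrolls a summation loop by four and marks the rare branches of a clamp function with a branch hint. It assumes a GCC-compatible compiler for __builtin_expect; the macro UNLIKELY and the functions sum_unrolled and clamp_u8 are hypothetical names, not part of the SDK or the framework.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hint: tells a GCC-compatible compiler the condition is
   usually false. On other compilers it expands to the plain test. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Sum n floats with the loop unrolled by four. The four independent
   accumulators give the processor independent work per iteration. */
float sum_unrolled(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < n; ++i)      /* leftover elements when n is not a multiple of 4 */
        s0 += data[i];
    return (s0 + s1) + (s2 + s3);
}

/* Clamp a value to 0..255; both out-of-range cases are hinted as rare. */
int clamp_u8(int v)
{
    if (UNLIKELY(v < 0))
        return 0;
    if (UNLIKELY(v > 255))
        return 255;
    return v;
}
```

Note that the unrolled version adds the floats in a different order than a simple loop, so results can differ in the last bits for general input. Always measure whether a hint or an unroll factor actually helps on your target.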

For more examples, check /opt/ibm/cell-sdk/prototype/src/samples/.

Running the program

To run the program you'll either need a system with a Cell BE processor (e.g. the Playstation 3) or a simulator. When debugging your program or trying the examples, the simulator is the easiest option. There are two ways of running your program on the simulator. I'll explain both briefly here.

Using the simulator from the command prompt

  1. Open a command prompt (click the icon with a screen showing >_ in the menu bar at the top) and launch the simulator GUI by typing /opt/ibm/systemsim-cell/run/cell/run_gui.
  2. Switch to 'Fast mode' by clicking Mode > Fast mode. Exit the pop-up by clicking the X in the top right corner.
  3. Start the simulator by hitting Go. Wait until the Linux kernel has fully booted. The window titled mysim will show the message '[root@(none) ~]# '.
  4. Now you'll have to copy the executable from your filesystem to the simulator. Suppose your executable is located at /root/dir/executable. You can then copy it by typing the following command into the window titled mysim:
    callthru source /root/dir/executable > executable
    After you've done this you have to give the file execute rights by executing the following command:
    chmod a+x executable
  5. Now you're ready to execute the program. Do this by typing ./executable into the command prompt.
  6. If you need to copy files back to your own filesystem after the program has finished (e.g. dump.bmp) you can also use the callthru command:
    callthru sink /root/dir/dump.bmp < dump.bmp
    (notice the different direction of < and >)
  7. Don't forget to stop the simulator when you're not using it by clicking Stop. Otherwise your computer will become quite slow.

Using the interface provided by Eclipse

If you've installed the Cell BE plugin for Eclipse you can also let Eclipse do most of the work explained above. Once you've followed the steps to configure everything you only have to hit the green run button to execute your program.

  1. Start up Eclipse.
  2. Create a new simulator by clicking the Cell Environments tab at the bottom of the Eclipse window. Right click Local Cell Simulator (SDK 2.1) and select Create. Enter a target name (e.g. localsim) and hit Finish.
  3. Start up the simulator by clicking the + in front of Local Cell Simulator (SDK 2.1). Click on the name of your simulator (e.g. localsim) and click the green play button. (You can find this button to the right of the tabs.)
  4. Create a new run configuration by clicking the black down arrow next to the run button in the menu bar (the green circle with the white arrow pointing right). Click Run.... Click C/C++ Cell Target Application and hit the new button (white paper with a yellow plus in the top right corner). Usually Eclipse will fill in the right parameters for project and C/C++ Application. If it doesn't, enter the right parameters here. Click the target tab and select the name of your simulator as target. Make sure that the simulator is running, otherwise the run button will be grayed out. As a last step you have to select the appropriate debugger. Go to the debugger tab and select Cell BE gdbserver gdb/mi.
  5. Click run. If everything is configured right, Eclipse will copy your executable to the simulator, execute it and show the output in its own console.

Summary of articles


4. The assignment

The assignment consists of writing a C/C++ program that can display 16 video streams at once, creating a mosaic TV channel as seen here on the right. The input streams are Motion JPEG encoded AVI files with a resolution of 624x352. The target output is a Full-HD screen with a resolution of 1920x1080. So the task of the program is to decode and downscale 16 JPEG images per output frame. It should preferably do this at 25 FPS, in other words within 40 ms per frame. Both performance and picture quality are considered for grading, so you might want to consider a computationally intensive downscaling algorithm like bicubic. To achieve a really high grade you can add some features of your own.
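
To give an idea of what the downscaling step involves, here is a minimal sketch of a bilinear downscaler for a single 8-bit plane (for example one colour channel). Bilinear is used here only because it is short; it is a simpler stand-in, not the bicubic filter mentioned above and not the code from the framework. The function name downscale_bilinear and the row-major, one-byte-per-pixel layout are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bilinear resize of one 8-bit plane stored row by row.
   src is sw x sh pixels, dst is dw x dh pixels. */
void downscale_bilinear(const uint8_t *src, int sw, int sh,
                        uint8_t *dst, int dw, int dh)
{
    assert(sw > 0 && sh > 0 && dw > 0 && dh > 0);
    for (int y = 0; y < dh; ++y) {
        /* Map the centre of the destination row into source coordinates. */
        float fy = (y + 0.5f) * sh / dh - 0.5f;
        if (fy < 0.0f)
            fy = 0.0f;
        int y0 = (int)fy;
        int y1 = (y0 + 1 < sh) ? y0 + 1 : sh - 1;   /* clamp at the bottom edge */
        float wy = fy - y0;

        for (int x = 0; x < dw; ++x) {
            float fx = (x + 0.5f) * sw / dw - 0.5f;
            if (fx < 0.0f)
                fx = 0.0f;
            int x0 = (int)fx;
            int x1 = (x0 + 1 < sw) ? x0 + 1 : sw - 1; /* clamp at the right edge */
            float wx = fx - x0;

            /* Blend the four neighbouring source pixels. */
            float top = src[y0 * sw + x0] * (1.0f - wx) + src[y0 * sw + x1] * wx;
            float bot = src[y1 * sw + x0] * (1.0f - wx) + src[y1 * sw + x1] * wx;
            dst[y * dw + x] = (uint8_t)(top * (1.0f - wy) + bot * wy + 0.5f);
        }
    }
}
```

The inner loop is regular and free of data-dependent branches, which makes it a reasonable candidate for SIMD and for running on an SPE. A higher-quality filter such as bicubic reads a 4x4 neighbourhood instead of 2x2 and costs correspondingly more per pixel.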

The tarfile with 16 streams can be found here. These files need to be added to the simulator environment. To do that you need to type the following commands:

mkdir /mnt/cell
mount -t auto -o loop /opt/ibm/systemsim-cell/images/cell/sysroot_disk /mnt/cell
tar -zxf streams.tar.gz -C /mnt/cell/

To get you going with this assignment I've set up a basic framework. This application doesn't do anything except open the AVI files and set up an output buffer. It saves you some time finding out how to read AVI files. The framework further contains C code for JPEG decoding, which is adjusted to run on an SPE, and an empty skeleton for a downscaling program which can run on an SPE. If you're planning on implementing a "heavy" downscaling algorithm and want to offload this to an SPE, this skeleton is a good starting point and saves you time adjusting the Makefile etc.

The next step you should take is to hand out tasks to the SPEs, let them decode and scale an image and send the results back to the PPE. If you don't know where to start, I've made a getting started guide. Following these steps will result in a program that produces the requested output, but unfortunately also has bad performance and low picture quality. Don't feel obliged to structure your program as suggested in this guide! There may be far better ways to distribute the computations.
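
One simple way to think about handing out the work is a static round-robin split of the streams over the SPEs. The sketch below only computes such a work list in plain C; it does not start any SPE threads and is not taken from the getting started guide. The type spe_worklist and the function build_schedule are hypothetical names for this example.

```c
#include <assert.h>

enum { MAX_STREAMS = 16, MAX_SPES = 8 };

/* Hypothetical work list for one SPE: the indices of the streams
   it has to decode and scale for every output frame. */
typedef struct {
    int count;
    int streams[MAX_STREAMS];
} spe_worklist;

/* Static round-robin split: stream i is assigned to SPE (i % num_spes). */
void build_schedule(spe_worklist lists[], int num_spes, int num_streams)
{
    assert(num_spes > 0 && num_spes <= MAX_SPES);
    assert(num_streams >= 0 && num_streams <= MAX_STREAMS);

    for (int s = 0; s < num_spes; ++s)
        lists[s].count = 0;

    for (int i = 0; i < num_streams; ++i) {
        spe_worklist *w = &lists[i % num_spes];
        w->streams[w->count++] = i;
    }
}
```

With 16 streams and 8 SPEs every SPE gets two streams. A static split like this is easy to set up, but it may leave SPEs idle when some frames take longer to decode than others, so a dynamic scheme where an SPE asks for the next image when it is done may well perform better.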

For debugging purposes, the program currently renders the screen ten times to get a good average of the number of FPS and then outputs the last rendered screen to a bitmap file named dump.bmp.
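
If you want to do similar measurements yourself, the idea can be sketched as follows. This is not the timing code from the framework; the names now_seconds, average_fps, frame_budget_ms and dummy_render are made up for this example, and it assumes a POSIX system that provides clock_gettime (on older glibc versions you may need to link with -lrt).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* needed for clock_gettime in strict C modes */
#endif
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Call render() `frames` times and return the average frames per second. */
double average_fps(void (*render)(void), int frames)
{
    assert(render != 0 && frames > 0);
    double start = now_seconds();
    for (int i = 0; i < frames; ++i)
        render();
    double elapsed = now_seconds() - start;
    return elapsed > 0.0 ? frames / elapsed : 0.0;
}

/* Time available per frame in milliseconds, e.g. 25 FPS gives 40 ms. */
double frame_budget_ms(double target_fps)
{
    assert(target_fps > 0.0);
    return 1000.0 / target_fps;
}

/* Placeholder for the real rendering step; it just burns a little time. */
void dummy_render(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 100000UL; ++i)
        sink += i;
}
```

Calling average_fps(dummy_render, 10) mirrors the ten-frame average described above. Wall-clock time is used on purpose: CPU time measured on the PPE would not include the time the PPE spends waiting for the SPEs.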

Hand in your assignment by mailing your source code and a small report to k.l.b.hoogendoorn at student dot tue dot nl. In your report you should discuss the final program and why you've made certain decisions.

Summary of required files


5. Alternative installation

Here are the instructions for getting everything up and running for those of you who already have Linux installed or prefer not to work with VMware. I'm not going to explain how to install a fresh copy of Linux, so if you don't know how to do that, ask a friend or stick with the provided VMware image. The instructions below were written for Fedora Core 6 but should be roughly the same for any Linux flavour.

First of all you need to download the software development kit (SDK) for the Cell BE. For your convenience we have mirrored version 2.1 here. At IBM’s website you can also find version 3.0 but I don’t have any experience with that. Version 2.1 should be more than enough for this assignment.

After you’ve obtained the SDK, follow the steps in the installation guide. Skip straight to chapter 2, as chapter 1 only contains instructions on how to install Fedora Core 6. You can also skip the step about obtaining the SDK as you’ve just done that.

You don’t have to build the examples as this takes quite some time and you can always build them later on. It’s also not necessary to update the whatis database as described in the installation guide.

I’d recommend that you use Eclipse as your programming environment as it makes life a lot easier. If you do so, also install the Cell plugin for Eclipse as described in the installation guide. If you don’t have Eclipse installed and you’re running Fedora, just type the following as root at the command prompt:

yum install eclipse

This will install Eclipse for you as well as all the required dependencies.

Summary of required files

6. Contact

If you have any further questions, don't hesitate to contact me: Kris Hoogendoorn, k.l.b.hoogendoorn at student dot tue dot nl