Librenix
Headlines | Linux | Apps | Coding | BSD | Admin | News
Information for Linux System Administration 

Space Tyrant: Multithreading lessons learned on SMP hardware

Up
vote
Down

There is much to report in this update of Space Tyrant. Before getting into the new features and functions, I’ll dispense with the crisis of The Bug.

For a couple of weeks, we had been noticing odd anomalies with Space Tyrant (ST) running on the virtual server at Ioresort.com (now offline -Ed.). We never saw the problem on any other box -- and it was tested on at least four other Linux boxes and a Mac OS X system. We did all manner of stress testing, locally and over the Internet, script based and even feeding the game the output of /dev/random. Nothing caused the anomaly on any other box.

At first, I suspected that it might just be an obscure problem with the virtual server itself; after all, I had been forced to modify the TLR code to get it to run properly there. That problem turned out to be merely a limitation of NFS, not a bug with the virtual server software. However, the environment was clearly different from any other system I had used which raised my suspicions -- and reduced my urgency about looking for the bug.

While the bug wasn’t frequent, it was persistent. The bug appeared to be related to corrupted buffers or corrupted buffer indexes. Out of idle curiosity, I lowered the number of buffers used by ST to see if that affected the bug. Somewhat counter-intuitively, it substantially raised the frequency of the problem.

Brian Estabrooks (the hero of this release) and I spent more and more of our efforts hunting this incredibly elusive bug until that was all we were doing. I implemented various diagnostic routines hunting for clues. The all seemed to point to buffer indexes being changed incorrectly. Both Brian and I audited the code. It seemed impossible for the indexes to be changed improperly. Brian even went so far as to replace the ring buffer scheme with a high watermark approach but to no avail.

While I continued to suspect it to be a simple logic error in the code, Brian turned his efforts elsewhere. What he came up with was quite interesting. It seems that on many hardware architectures (most? all?), modifying a bit field can temporarily modify other bit fields in the same word! Now, this isn’t a problem on a single-CPU system; it repairs the damage in the same operation, making it, effectively, atomic. On an SMP machine, however, two different CPU’s working on different bit fields of the same word simultaneously create havoc. The operation isn’t really atomic and it doesn’t work.

Did I mention that the virtual server is a 4-way Xeon system?

The ring buffer indexing in ST relies on unsigned integer bit fields to automate wrapping back around to the first buffer after using the last one. My parsimonious programming, of course, packed all the bit fields together, several to a word. Brian’s test version of ST added a pad after each buffer index to round it out so that each bit field lived alone in its own complete word. We abused the new version for nearly an hour before either of us would dare say it. The bug was gone.

Yay!

So, the moral of this story is: Operations on sub-word fields affect other bits in that word (at least on many hardware architectures). Tread very carefully if multiple threads are accessing different bits in shared words. It may appear to work perfectly, only to crumble into a pile of smoldering rubble the first time it's loaded on a multiple CPU system!

Other than the primary lesson, some other good things came out of (the search for) the bug. Several other latent bugs were found and fixed and Brian and I are both much more intimate with the code.

And, on to the enhancements. ST is starting to look like an actual playable game. The following functions implement the new major features.

players(): We now have player rankings. It works by adding all the players’ ship resources to an integer array. Then it scans the universe looking for deployed fighters and adds those to the array as well. Currently, those two items comprise the total strength of a player.

It then sorts the array with a recursive bit-plane sort that I wrote for Starship Traders in 1998. The qsort() function in the C library was plenty fast, but took too much memory for my taste. Memory was a bit scarcer in those days and, worse, the SST software model gave each player his own copy of the server.

The sort reorders the array in place as follows. It scans the high-order bit in each element of the array. It then moves all elements starting with ‘1’ bits to the top and all starting with ‘0’ bits to the bottom. Next, it calls itself twice to reorder the first and second chunks of the array on the second bit. Each of those two instances of the sort then call the sort twice again, now giving 4 new sorts for the third bit, and so on. When all 32 bits are accounted for, the array is in the correct order with the top player on top, etc.

Scanning the entire universe can be expensive with a large map. Therefore, the player rankings function keeps the result and time stamps it. If another player asks for a player ranking within five seconds, the system just gives them the old one. After five seconds, however, any new request triggers a fresh listing.

autopilot(): We’ve added an autopilot to let a player find a specific sector -- or to locate the nearest planet. If you type a ‘0’ (zero), you’ll be prompted for a sector number within 1000 of the sector you’re currently in. You then will have the option of pressing ‘/’ to automatically warp to the destination sector.

If you’re looking for a planet, type the ‘L’ command that you would normally use to land on a planet in your sector. In the absence of a planet, the L key will engage the autopilot which will search for the nearest planet and give you a ‘/’ command to autowarp to it.

The new autopilot function consists of two other functions in addition to autopilot(), which is merely a control function. I had intended to use the old shortest path algorithm function from TLR but it was big and complicated. I decided to try to write a simpler, recursive shortest path algorithm instead. The new recursive function is much simpler but not quite as efficient as the giant for loop in TLR.

The actual algorithm is implemented in two functions called pathdepth() and pathcalc(). The pathdepth() function repeatedly calls pathcalc() with an increasing ‘depth’ parameter. ‘Depth’ tells pathcalc() how many levels deep to search before giving up.

The pathcalc() function simply looks to see if the sector it is looking at is the target sector. If not, it calls itself for each new sector that the current sector connects to. If the current sector is the target sector, it starts filling in an array for the autowarp() function to follow to reach the target sector. As the previous recursive calls to the pathcalc() function exit, they fill in the remainder of the path array.

And, yes, I seem to like reinventing the wheel. ;-)

The other interesting addition to the code is the backup thread. It is implemented by a function called backupdata() and works as follows: It scans the player data, the map data, and the history data looking for ‘dirty’ flags. (Whenever any persistent data is changed anywhere in the game, a dirty flag is set to tell the backup thread to write it out to disk.) This process is quite fast for a small game, but for a game with millions of sectors, it’s a significant waste of resources to scan the dirty flag array frequently.

Therefore, for the map and history data, I’ve implemented a ‘dirty block’ scheme as well. When a dirty flag is set, its corresponding dirty block flag is set too. Then, the backup thread need only scan the dirty block arrays, typically only about one percent the size of the arrays it represents. When a dirty block is found, only the hundred or so records it points to are scanned to find the actual dirty records for backup.

The backup file, named ‘st.9999.dat’ -- where ‘9999’ varies with the port number you run the game on -- goes into the current working directory from where you start the daemon. If the file doesn’t exist, a new game is started. Also, if you’ve modified the game in a way that changes the size of the data -- by increasing the map size, for example -- it will start a new game upon startup.

The game can be shut down from the command line by sending a signal 15 (kill -15 pid) or by the admin with the ^ command. Note that the first player to create an account in a new game automatically becomes the admin of the game!

makehistory(): The storing of historical data is new as well. Whenever another player attacks your ship while you’re logged off, you’ll get a report of the action and any losses when you next log on. Also, for remote deployed fighters, you never get immediate notification, so that information is stored in the history log even if you're logged on when it happens. You can view any accumulated event information since your login time by pressing the ‘e’ key.

deploy(): This simple function allows a player to deploy, or retrieve, guard fighters in a sector. Those fighters will not let another player pass through or view any of the contents of that sector. Any ships parked under the fighters are automatically protected against all attacks except for an attack by the fighters’ owner. Once the fighters are destroyed, of course, all ships there are visible and can be attacked.

There is also a newly implemented time limit in the game to limit the total online time of a day’s sessions to 4 hours. Like most other parameters, it can be changed by modifying a #define statement near the top of the code.

command(): The help page, a menu of available commands that a player can perform, has been redesigned and rewritten. This menu is attached to the '?' key.

The old debugger thread is gone, replaced by an in-game command function called showdata(). Press the ‘z’ key to see information on buffers, buffer indexes, and the backup thread’s state and history. Only if you’re serious about modifying the code will this information be useful.

The section of the gameloop thread that broadcasts radio and news messages has been modified to show only one of each type of message per pass. That way, replaying a long radio history won’t flood the output buffers and longer radio and news histories can therefore be retained.

The old jumprtn() movement function has been consolidated into the warprtn() function. It’s only slightly more complicated than having them separate.

The current source code can be downloaded from http://librenix.com/st/st.158.c. and the original article in this series is here. As usual, the compile script is embedded in the comments at the top of the source file. You’ll have to rename the source st.c for the script to work unchanged.

[A Space Tyrant home page has been created as a central index to the various ST articles, links, and files.]
mail this link | permapage | score:9154 | -Ray, June 26, 2005 (Updated: July 26, 2008)

Linux hardware review: Pentium III SMP mainboards

Up
vote
Down

The MSI 694D-Pro 2 motherboard vs. the Acorp 6A815EPD1 running Linux:
Not everyone can afford a top of the line SMP system, such as the Athlon XP 1800 rig we reviewed recently. If one wanted to venture into the land of the unsupported, the Duron or Celeron would be a possibility. But we wanted to see what could be done with a Pentium 3, which is the only officially supported SMP option that would fit into the budget realm. Read on for a detailed look at a Pentium 3 933 SMP setup, on two different chipsets. This will mainly be a motherboard review, but will have some bits of our CPU review included too.
read more...
mail this link | permapage | score:7968 | -Ray, November 13, 2001 (Updated: February 3, 2003)

Mainstream SMP Systems

Up
vote
Down

A comprehensive look at low-end to mid-range multiprocessor systems.
Mostly to help reduce the scope of the article, the main focus is on CPUs and systems designed to be at least reasonably cost effective and for decent volume. This certainly includes AMD, Apple (with PowerPCs) and Intel, who all have reasonably low-cost high volume desktop CPUs that are also used in multi-processor workstations and servers. Intel also have some CPU designs for multi-processor systems only, and AMD is planning to follow. UltraSPARC systems from Sun Microsystems have the highest volume of 64-bit systems and their recent 8-way systems compete well against similar Intel-based systems, and have some interesting lower-end designs in the pipeline.
read more...
mail this link | permapage | score:7272 | -Ray, May 30, 2002

SMPlayer Linux video, audio player

Up
vote
Down

Play online videos, DVD's, audio CD's, and MP3's on your Linux system with SMPlayer...
One of the most touted features of SMPlayer is that it can resume playback at the exact point you stopped it, even after you restart the program. Another original feature is the ability to define a playlist, so SMPlayer can show a list of videos one after another. You can reorder items if you like, and even shuffle them at random. A "repeat" feature allows for viewing loops.
read more...
permapage | score:6563 | -Ray, September 17, 2008

Hardware: Volume SMP Systems, part 2

Up
vote
Down

Chris Rijk goes a bit deeper in his analysis of volume SMP offerings.
For this part, we?ll be looking at the specific architectural implementations of several volume multi-processor systems, including those based around the Pentium III, Pentium 4, Athlon, PowerPC, Itanium, and UltraSPARC architectures. Additionally, we?ll also investigate into many developments in store for 2003, including integrated northbridges/memory controllers and improvements in thread-level parallelism (TLP) through on-chip multiprocessing (CMP) and fine-grained multithreading.
read more...
mail this link | permapage | score:6518 | -Ray, June 26, 2002

SMPlayer Review: MPlayer Frontend

Up
vote
Down

SMPlayer intends to be a complete front-end for MPlayer, from basic features like playing videos, DVDs, and VCDs to more advanced features like support for MPlayer filters and more.
MPlayer is a movie and animation player that supports a wide range of codecs and file formats, including MPEG 1/2/4,DivX 3/4/5, Windows Media 7/8/9, RealAudio/Video up to 9, Quicktime 5/6, and Vivo 1/2.
read more...
permapage | score:6423 | -gg234, June 13, 2007

Storage, SMP much better under Linux 2.4 kernel

Up
vote
Down

And a snapshot of the state of Linux storage. [Large .pdf file]
...storage manageability, capacity increases, and SMP (symmetric multiprocing) are the key areas where the new kernel takes dramatic strides beyond its 2.2 predecessor...

[ . . . ]

...we the resized the file system upward using Linux's resize2fs utility. We did have to unmount the file system to resize it; a project called ex2resize offers online file system growth, but he project's authors warn it is not ready to be used with vital data until it gains journaling support...
read more...
mail this link | permapage | score:6413 | -Ray, February 4, 2001 (Updated: August 23, 2003)

The future of volume SMP systems

Up
vote
Down

Chris Rijk takes a look into the crystal ball and sees a glimpse of the future of multi-processor systems.
This time around, we'll take a look into the future of volume multi-processor systems (or the very immediate present in the case of the recently introduced Itanium 2). Of course, we'll look at the significantly improved SPECfp2000-leading Itanium 2, Sun's 'low-end' UltraSPARC IIIi with on-board 128-bit DDR SDRAM memory controller and 1 MB L2 cache, and the first 64-bit x86 processor: AMD's Opteron. I'll also be discussing some more general trends of the market as well as making some observations of customer and vendor requirements alike. Finally, there will be an overview of where all the major CPU architectures and companies are at the moment and where they're going.
read more...
mail this link | permapage | score:5801 | -Ray, July 11, 2002

SMP hardware review: E7505 boards Linux tested

Up
vote
Down

In this test of upscale dual processor motherboards, the Iwill DP533, the MSI E7505 Master-LS2, and the Supermicro X5DA8 are tested.
Finally, we have one last major feature, the memory. All three boards have different memory requirements. On the Iwill board, only unbuffered memory will work and there are four memory slots for up to 8GB of RAM. On the MSI board, both unbuffered and registered memory is accepted (although we aren't sure how they rigged this). Officially only registered memory is supported but, as you'll see in the benchmarks, both work just fine. Once again four slots support up to 8GB of RAM. Finally, the Supermicro board supports only registered memory but they threw on six slots for a maximum of 12GB of RAM. This will be a big turn-on for those who see this as extending the life of the board through one extra upgrade.
read more...
mail this link | permapage | score:5677 | -Ray, April 12, 2003

Linux clusters on SMP hardware

Up
vote
Down

The twelve-page paper is available in HTML, PDF, and Postscript formats.
The Linux kernel development community is currently pondering whether to follow the path of extreme threading or try an entirely different approach. One such approach, which has been promoted by Larry McVoy and which is substantiated by the existing body of operating system research, is the implementation of a cluster of kernels on SMP hardware. This paper presents a suggested architecture for the implementation of such a cluster.

The architecture presented here is built around 5 components: the Adeos nanokernel, the Linux kernel, a set of virtual devices, a kernel-mode bootloader, and existing clustering solutions.
read more...
mail this link | permapage | score:5084 | -Ray, July 25, 2002 (Updated: February 8, 2003)

Pentium III SMP Xeons faster than P4 version

Up
vote
Down

I don't think the clock-speed-is-everything approach will work as well in the server world as it works in the home market.
PC manufacturers are upset because Pentium 4 Xeons in a four way symmetric multiprocessing (SMP) configuration are thoroughly thrashed by a chip that Intel wants to consign to the microprocessor gulag.

The Pentium III Xeon 900MHz with 2MB of cache - the so called Cascades processor - easily thrashes a Pentium 4 in a four way configuration but Intel wants the manufacturers to go with the current Xeon.
read more...
mail this link | permapage | score:4939 | -Ray, May 27, 2002

Linux and SMP Systems

Up
vote
Down

This article explores the ideas behind multiprocessing and developing applications for Linux that exploit SMP. As evidenced by major CPU vendors, multi-core processors are poised to dominate the desktop and embedded space. With multiprocessing comes greater performance but also new problems. read more...
permapage | score:4812 | -Ida Momtaheni, March 18, 2007

AMD Athlon Multi Processor (SMP) Under Linux

Up
vote
Down

Two processors are better than one.
[In the Linux kernel compile test], we can definitely see where AMD's superior FPU and number crunching power come into play. It completely obliterates its respective Pentium III counterpart and also manages to best the Pentium IV by a good 10 seconds even though it is outmatched by 500MHz. The moral of the story here is... if you will be compiling lots of software and that is your primary concern for a Linux box, get a dual AMD setup to do it.
read more...
mail this link | permapage | score:4682 | -Ray, July 13, 2001 (Updated: November 16, 2003)

Conceptual overview of Threading and SMP

Up
vote
Down

Concurrency -- multi-processing -- is widely misunderstood. This article introduces the basic concurrency concepts you need to conduct your business in the server closets safely. Concurrency labels situations where more than one "application" is running at a time. Linux hosts always fill their process tables with a bunch of more-or-less simultaneous programs: network protocol daemons, cron managers, the kernel itself, and often much more. Linux is a multi-tasking operating system and it's built for this. read more...
mail this link | permapage | score:3977 | -Debinay, December 14, 2002

Xen gets SMP server slicer

Up
vote
Down

Xen is scaling up...
Similar to VMware, XenSource has the likes of IBM, HP, AMD and Intel behind it. Unlike VMware, however, the open source nature of Xen has made it possible for the package quickly to stretch beyond its x86 roots and past basic one, two and four processor servers all the way up to massive SMP machines.
read more...
permapage | score:3844 | -Ray, July 11, 2005
Buy Large Abstract Art Prints

Selected articles

Missing the point of the Mac Mini

Beneficial Computer Viruses

The Network Computer: An opportunity for Linux

MiniLesson: An introduction to Linux in ten commands

The Supreme Court is wrong on Copyright Case

Hacker Haiku

No, RMS, Linux is not GNU/Linux

Linux vs. Windows: Why Linux will win

Testing the Digital Ocean $5 Cloud Servers with an MMORPG

Why software sucks

Space Tyrant: A threaded game server project in C

Apple DIY Repair

Space Tyrant: A multiplayer network game for Linux

VPS: Xen vs. OpenVZ

Linux dominates Windows

Scripting: A parallel Linux backup script

The life cycle of a programmer

Programming Language Tradeoffs: 3GL vs 4GL

Closed Source Linux Distribution Launched

How to install Ubuntu Linux on the decTOP SFF computer

Microsoft to push unlicensed users to Linux

Space Tyrant: Multithreading lessons learned on SMP hardware

Graffiti Server Download Page

Mono-culture and the .NETwork effect

Tutorial: Introduction to Linux files

Shadow.sh: A simple directory shadowing script for Linux

Librenix T-Shirts and Coffee Mugs!

The Real Microsoft Monopoly

Space Tyrant: A threaded C game project: First Code

Apple to Intel move no threat to Linux

Why Programmers are not Software Engineers

The short life and hard times of a Linux virus

Download: Linux 3D Client for Starship Traders

 

Firefox sidebar

Site map

Site info

News feed

Features

Login
(to post)

Search

 
Articles are owned by their authors.   © 2000-2012 Ray Yeargin