David Overton's Blog and Discussion Site
This site is my way to share my views and general business and IT information with you about Microsoft, IT solutions for ISVs, technologists and businesses, large and small. I specialise in Windows Intune and SBS 2008.
This blog is purely the personal opinions of David Overton. If you can't find the information you were looking for e-mail me at admin@davidoverton.com.

To find out more about my Windows Intune BOOK - Microsoft Windows Intune 2.0: Quickstart Administration click here

To find out more about my SBS 2008 BOOK - Small Business Server 2008, Installation, Migration and Configuration click here

News this week - multi-core chips need some software help and some software re-architecting to make them most effective at delivering extra performance .. and of course, there is more multi-core to come!

I've read two articles, "Computer-Chip Makers Pick Up Pace in Multicore Race" (WSJ.com) and "AMD: Will More CPU Cores Always Mean Better Performance?" (BetaNews), and realised they were both harping on about the same thing: the need for better use of multi-core / multi-processor technology.  In a former life I helped people make their software take advantage of many CPUs. Scheduling tasks, sharing information and understanding how much load many processes can put onto the memory, disk and internal buses is a huge area of learning.  Whether it is an 8-core chip or an 8-processor SMP box, the problems are the same.  If 8 CPUs all demand a disk read from different regions of the same disk, then you have 8 disk seeks. At roughly 8ms per seek, each process waits for between 1 and 8 seeks, which is anywhere from 8ms to 64ms.  In computer terms, waiting 64ms for a disk read (as the 8th CPU would do) is almost an eternity.  What is more, if the disk regions are spread over 8 separate disks to stop I/O bottlenecks, then the I/O bus and front-side bus might not be able to take that amount of data without some scheduling difficulties.
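To put rough numbers on the disk example, here is a minimal sketch in Python. It assumes a single disk that serves one request at a time and a flat 8ms per seek; the function name `seek_waits` is made up for illustration, and real disks reorder and cache requests, so treat it as back-of-envelope arithmetic only.

```python
SEEK_MS = 8  # assumed average seek time per request, as in the example above


def seek_waits(num_requests, seek_ms=SEEK_MS):
    """Wait time for each request when one disk serves the seeks one after another."""
    # The request in queue position 0 waits for 1 seek, position 1 for 2 seeks, and so on.
    return [(position + 1) * seek_ms for position in range(num_requests)]


waits = seek_waits(8)
print(waits)                    # [8, 16, 24, 32, 40, 48, 56, 64]
print(max(waits))               # 64 -> the 8th CPU waits 64ms
print(sum(waits) / len(waits))  # 36.0 -> average wait across the 8 CPUs
```

Under these assumptions the average wait is 36ms, so even the "typical" CPU in this picture spends far longer waiting on the disk than it would if it had the disk to itself.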

Anyway, it is interesting to hear about the changes to task scheduling, memory protection and run-time engines, but they make me think that these are really just little scratches on the surface of a deep problem.  As anyone who has got their product running on a 64-node cluster will tell you, it takes software re-design, and more than just creating a bunch of threads or adding lock management.  Anyway, the articles are worth reading to see where it is going.

I have snipped the articles, so click the links above to read the full articles.

Computer-Chip Makers - Pick Up Pace in Multicore Race

A race to cram more electronic brains on computer chips is accelerating, and prompting new moves to address the difficulty of programming such complex creations.

<snipped>

"We've really eliminated every reason not to go to quad-core," said Kirk Skaugen, a vice president in Intel's digital enterprise group.

{ed - really? What about helping multi-processor / multi-core systems actually make best use of all their processors?}

Easing Bottlenecks

That chip, code-named Rock, may attract more attention for another feature -- built-in circuitry for minimizing a bottleneck that has prevented software from fully exploiting multicore chips.

"So far the software revolution has not happened," said Marc Tremblay, a Sun senior vice president. If Rock can ease the programming problem, "we think it's a pretty big deal."

For years, chip makers boosted performance by increasing the internal timing pulses on chips, a measure called clock speed. Without any effort, programmers found their creations kept working faster.

Those easy gains have all but ended, because high clock speeds consume too much electrical power. So chip makers began boosting the capacity of their chips by adding more processors.

Exploiting that capability relies, in part, on breaking programs apart into sequences of instructions called threads that can be executed simultaneously.

Not all software uses threads effectively. Even those that do can face a problem, because of safeguards built into chips to prevent conflicts as two threads seek to draw the same piece of data from memory -- such as spouses simultaneously trying to access the same bank account, Mr. Tremblay said. Those safeguards, which work like locks, mean that other threads often wait around before they can do any useful work.
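{ed - the bank account analogy can be sketched in a few lines of Python. This is only an illustration of a software lock, not of the hardware safeguards the article refers to; the `Account` class and `run_spouses` helper are invented for the example.}

```python
import threading


class Account:
    """Hypothetical shared bank account protected by a lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # Only one thread at a time may check and update the balance;
        # any other thread waits here until the lock is released.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def run_spouses(start_balance=100, amount=80):
    """Two threads try to withdraw from the same account at the same time."""
    account = Account(start_balance)
    results = []

    def spouse():
        results.append(account.withdraw(amount))

    threads = [threading.Thread(target=spouse) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance, sorted(results)


print(run_spouses())  # (20, [False, True]) -> exactly one withdrawal succeeds
```

The lock keeps the balance correct, but it also means the second thread does nothing useful while the first holds it, which is the waiting the article describes.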

Hardware Changes

Sun predicts Rock will be the first chip that builds in a widely discussed technology called transactional memory. It doesn't prevent contention over data; it detects conflicts that occur and repeats any operations affected by them, said Mark Moir, a Sun senior staff engineer. Software prepared to exploit the technology should see big performance benefits, he said.
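{ed - a loose software analogy of "detect the conflict and repeat the operation", assuming a simple version counter. This is not how Rock implements transactional memory in hardware; `VersionedCell`, `transact` and `run` are hypothetical names, and the small lock here only stands in for the atomic commit step that hardware would provide.}

```python
import threading


class VersionedCell:
    """A value plus a version number; a commit succeeds only if nobody else committed first."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._commit_lock = threading.Lock()  # stand-in for an atomic hardware commit

    def read(self):
        with self._commit_lock:
            return self._value, self._version

    def try_commit(self, new_value, seen_version):
        with self._commit_lock:
            if self._version != seen_version:
                return False  # conflict detected: someone else committed in between
            self._value = new_value
            self._version += 1
            return True


def transact(cell, update):
    """Apply update optimistically, repeating it whenever a conflict is detected."""
    retries = 0
    while True:
        value, version = cell.read()
        if cell.try_commit(update(value), version):
            return retries
        retries += 1


def run(workers=4, increments=1000):
    cell = VersionedCell(0)

    def work():
        for _ in range(increments):
            transact(cell, lambda v: v + 1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.read()[0]


print(run())  # 4000 -> no increments are lost, even though some had to be repeated
```

The work itself (the `update` call) runs without holding anything; only operations that actually collide pay the cost of being repeated.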

Other companies are considering hardware changes to help programmers exploit multicore chips. Intel, for example, is considering adding built-in circuitry that takes over a function called a task scheduler -- ordinarily a piece of software that maps out which processor core should execute specific threads, said Sean Koehl, an Intel technology strategist.

AMD, for its part, is expected to propose expanding the basic set of instructions for the widely used x86 chip design, starting with a function called a profiler that monitors operations on a chip and can allow software to make speedy decisions about the best ways to carry them out.

AMD: Will More CPU Cores Always Mean Better Performance?

The company that helped inaugurate the multicore era of CPUs has begun studying the question, will more cores always yield better processing? Or is there a point where the law of diminishing returns takes over? A new tool for developers to take advantage of available resources could help find the answers, and perhaps make 16 cores truly feel more powerful than eight cores.

Two years ago, at the onset of the multicore era, testers examining how simple tasks took advantage of the first CPUs with two on-board logic cores discovered less of a performance boost than they might have expected. For the earliest tests, some were shocked to discover a few tasks actually slowed down under a dual-core scheme.

While some were quick to blame CPU architects, it turned out the problem was the way software was designed: If a task can't be broken down, two or four or 64 cores won't be able to make sense of it, and you'll only see performance benefits if you try to do other things at the same time.
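{ed - the usual way to put numbers on this is Amdahl's law, which the article does not name. A minimal sketch, assuming a fixed fraction of the task can be split perfectly across cores and ignoring all scheduling and memory overheads:}

```python
def amdahl_speedup(parallel_fraction, cores):
    """Best-case speedup when only parallel_fraction of the work can be split across cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for cores in (2, 4, 64):
    unsplittable = amdahl_speedup(0.0, cores)   # task that cannot be broken down
    mostly_split = amdahl_speedup(0.9, cores)   # assumed 90% parallel task
    print(cores, round(unsplittable, 2), round(mostly_split, 2))
# 2 1.0 1.82
# 4 1.0 3.08
# 64 1.0 8.77
```

A task that cannot be broken down gains nothing from 64 cores, and even an assumed 90% parallel task tops out below a 9x speedup on 64 cores.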

So when AMD a few months back debuted the marketing term "mega-tasking," defining it to refer to the performance benefits you can only really see when you're doing a lot of tasks at once, some of us got skeptical. Maybe the focus of architectural development would be diverted for awhile to stacking tasks atop one another, rather than streamlining the scheme by which processes are broken down and executed within a logic core.

Today, AMD gave us some substantive reassurance with the announcement of what's being called lightweight profiling (LWP). The idea is to give programmers new tools with which to aid a CPU (specifically, AMD's own) in how best their programs can utilize their growing stash of resources. In a typical x86 environment, CPUs often have to make their own "best guesses" about how tasks can be split up among multiple cores.

<snip>

As Stahl explained to us, software that truly is designed to take advantage of multiple cores will set up resources intentionally for that purpose: for example, shared memory pools, which a single-threaded process probably wouldn't need. But how much shared memory should be established? If this were an explicit multithreading environment like Intel's Itanium, developers would be making educated guesses such as this one in advance, on behalf of the CPU.

So LWP tries to enable the best of both worlds, implicit and explicit parallelism: It sets up the parameters for developers to create profiles for their software. AMD CPUs can then use those profiles on the fly to best determine, based on a CPU's current capabilities and workload, how threads may be scheduled, memory may be pooled, and cache memory may be allocated.

<snip>

AMD believes about 80% of the potential usefulness of LWP will be realized by just two software components: Sun's Java Virtual Machine, and Microsoft's .NET Runtime module. While operating system drivers will not be necessary for operating systems to take advantage of LWP, it's AMD's hope that developers who are using high-level, just-in-time-compiled languages anyway will be able to automatically benefit from LWP, at least for the most part.

 

ttfn

David


Posted Tue, Aug 28 2007 12:36 AM by David Overton



(c)David Overton 2006-13