Posts Tagged ‘ parallel computing ’

Chief Scientist of Nvidia Condemns Moore’s Law, Microprocessors – X-bit labs

William Dally, chief scientist and senior vice president of research at Nvidia, said in a column that Moore’s Law no longer translates into scaling of computing performance on microprocessors. In addition, Mr. Dally indicated that central processing units (CPUs) in general can no longer meet the demand for high performance.

“[Moore’s Law] predicted the number of transistors on an integrated circuit would double each year (later revised to doubling every 18 months). This prediction laid the groundwork for another prediction: that doubling the number of transistors would also double the performance of CPUs every 18 months. [Moore] also projected that the amount of energy consumed by each unit of computing would decrease as the number of transistors increased. This enabled computing performance to scale up while the electrical power consumed remained constant. This power scaling, in addition to transistor scaling, is needed to scale CPU performance. But in a development that’s been largely overlooked, this power scaling has ended. And as a result, the CPU scaling predicted by Moore’s Law is now dead. CPU performance no longer doubles every 18 months,” said Bill Dally in a column published at Forbes.

Perhaps CPU performance no longer doubles every year and a half, but those chips are, firstly, universal and very flexible and, secondly, they can be manufactured in large volumes. Graphics chips, which from time to time outpace Moore’s Law, often cannot be manufactured in large volumes because of poor yields. Moreover, although GPUs can provide more raw horsepower than CPUs, they are nowhere near as universal or flexible.

Even though developers of central processing units historically concentrated on increasing chip clock speeds, five years ago Advanced Micro Devices and Intel Corp. shifted to more parallel multi-core microprocessors that run at moderate clock speeds. However, the vice president of Nvidia also claims that multi-core x86 CPUs will ultimately not solve the problem of insufficient computing performance.

“Building a parallel computer by connecting two to 12 conventional CPUs optimized for serial performance, an approach often called multi-core, will not work. This approach is analogous to trying to build an airplane by putting wings on a train. Conventional serial CPUs are simply too heavy (consume too much energy per instruction) to fly on parallel programs and to continue historic scaling of performance,” said Mr. Dally.

It is rather logical that Nvidia calls central processing units obsolete, since it neither produces nor develops them. The big question is whether AMD and Intel will give up and let Nvidia actually capture part of the high-performance computing market, where multi-core CPUs rule today.

“Parallel computing is the only way to maintain the growth in computing performance that has transformed industries, economies, and human welfare throughout the world. The computing industry must seize this opportunity and avoid stagnation, by focusing software development and training on throughput computers – not on multi-core CPUs. Let’s enable the future of computing to fly – not rumble along on trains with wings,” concluded the chief scientist of Nvidia.

Rebel Science News: Why Parallel Programming Is So Hard

The Parallel Brain

The human brain is a super parallel signal-processing machine and, as such, it is perfectly suited to the concurrent processing of huge numbers of parallel streams of sensory and proprioceptive signals. So why is it that we find parallel programming so hard? I will argue that it is not because the human brain finds it hard to think in parallel, but because what passes for parallel programming is not parallel programming in the first place. Switch to a true parallel programming environment and the problem will disappear.

Fake Parallelism

What is the difference between a sequential program and a parallel program? A sequential program is an algorithm, a list of instructions arranged in a specific order such that predecessors and successors are implicit. Is there such a thing as a parallel algorithm? In my opinion, the term ‘parallel algorithm’ is an oxymoron because an algorithm, at least as originally defined, is a sequence of steps. There is nothing parallel about algorithms, whether they run concurrently on a single processor or on multiple processors. A multithreaded application consists of multiple algorithms (threads) running concurrently. Other than the ability to share memory, this form of parallelism is really no different from multiple communicating programs running concurrently on a distributed network. I call it fake parallelism.
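
To make the point concrete, here is a minimal sketch (my illustration, not code from the post; the `worker` function and shared counter are hypothetical) of this kind of fake parallelism: each thread is an ordinary sequential algorithm, and the only thing ‘parallel’ about the program is that the threads are scheduled concurrently and share memory.

```python
# A minimal sketch (my illustration, not code from the post) of "fake
# parallelism": four ordinary sequential algorithms (threads) that merely
# happen to be scheduled concurrently and to share memory.
import threading

shared = {"count": 0}
lock = threading.Lock()

def worker(iterations):
    # A plain sequential loop: predecessors and successors are implicit
    # in the statement order, exactly as in any algorithm.
    for _ in range(iterations):
        with lock:
            shared["count"] += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["count"])  # 40000 -- concurrency changed the schedule, not the algorithms
```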

True Parallelism

In a truly parallel system, all events are synchronized to a global clock so that they can be unambiguously identified as being either concurrent or sequential. Synchronization is an absolute must in a deterministic parallel system; otherwise events quickly get out of step and inferring temporal correlations becomes nearly impossible. Note that ‘synchronous processing’ is not synonymous with ‘synchronous messaging’. A truly parallel system must use asynchronous messaging; otherwise the timing of events becomes chaotic and unpredictable. The human brain is a temporal signal-processing network that needs consistent temporal markers to establish correlations. While single-threaded programs provide adequate temporal (sequential) cues, concurrent threads are non-deterministic, so concurrent temporal cues are hard to establish, which leads to confusion. See also Parallel Programming: Why the Future Is Synchronous for more on this subject.
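
As a rough illustration of events synchronized to a global clock (my sketch under simple assumptions, not the author’s system), consider a lockstep update in which every cell computes its next state from the previous tick’s state. Within a tick all updates are unambiguously concurrent, and every run is deterministic.

```python
# A lockstep, globally clocked update (illustrative sketch): all cells
# advance together on each tick, reading only the previous tick's state,
# so concurrent events are unambiguous and the result is deterministic.
def step(cells):
    n = len(cells)
    # Each cell's next state depends only on the old buffer (previous tick).
    return [(cells[i - 1] + cells[i] + cells[(i + 1) % n]) % 2 for i in range(n)]

state = [0, 1, 0, 0, 1, 1, 0, 1]
for tick in range(4):
    state = step(state)               # the global clock advances one tick
    print(f"tick {tick}: {state}")
```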

It is beneficial to view a computer program as a communication system in which elementary processes exchange signals with one another. In this light, immediately after execution, an operation (predecessor) in an algorithm sends a signal to the next operation (successor) in the sequence, meaning essentially, ‘I’m done; now it’s your turn’. Whereas in an algorithmic program every element or operation is assumed to have only one predecessor and one successor, in a parallel program there is no limit to the number of predecessors or successors an element can have. This is why sequential order must be explicitly specified in a parallel program. Conversely, concurrency is implicit: no special construct is needed to specify that two or more elements are to be executed simultaneously.
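
Here is a small sketch of that signal-based view (the graph and operation names are hypothetical, not the author’s framework): each operation lists its successors explicitly, signals them when it finishes, and any operation whose predecessors have all signaled runs in the same ‘wave’. Sequence is spelled out by the edges; concurrency needs no special construct.

```python
# Signal-based execution sketch: explicit successor edges, implicit concurrency.
graph = {                       # operation -> its successors
    "read_a": ["add"],
    "read_b": ["add"],
    "add":    ["show"],
    "show":   [],
}

# Count predecessors for each operation.
pending = {op: 0 for op in graph}
for op, successors in graph.items():
    for s in successors:
        pending[s] += 1

wave = [op for op, n in pending.items() if n == 0]   # everything ready runs together
tick = 0
while wave:
    print(f"tick {tick}: run concurrently -> {wave}")
    next_wave = []
    for op in wave:
        for s in graph[op]:          # "I'm done; now it's your turn"
            pending[s] -= 1
            if pending[s] == 0:
                next_wave.append(s)
    wave, tick = next_wave, tick + 1
```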

Composition vs. Decomposition

The common wisdom in the industry is that the best way to write a parallel program is to break an existing sequential program down into multiple threads that can be assigned to separate cores in a multicore processor. Decomposition, it seems, is what the experts recommend as the correct method of parallelization. However, this raises a couple of questions. If composition is the proper method of constructing sequential programs, why should parallel programs be any different? In other words, if we use sequential elements or components to build a sequential program, why should we not use parallel elements or components to build parallel programs? If the compositional approach to software construction is known to work for sequential programs, it follows that the same approach should be used for parallel software construction. It turns out that signal-based parallel software lends itself well to plug-compatible components that can snap together automatically. Composition is natural and easy; decomposition is unnatural and hard.
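
A very rough sketch of what compositional construction might look like (a hypothetical API of my own, not the author’s; it illustrates only the wiring of named ports, not the synchronous timing model): the program is assembled by snapping component outputs onto component inputs rather than by carving up an existing sequential program.

```python
# Plug-compatible components (illustrative sketch): a program is built by
# composing parts with matching signal ports, not by decomposing a sequential one.
class Component:
    def __init__(self, name, fn, inputs, outputs):
        self.name, self.fn = name, fn
        self.inputs, self.outputs = inputs, outputs
        self.wires = {}                      # output port -> (component, input port)

    def connect(self, out_port, other, in_port):
        # Ports must match for the parts to snap together.
        assert out_port in self.outputs and in_port in other.inputs
        self.wires[out_port] = (other, in_port)
        return other

    def fire(self, **signals):
        results = self.fn(**signals) or {}   # compute this component's output signals
        for port, value in results.items():
            if port in self.wires:
                target, in_port = self.wires[port]
                target.fire(**{in_port: value})   # signal the downstream component

doubler = Component("doubler", lambda x: {"y": 2 * x}, inputs=["x"], outputs=["y"])
display = Component("display", lambda y: print("got", y), inputs=["y"], outputs=[])
doubler.connect("y", display, "y")
doubler.fire(x=21)                           # -> got 42
```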

Conclusion

In conclusion, the reason that parallel programming is hard is that it is not what it is claimed to be. As soon as parallel applications become implicitly parallel, synchronous and compositional in nature, parallel programming will be at least an order of magnitude easier than sequential programming. Debugging is a breeze in a deterministic environment, cutting development time considerably.
