Diploma and Master Theses (authored and supervised):

T. Nowak:
"Topology in Distributed Computing";
Supervisor: U. Schmid; Institut für Technische Informatik - 182/2, 2010.

English abstract:
Due to rising performance demands, both the feature size and the clock cycle times of VLSI circuits have decreased dramatically. While this allows hardware designers to build highly complex systems on a chip, it also has negative side effects: (i) the small feature sizes make transistors more susceptible to SEUs and SETs (single event upsets and single event transients) caused by particle hits, and (ii) the increased complexity makes synchronous clocking of a chip by a single oscillator either impossible or very costly. A promising mitigation technique for both (i) and (ii) is the GALS (globally asynchronous locally synchronous) approach. The idea is to partition the chip into loosely coupled functional units, each clocked by its own local oscillator. However, the cost of GALS is high: it abolishes synchronous clocking and hence the global chip time base. In the context of the DARTS project, an alternative distributed clocking scheme has been developed, which is fault-tolerant and still maintains reasonable synchrony between functional units. As in the GALS approach, each functional unit has its own local clock. In DARTS, however, the local clocks are generated by small tick generation modules that implement a Byzantine fault-tolerant distributed algorithm. Approximately synchronized local clock signals are generated by the interaction of the tick generation modules rather than by oscillators. In the course of the project, the correctness of DARTS has been formally proved, and both an FPGA and an ASIC prototype implementation have been built. A major drawback of the original DARTS, however, is that the attainable local clock frequency is determined by the global interconnect delays. Since these delays depend on the chip geometry, they cannot be made arbitrarily small.
In this thesis, a pipelined version of the DARTS algorithm, called pDARTS, is presented. It allows the clock frequency to be increased up to the speed of the local functional blocks, thus completely hiding the global interconnect delays. A formal mathematical analysis proves that the algorithm is correct and maintains good synchronization properties even in the worst case. An FPGA prototype implementation demonstrates the feasibility and efficiency of the pipelining approach in the context of DARTS.

Created from the Publication Database of the Vienna University of Technology.