DARTS - Distributed Algorithms for Robust Tick-Synchronization
The FIT-IT project DARTS — Distributed Algorithms for Robust Tick Synchronization is dedicated to the development of a novel method for providing synchronous systems with a robust and fault-tolerant clocking methodology to overcome the problems and limitations of currently used approaches.
Abstract
The future trends in digital circuit design are characterized by ever-increasing integration densities, complexity and clock rates. As a consequence, it is becoming more and more difficult to supply all function units (Fu) within a chip with a common clock signal, as required in the synchronous design paradigm:
• Signal propagation takes a full clock period (or even more) until all Fu’s are reached.
• Increasing efforts are needed for skew reduction in the clock tree.
• Clock-tree & buffers dissipate much power and consume considerable chip area. In addition they are responsible for serious problems with supply voltage and electromagnetic radiation.
• The maximum clock rate can only be determined very late in the design flow. Furthermore, variations in operating conditions and process parameters lead to a need for significant timing margins (up to 50%).
Additionally, the clock signal represents a critical single point of failure, with fatal consequences:
• Every single fault in the clock-tree (transient or permanent) leads to a system failure.
• Dramatically increasing error rates are expected in future technologies, which will be of concern also for non-safety critical applications.
• Crystal oscillators are expensive, big and sensitive to mechanical damage.
GALS (globally asynchronous locally synchronous) constitutes a potential solution for some of the above problems. It relies on local synchronous islands that communicate with each other asynchronously. However, the interfacing between the uncorrelated local clock domains is a major difficulty with this approach. Moreover, GALS architectures do not handle the increasingly important fault-tolerance and robustness issues. Existing methods for fault-tolerant clock generation, on the other hand, are mostly based on the unrealistic assumption of negligible signal delays and incorporate costly circuits like PLLs, voters and redundant clock sources.
Considering that wire delays increasingly dominate gate delays in modern chips, we propose an alternative method for hardware clock generation that has its origin in fault-tolerant clock synchronization: As in distributed systems, a simple fault-tolerant distributed algorithm (Tick Synchronization / Tick Generation Algorithm, TS-Alg) is employed for synchronizing (in fact, generating) the modules' local clock signals (micro-ticks). To synchronize the local micro-ticks, all (or at least a sufficient number of) TS-Algs are connected to each other by a simple network (namely, a bus) called TS-Net. Note that TS-Algs and TS-Net inevitably have to be implemented in asynchronous logic here.
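The idea of generating ticks by a distributed algorithm can be illustrated with a small event-driven simulation. The sketch below is not the project's algorithm: it assumes two threshold rules in the style of classic fault-tolerant clock synchronization (a node emits its next tick once 2f+1 nodes have sent at least as many ticks as itself, or once f+1 nodes are ahead of it), assumes n >= 3f+1 nodes of which f are silent, and assumes independent message delays between d_min and d_max on every connection. All names and parameters (simulate_ticks, d_min, d_max, max_tick) are hypothetical.

```python
import heapq
import random


def simulate_ticks(n=4, f=1, max_tick=20, d_min=1.0, d_max=2.0, seed=0):
    """Simulate threshold-based tick generation with f silent (crashed) nodes.

    Returns (ticks sent per correct node, largest tick-count difference seen).
    The two rules used here are an assumption, not taken from the project.
    """
    rng = random.Random(seed)
    correct = list(range(n - f))            # nodes n-f .. n-1 never send anything
    sent = {p: 0 for p in correct}          # ticks generated so far by node p
    heard = {p: [0] * n for p in correct}   # heard[p][q]: ticks p has seen from q
    queue = []                              # pending deliveries: (time, seq, dst, src)
    seq = 0                                 # tie-breaker for equal delivery times

    def send_tick(p, now):
        nonlocal seq
        sent[p] += 1
        heard[p][p] = sent[p]               # a node sees its own tick at once
        for q in correct:
            if q != p:
                seq += 1
                delay = rng.uniform(d_min, d_max)
                heapq.heappush(queue, (now + delay, seq, q, p))

    def apply_rules(p, now):
        # Fire the two threshold rules until neither applies.
        while sent[p] < max_tick:
            k = sent[p]
            at_least_k = sum(1 for h in heard[p] if h >= k)
            ahead = sum(1 for h in heard[p] if h > k)
            if at_least_k >= 2 * f + 1 or ahead >= f + 1:
                send_tick(p, now)
            else:
                break

    for p in correct:                       # every correct node emits a first tick
        send_tick(p, 0.0)

    max_skew = 0
    while queue:
        now, _, dst, src = heapq.heappop(queue)
        heard[dst][src] += 1                # ticks are counted, not inspected
        apply_rules(dst, now)
        max_skew = max(max_skew, max(sent.values()) - min(sent.values()))
    return sent, max_skew


counts, skew = simulate_ticks()
print(counts, skew)
```

In this model no node needs a local oscillator: ticks are produced purely in reaction to ticks received from other nodes, and the silent node does not stop the remaining ones from making progress.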
The synchronization precision Pi = Pi(Theta) of the proposed method depends only on the ratio Theta between the slowest and the fastest path of the TS-Net (quasi-delay-insensitive). A programmable divider ÷Pi(Theta) can be added to transform the micro-ticks into macro-ticks, which represent the desired synchronous global clock. Advantages of the method are:
• It tolerates faults in the TS-Net (both transient and permanent), whose maximum number can be dimensioned according to reliability requirements.
• The synchronous design paradigm can not only be applied for the design of the local Fu’s (via micro-ticks), but also for global communication between distant Fu’s (based on macro-ticks).
• The chip is self-clocking and therefore needs no external oscillator. To further improve the frequency stability, temperature compensating circuits or other means can be added.
• The chip automatically runs at the maximum possible clock rate, depending on the current operating conditions. Unlike in the synchronous model, the timing behaviour does not depend on worst case assumptions and is hence very robust.
• Elaborate treatment (routing, de-skewing) of the clock-tree is not required anymore, because the routing of the TS-Net is relatively uncritical.
• The number of TS-Alg units can be chosen in accordance with the given requirements, to optimize the resulting overhead (area, power consumption, etc.) and properties (faults to be tolerated).
• EM-radiation is not concentrated at a single frequency, but somewhat distributed over the spectrum. Furthermore, there is no synchronous switching of the whole chip, which eliminates problems with the power supply (IR-drop, etc.).
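The two quantities mentioned before the list of advantages, the ratio Theta between the slowest and the fastest path and the division of micro-ticks into macro-ticks, can be sketched as follows. This is a minimal illustration under stated assumptions: the path delays are made-up numbers, and the divisor is a hypothetical parameter, since the text does not give the relation between Pi(Theta) and the division factor.

```python
def theta(path_delays):
    """Ratio of the slowest to the fastest path delay (same unit for all)."""
    fastest, slowest = min(path_delays), max(path_delays)
    if fastest <= 0:
        raise ValueError("path delays must be positive")
    return slowest / fastest


def macro_ticks(micro_tick_times, divisor):
    """Keep every divisor-th micro-tick as a macro-tick (hypothetical divider)."""
    if divisor < 1:
        raise ValueError("divisor must be at least 1")
    return micro_tick_times[divisor - 1::divisor]


# Hypothetical path delays in nanoseconds and micro-tick timestamps.
print(theta([1.0, 1.5, 2.0]))
print(macro_ticks([1, 2, 3, 4, 5, 6, 7], 3))
```

Only the ratio of the delays enters theta, not their absolute values, which is why scaling all paths by the same factor leaves the result unchanged.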
In space-borne applications digital circuits are exposed to high radiation doses and hence require a high level of fault tolerance. The new opportunities provided by the DARTS approach in this context are likely to provide a competitive advantage to the industrial project partner Austrian Aerospace.
Project aims
The main objective of the project is the proof of concept of the proposed method by means of an ASIC prototype implementation. A suitable example application shall be identified to evaluate the properties of this new method.
In particular we want to prove that the proposed method has notably better robustness and fault-tolerance properties than commonly used approaches that rely on external clocking.
The fundamental work packages of the project are:
• adaptation (and proof) of an existing clock synchronization algorithm such that it can be efficiently implemented in hardware,
• implementation of this new algorithm in asynchronous logic,
• fabrication of an ASIC based on the developed concept (including all involved issues such as design verification and test),
• quantitative evaluation of the most important properties of the solution based on a multitude of experiments and measurements performed with the target application.
Partners
Vienna University of Technology
Institute of Computer Engineering
Embedded Computing Systems Group, E182-2
http://www.ecs.tuwien.ac.at
Univ.Prof. Dipl.-Ing. Dr.techn. Ulrich Schmid - Head of Group, Algorithms & Concepts
Ao.Univ.Prof. Dipl.-Ing. Dr.techn. Andreas Steininger - Project Head, Concepts
Dipl.-Ing. Dr.techn. Gottfried Fuchs - Project Management, Algorithm Design & Impl.
Dipl.-Ing. Thomas Handl - Component Design, Testing of Async. Circuits
Dipl.-Ing. Dr.techn. Josef Widder - Distributed Algorithms
Dipl.-Ing. Dr.techn. Matthias Függer - Distributed Algorithms
Dipl.-Ing. Markus Ferringer - ASIC Pipeline Implementations & Optimizations
RUAG Aerospace Austria GmbH
http://www.space.at
Dipl.-Ing. Dr.techn. Manfred Sust - Local Project Head
Ing. Gerald Kempf - Requirements Spec., Interface to Chip Fab., Synthesis
Dipl.-Ing. Christian Gleiss - Verification & Validation
Dipl.-Ing. Georg Grabmayr - Dissemination
Dipl.-Ing. Franz Zangerl - System Engineering
Roman Zangl BSc. - Verification & Validation
Dissemination
Seminars and Workshops
A joint proposal by Bernadette Charron-Bost (Ecole Polytechnique - Palaiseau, FR), Shlomi Dolev (Ben Gurion University, IL), Jo Ebergen (Sun Microsystems - Menlo Park, US) and Ulrich Schmid (TU Wien, AT) for a Dagstuhl Seminar on Fault-Tolerant Distributed Algorithms on VLSI Chips was accepted, and the seminar took place from 7 to 10 September 2008 at Schloss Dagstuhl, Leibniz-Zentrum für Informatik.
The focus of the seminar was to explore whether the wealth of existing research on fault-tolerant distributed algorithms can be utilized for meeting the challenges of future-generation VLSI chips. Participants from both the distributed fault-tolerant algorithms community and the digital design community surveyed the current state of the art and tried to identify possibilities to work together.
Workshop and Conference Publications
[1] Fault-Tolerant Algorithms on SoCs - A case study
Andreas Steininger, Matthias Függer, Ulrich Schmid, Gottfried Fuchs
The International Conference on Dependable Systems and Networks (DSN) 2006, Fast Abstract, June 2006.
[2] The DARTS project
Andreas Steininger
Presentation of the DARTS project at the FFG ESA-Workshop, Vienna, Austria, May 2006.
[3] Fault-Tolerant Distributed Clock Generation in VLSI Systems-on-Chip
Matthias Függer, Ulrich Schmid, Gottfried Fuchs, Gerald Kempf
The Sixth European Dependable Computing Conference (EDCC-6), October 2006.
[4] Testing the Hardware Implementation of a Distributed Clock Generation Algorithm for SoCs
Thomas Handl, Andreas Steininger, Gottfried Fuchs, Franz Zangerl
IEEE East-West Design & Test International Workshop (EWDTW'06), September 2006.
[5] VLSI Implementation of a Fault-Tolerant Distributed Clock Generation
Markus Ferringer, Gottfried Fuchs, Andreas Steininger, Gerald Kempf
IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT 2006), October 2006.
[6] Threshold Modules -- Die Schlüsselelemente zur Verteilten Generierung eines Fehlertoleranten Taktes
Gottfried Fuchs, Julian Grahsl, Ulrich Schmid, Andreas Steininger, Gerald Kempf
The Austrian National Conference on the Design of Integrated Circuits and Systems (Austrochip 2006), October 2006.
[7] An Efficient Test for a Transition Signalling based Up-/Down-Counter
Matthias Függer, Thomas Handl, Andreas Steininger, Josef Widder, Christian Tögel
The Austrian National Conference on the Design of Integrated Circuits and Systems (Austrochip 2006), October 2006.
Poster presentation at Austrochip 2006
[8] Analysis of Constraints in a Fault-Tolerant Distributed Clock Generation Scheme
Gottfried Fuchs, Matthias Függer, Andreas Steininger, Franz Zangerl
3rd International Workshop on Dependable Embedded Systems (WDES'06), October 2006.
[9] Towards a Systematic Design of Fault-Tolerant Asynchronous Circuits
Ulrich Schmid, Andreas Steininger, Helmut Veith
1. GMM/GI/ITG-Fachtagung Zuverlässigkeit und Entwurf (ZuD 2007), München, Germany, March 2007.
Poster presentation at ZuD 2007
[10] An Efficient Test Strategy for a Fault-Tolerant Clock Generator for Systems-on-Chip
Thomas Handl, Andreas Steininger
19. Workshop Testmethoden und Zuverlässigkeit von Schaltungen und Systemen (TuZ07), Erlangen, Germany, March 2007.
[11] SAFE — A Scalable Environment for Automated Transistor Level Fault Effect Analysis
Julian Grahsl, Thomas Handl, Andreas Steininger, Gerald Kempf
15th Austrian Workshop on Microelectronics (Austrochip 2007), October 2007.
[12] Adopting the Scan Approach for a Fault Tolerant Asynchronous Clock Generation Circuit
Thomas Handl, Andreas Steininger, Gerald Kempf
2nd IEEE International Design and Test Workshop (IDT 2007), December 2007.
[13] Exploring the Usefulness of the Gate-level Stuck-at Fault Model for Muller C-Elements
Julian Grahsl, Thomas Handl, Andreas Steininger
20. Workshop für Testmethoden und Zuverlässigkeit von Schaltungen und Systemen (TuZ 2008), Wien, February 2008.
[14] Mapping a Fault-Tolerant Distributed Algorithm to Systems on Chip
Gottfried Fuchs, Matthias Függer, Ulrich Schmid, Andreas Steininger
11th EUROMICRO Conference on Digital System Design (DSD 2008), Parma, Italy, September 2008.
[15] On the Threat of Metastability in an Asynchronous Fault-Tolerant Clock Generation Scheme
Gottfried Fuchs, Matthias Függer, Andreas Steininger
15th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC 2009), Chapel Hill, North Carolina, USA, May 2009.
[16] Implications of VLSI Fault Models and Distributed Systems Failure Models -- A hardware designer's view
Gottfried Fuchs
Dagstuhl Seminar 08371 on Fault-Tolerant Distributed Algorithms on VLSI Chips, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Germany
[17] Error Containment in the Presence of Metastability
Andreas Steininger
Dagstuhl Seminar 08371 on Fault-Tolerant Distributed Algorithms on VLSI Chips, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Germany
[18] How to Speed-up Fault-tolerant Clock Generation in VLSI Systems-on-Chip via Pipelining
Andreas Dielacher, Matthias Függer, Ulrich Schmid
Brief Announcement at the Twenty-Eighth Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Calgary, Canada, August 2009
[19] On the Stability and Robustness of Non-Synchronous Circuits with Timing Loops
Matthias Függer, Gottfried Fuchs, Ulrich Schmid, Andreas Steininger
3rd Workshop on Dependable and Secure Nanocomputing, in conjunction with the 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'09), Lisbon, Portugal, June 2009
[20] ARROW - A Generic Hardware Fault Injection Tool for NoCs
Michael Birner and Thomas Handl
Euromicro Conference on Digital System Design (DSD 2009), Patras, Greece, August 2009
Journal Publications
[1] FIT-IT-Projekt DARTS: Dezentrale fehlertolerante Taktgenerierung
Ulrich Schmid, Andreas Steininger, Manfred Sust
Elektrotechnik & Informationstechnik (e&i) 1-2/07, 2007.
Matthias Függer, Ulrich Schmid
Distributed Computing, 24(6):323-355, 2012.
Gottfried Fuchs, Andreas Steininger
Journal of Electrical and Computer Engineering, vol. 2011, Article ID 936712, doi:10.1155/2011/936712, 2011.
Press Releases
[1] Gefühl für Takt und Sprache
Der Standard, 2004-12-06
[2] Institut für Technische Informatik "fit" bei Embedded Systems
TU-Wien Aktuelles, 2004-12-07
APA Journal #50, December 2004
[4] Mit Taktgefühl in den Weltraum
TU-Wien Presseaussendung 40/2007, 2007-07-02
[5] Mehr Taktgefühl im Weltraum
Der Standard, Forschung Spezial, 2007-12-11
[6] Taktgefühle
SWR2 Feature am Sonntag, author: Marion Ammicht, editor: Walter Filz, 2008-12-21
Ao.Univ.Prof. Andreas Steininger in an interview with Marion Ammicht
Students' work
[1] Fault-Tolerant Distributed Clock Generation in VLSI Systems on Chip
Matthias Függer,
Master's Thesis, Vienna University of Technology,
Institute of Computer Engineering.
[2] An Asynchronous Hardware Design for Distributed Tick Generation
Markus Ferringer,
Master's Thesis, Vienna University of Technology,
Institute of Computer Engineering.
[3] Threshold Logic Implementations
Konrad Mönks
Bachelor's Thesis, Vienna University of Technology, Institute of Computer Engineering, Vienna, Austria, August 2006.
[4] Design Approaches for Radiation Hardening in Digital Circuits
Oliver Höftberger
Bachelor's Thesis, Vienna University of Technology, Institute of Computer Engineering, Vienna, Austria, January 2007.
[5] Threshold-Gates -- Konfigurierbare m-aus-n Gatter zur Implementierung fehlertoleranter asynchroner Schaltungen
Julian Grahsl
Bachelor's Thesis, Vienna University of Technology, Institute of Computer Engineering, Vienna, Austria, 2007.
[6] Programmable Delay Lines
Michael Wessner
Bachelor's Thesis, Vienna University of Technology, Institute of Computer Engineering, Vienna, Austria, 2008.
Gottfried Fuchs,
PhD Thesis, Vienna University of Technology,
Institute of Computer Engineering.
Links
More information on FIT-IT www.fit-it.at
Fundamental Work www.ecs.tuwien.ac.at/projects/Theta/Theta_poster.pdf
Conference Calendar www.ecs.tuwien.ac.at/projects/DARTS/confcal/
Acknowledgements
This work is supported by FIT-IT in the track Embedded Systems under project number 809456.
FIT-IT is a program of the bmvit (Austrian Federal Ministry of Transport, Innovation and Technology) in cooperation with FFG (Austrian Research Promotion Agency Ltd.) managed by eutema (eutema Technology Management).