Building reliable systems-on-chip in nanoscale technologies

Modern application-specific integrated circuits (ASICs) contain complete systems on a single die, composed of many processing elements that communicate over a dedicated router-based on-chip network. As systems-on-chip comprise billions of transistors with feature sizes in the range of 10 nm, reliable operation cannot be established without carefully engineered support at all levels, from the technology to the circuit and system layers. This article surveys contributions of research groups at TU Wien to this field. At lower levels of abstraction, they range from the generation of fault models for simulation that closely match reality yet remain efficient to use, to circuit-level radiation-tolerance techniques. At the level of on-chip networks, novel fault-tolerant routing algorithms are being developed together with architectural techniques that isolate faulty parts while keeping the healthy parts connected and active. The article briefly portrays the associated research activities and summarizes their most relevant results.