This paper presents a design approach for improving energy-efficiency and throughput of parallel architectures in near- and sub-threshold voltage circuits. The focus is to suppress leakage energy dissipation of the idle portions of circuits during active modes, which can allow us to wholly transform the throughput improvement from parallel architectures into energy savings via deep voltage scaling. We begin by investigating the efficacy of parallel and pipeline architectures in the near- and sub-threshold circuits. The investigation reveals that active energy dissipation largely undermines the ability of deep voltage scaling to transform excessive throughput into energy savings. Techniques, such as power-gating switches (PGSs), can mitigate active-leakage power dissipation; however, the over head for entering and exiting sleep modes can offset the energy savings provided by sleep mode, particularly if sleep time is fine grained for suppressing active leakage. Therefore, in this paper, we propose a PGS design technique, inspired by the so-called zigzag super cutoff CMOS, in order to optimize the overheads of mode transitions of PGS in near- and sub-threshold circuits. The proposed technique enables to have circuits in sleep mode for as short as a single clock cycle with a negligible amount of energy and delay overheads. We apply our proposed design to parallel multiplier-based test circuits operating at near- and sub-threshold voltages. Simulations show a significant improvement in energy efficiency over baselines at the same throughput.