*Conclusion:* A new flexible column redundancy scheme has been proposed. The scheme has a large column RAU without any speed penalty and can also lead to a reduction in layout area by reducing the number of fuse-sets. The proposed highly flexible redundancy scheme is suitable for use in high density DRAMs with multi-I/Os and embedded DRAMs.

© IEE 2000

21 February 2000

*Electronics Letters Online No: 20000712 DOI: 10.1049/el:20000712* 

Yong-Weon Jeon and Suki Kim (School of Electrical and Electronic Engineering, Korea University, Seoul, 136-701, Korea)

Young-Hyun Jun (School of Electrical and Computer Engineering, SungKyunKwan University, Suwon, Kyunggi-Do, 440-746, Korea)

### References

- 1 KIM, C.: 'A 2.5V, 72-Mbit, 2.0-Gbyte/s packet-based DRAM with a 1.0-Gbps/pin interface', *IEEE J. Solid-State Circuits*, 1999, 34, (5), pp. 645–652
- 2 NAMEKAWA,: 'Dynamically shift-switched dataline redundancy suitable for DRAM macro with wide data bus'. Symp. VLSI Circuits, June 1999, pp. 149–152
- 3 TAKASE, S.: 'A 1.6-Gbyte/s DRAM with flexible mapping redundancy technique and additional refresh scheme', *IEEE J. Solid-State Circuits*, 1999, 34, (11), pp. 1600–1606

# On-chip timing reference for self-timed microprocessor

S. Temple and S.B. Furber

A calibratable on-chip timing reference circuit has been developed to enable a self-timed microprocessor to interface to standard off-chip memory and peripheral devices. The circuit exhibits several of the desirable properties of self-timed circuitry such as low power consumption and low electromagnetic interference (EMI). In addition, it is highly testable.

*Background:* The AMULET processors are a series of self-timed (or asynchronous) 32-bit microprocessors which are instruction-set compatible with the industry standard ARM processors. The first device, AMULET1 [1], is a processor core designed using a two-phase, bundled data methodology [2]. Its interface to the outside world is also two-phase and this presents considerable difficulty when it has to be interfaced to a memory system based on industry standard parts. AMULET2e is the second chip in the series which retains the bundled data methodology but uses four-phase signalling [3]. This device contains a processor core derived from the AMULET1 design as well as an on-chip cache. It also contains circuitry designed to simplify the interface to external devices and, in small systems, no additional logic is necessary. However, it relies on an external timing reference (a delay line) to calibrate the circuitry which generates memory control signals.

While an external timing reference provides great flexibility, in practice it has proved problematic as the solutions that were found, such as inverter chains and integrated delay lines, were difficult to control to a fine granularity and were prone to significant variation as the operating conditions varied. Synchronous systems typically rely on a crystal oscillator to provide a timing reference but while a crystal circuit could be adapted for use in this application it has a number of disadvantages. A significant issue with the AMULET designs is the reduction of electromagnetic interference (EMI) and having a high-speed crystal running continuously is not desirable in this respect. It would also negate the power advantage of the self-timed solution which only consumes power when off-chip accesses occur.

The most recent design, AMULET3i [4], incorporates the timing reference onto the chip. This has the advantage that no pins are required, compared to the two used in AMULET2e, and keeping the timing signal on-chip also reduces power and EMI. In addition, the resulting circuit can be made much more controllable.

*Memory interface mechanism:* Both AMULET2e and AMULET3i use a similar system for interfacing to external memory devices. They communicate using standard address and data buses and generate a number of control signals for driving commodity memory parts such as RAM (static and dynamic), ROM and peripheral chips such as UARTs. The timing of these signals is controlled at two levels. The lower level provides a reference delay in the form of a delay line. This reference delay is then 'called' a number of times by the upper level to generate timing for specific types of memory device. The memory map of the processor is divided into 'regions' and the upper level timing behaviour of each region can be specified individually. Thus each region will contain memory devices with similar timing requirements with all timing being specified as multiples of the basic timing reference.
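The two-level scheme can be pictured as a small lookup in which each region records how many times the reference delay is 'called' per access. The following is a minimal sketch only: the region names, the call counts and the 10 ns reference value are hypothetical and are not AMULET3i settings.

```python
# Hypothetical region table: how many times the reference delay is "called"
# per access. Names and counts are illustrative, not taken from AMULET3i.
REGION_CALLS = {"sram": 2, "rom": 5, "uart": 12}


def access_time_ns(region, reference_ns):
    """Upper-level timing: an integer multiple of the lower-level reference delay."""
    return REGION_CALLS[region] * reference_ns


if __name__ == "__main__":
    for name in REGION_CALLS:
        print(name, access_time_ns(name, reference_ns=10.0), "ns")
```

Because every region is expressed in units of the same reference, recalibrating the reference rescales the timing of all regions together.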

An on-chip timing reference circuit therefore has the following requirements: it must be configurable to cope with varying timing caused by manufacturing process variations; it is desirable that the configuration can be carried out dynamically by software as this will allow the system to cope with variations in voltage and temperature which will affect the behaviour of the delay. Furthermore, to achieve maximum performance from the memory system it is advantageous to be able to calibrate the delay against a known reference. Finally, as AMULET3i was designed for use in a commercial product, the testability of the timing system should be good.

*Timing reference design:* The timing reference is based around a delay line built of many small fixed delays (Fig. 1). The overall delay can be altered by switching these small delay elements into or out of the circuit. Each delay element has a delay of ~250ps (typical silicon) and AMULET3i uses 54 of these elements to make up the delay line. Each delay element requires a single control bit to cause it to extend the delay line or to loop back to curtail the delay. An elegant control mechanism was devised, using a chain of Muller C-gates, which has a number of useful properties. First, it can be controlled easily with only three control bits. Secondly, as well as providing incremental control, two states of the control bits can set the delay line directly to minimum or maximum delay. This means that the delay can be readily forced to its maximum value at power-on reset. Thirdly, because of the way that the circuit operates, the outputs of all of the C-gates can be effectively tested by observing only the outputs of the two gates at the ends of the delay line. This is because the state of any C-gate can only be altered if both of its neighbours operate correctly.

The Muller C-gate used in this design is a latching element with the following behaviour. If the central input and the '+' input are both high, the output goes high. If the central input and the '-' input are both low, the output goes low. Otherwise the output remains in its current state.
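The behaviour just described can be written as a next-state function. This is a behavioural sketch only; the function and argument names are chosen here for illustration, with inputs and output taken as 0 or 1.

```python
def c_gate(centre, plus, minus, out):
    """Next output of the latching C-gate, given its inputs and current output."""
    if centre and plus:
        return 1      # central and '+' inputs both high: output goes high
    if not centre and not minus:
        return 0      # central and '-' inputs both low: output goes low
    return out        # any other combination: output holds its state


if __name__ == "__main__":
    for centre in (0, 1):
        for plus in (0, 1):
            for minus in (0, 1):
                for out in (0, 1):
                    print(centre, plus, minus, out, "->", c_gate(centre, plus, minus, out))
```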



Fig. 1 Delay line with C-gates

*Controlling the delay line:* Three control signals (L, M, R) are required to control the delay line. If they are low (the reset state), all C-gate outputs will be low and the delay line will exhibit its maximum delay. If they are high the outputs of the C-gates also go high and the delay line exhibits its minimum delay.

To adjust the length of the delay line a sequence of bit patterns must be applied to L, M and R. The required effect is to maintain the outputs of the C-gates at the left hand end of the delay line as zero and the outputs of C-gates at the right hand end as one. The delay is then adjusted one step at a time by moving the boundary between zero and one outputs to the left to reduce the delay or to the right to increase it. For example, following a reset it will generally be necessary to reduce the delay from its maximum value. The following sequence will achieve this:

| L | M | R | Action |
|---|---|---|--------|
| 0 | 0 | 0 | Reset to maximum delay |
| 1 | 0 | 1 | Reduce delay by one stage (set bit 54) |
| 0 | 0 | 1 | Idle |
| 0 | 1 | 1 | Reduce (set bit 53) |
| 0 | 1 | 0 | Idle |
| 1 | 1 | 0 | Reduce (set bit 52) |
| 1 | 0 | 0 | Idle |
| 1 | 0 | 1 | Reduce (set bit 51) |
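The control sequence can be replayed in a small behavioural simulation. The wiring used here is an assumption, since Fig. 1 is not reproduced: the central inputs are taken to be driven by L, M and R in rotation, with bit 54 on R, bit 53 on M and bit 52 on L; the '+' input of each gate is taken from its right-hand neighbour and the '-' input from its left-hand neighbour; and the two end gates are assumed to take L (right end '+') and R (left end '-'). Under these assumptions the sketch reproduces the sequence above, but it should not be read as the actual AMULET3i circuit.

```python
N = 54  # number of C-gates (one per delay element) in AMULET3i


def c_gate(centre, plus, minus, out):
    """Latching C-gate: set on centre & '+', clear on ~centre & ~'-', else hold."""
    if centre and plus:
        return 1
    if not centre and not minus:
        return 0
    return out


def settle(bits, L, M, R):
    """Apply (L, M, R) and iterate until no output changes.

    bits[0] is bit 1 (left end), bits[-1] is bit N (right end).
    The wiring below is ASSUMED, not taken from the Letter's figure.
    """
    ctrl = (L, M, R)
    bits = list(bits)
    n = len(bits)
    for _ in range(4 * n):  # generous bound on the settling cascade
        nxt = []
        for i, out in enumerate(bits):
            centre = ctrl[(2 - (n - 1 - i)) % 3]     # right end on R, then M, L, R, ...
            plus = bits[i + 1] if i + 1 < n else L   # assumed termination at right end
            minus = bits[i - 1] if i > 0 else R      # assumed termination at left end
            nxt.append(c_gate(centre, plus, minus, out))
        if nxt == bits:
            return bits
        bits = nxt
    raise RuntimeError("chain did not settle")


SEQUENCE = [(0, 0, 0), (1, 0, 1), (0, 0, 1), (0, 1, 1),
            (0, 1, 0), (1, 1, 0), (1, 0, 0), (1, 0, 1)]


def run(sequence=SEQUENCE, n=N):
    """Replay a control sequence; return final outputs and the count of set bits per step."""
    bits = [1] * n  # arbitrary starting state; the first pattern resets it
    history = []
    for L, M, R in sequence:
        bits = settle(bits, L, M, R)
        history.append(sum(bits))
    return bits, history


if __name__ == "__main__":
    final, history = run()
    print(history)
    print("set bits:", [i + 1 for i, b in enumerate(final) if b])
```

In this model, consecutive patterns after the reset differ in a single control bit, and each 'Idle' pattern leaves the outputs unchanged. Replaying the patterns in reverse order steps the boundary back to the right, which corresponds to increasing the delay.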

*Implementation details:* The delay line as described provides delays from approximately 0.25 to 13.5 ns. In AMULET3i a rather wider range is required to accommodate slow memory devices and so a prescaler circuit is also present which allows the basic delay to be multiplied by 1, 2, 4 or 8. Some overheads are associated with the prescaler, and the maximum delay including overheads is ~120 ns on typical silicon. The implementation technology is a 0.35 µm, three-layer metal CMOS process with a worst case process variation of ~30%. The wide range of delays available from the implementation is adequate to support the commodity memory and peripheral devices which will be used with AMULET3i.
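The quoted range follows from the element count and the nominal element delay. The short check below uses only the figures given above; the constant and function names are illustrative.

```python
ELEMENT_NS = 0.25        # ~250 ps per delay element on typical silicon
N_ELEMENTS = 54          # elements in the AMULET3i delay line
PRESCALES = (1, 2, 4, 8)


def basic_range_ns():
    """Shortest and longest delay of the unscaled line (one element to all elements)."""
    return ELEMENT_NS, N_ELEMENTS * ELEMENT_NS


def max_scaled_ns(prescale):
    """Longest delay at a given prescale setting, excluding prescaler overheads."""
    return prescale * N_ELEMENTS * ELEMENT_NS


if __name__ == "__main__":
    print(basic_range_ns())
    for p in PRESCALES:
        print(p, max_scaled_ns(p), "ns")
```

At a prescale of 8 the line alone accounts for 108 ns, which suggests that roughly 12 ns of the ~120 ns maximum is overhead, although the Letter does not break this figure down.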

*Calibration:* To calibrate the delay line two facilities are required. One is a real-time clock to act as a reference and the other is a means of comparing the action of the delay line with the reference. In AMULET3i a real-time clock is part of the on-chip system peripherals and to compare this with the delay line a 16-bit ripple counter is provided which is clocked by pulses passing through the delay. The processor can enable and disable the counter and also read its value, thus enabling the number of delays in a fixed time period to be measured.

In normal operation the timing pulses will be irregularly spaced, reflecting varying access patterns to off-chip memory. To enable calibration a mode is provided which ensures that the delay line is constantly passing pulses. In this mode, whenever the delay line becomes idle a dummy pulse is passed through it so that a constant stream of pulses is generated. This has the (benign) side effect that requests to use the delay line from the memory interface logic will be held up on occasion for no more than one pass through the delay. Calibration may take place once, at system start-up, or it may occur more often if it is anticipated that the system's operating parameters may vary significantly during operation; this is entirely under software control.
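A software calibration loop built on these facilities might look like the following. This is a sketch under stated assumptions rather than the AMULET3i software: `FakeDelayLine` is a hypothetical stand-in for the hardware, its element delay and overhead are invented test values, and the strategy (shorten the delay until it falls below a target, then back off one step) is one plausible choice among several.

```python
COUNTER_MAX = 2**16 - 1  # largest value the 16-bit counter can hold


def pass_time_ns(count, window_ns):
    """Average time per pass through the delay, from a pulse count over a known window."""
    # The window must be short enough that the counter cannot wrap.
    if not 0 < count <= COUNTER_MAX:
        raise ValueError("count out of range; choose a different window")
    return window_ns / count


def calibrate(line, target_ns, window_ns):
    """Shorten the delay until it drops below target_ns, then back off one step."""
    while True:
        t = pass_time_ns(line.count_pulses(window_ns), window_ns)
        if t < target_ns:
            line.increase()   # step back so the delay is not below the target
            return
        if not line.reduce():
            return            # already at minimum delay


class FakeDelayLine:
    """Hypothetical hardware model: n elements in circuit plus a fixed overhead."""

    def __init__(self, element_ns, overhead_ns, n_max=54):
        self.element_ns = element_ns
        self.overhead_ns = overhead_ns
        self.n_max = n_max
        self.elements = n_max  # reset state: maximum delay

    def count_pulses(self, window_ns):
        """Pulses counted in calibration mode during a window timed by the real-time clock."""
        period = self.elements * self.element_ns + self.overhead_ns
        return int(window_ns // period)

    def reduce(self):
        if self.elements == 1:
            return False
        self.elements -= 1
        return True

    def increase(self):
        if self.elements == self.n_max:
            return False
        self.elements += 1
        return True


if __name__ == "__main__":
    line = FakeDelayLine(element_ns=0.3, overhead_ns=2.0)  # invented "slow silicon"
    calibrate(line, target_ns=10.0, window_ns=100_000.0)
    print("elements in circuit:", line.elements)
```

Starting from the maximum delay mirrors the power-on reset state, so the memory timing is never shorter than the target while calibration is in progress.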

*Conclusion:* A flexible on-chip delay circuit has been described which is used to control access timing for off-chip memory devices. The circuit can provide a wide range of delays and has a facility for calibration in conjunction with a timing reference. A novel control circuit for the delay line has been devised which is simple to control and easy to test.

*Acknowledgment:* This work was supported by the EU-funded OMI-ATOM project and the authors are grateful to the European Commission for their continuing support.

© IEE 2000

28 March 2000

*Electronics Letters Online No: 20000724 DOI: 10.1049/el:20000724*

S. Temple and S.B. Furber (Department of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, United Kingdom)

## References

- 1 FURBER, S.B., DAY, P., GARSIDE, J.D., PAVER, N.C., and WOODS, J.V.: 'The design and evaluation of an asynchronous microprocessor'. Proc. ICCD'94, October 1994, pp. 217–220
- 2 SUTHERLAND, I.E.: 'Micropipelines', *Commun. ACM*, 1989, 32, (6), pp. 720–738
- 3 FURBER, S.B., GARSIDE, J.D., RIOCREUX, P., TEMPLE, S., DAY, P., LIU, J., and PAVER, N.C.: 'AMULET2e: An asynchronous embedded controller', *Proc. IEEE*, 1999, 87, (2), pp. 243–256
- 4 GARSIDE, J.D., FURBER, S.B., and CHUNG, S.-H.: 'AMULET3 revealed'. Proc. Async '99, April 1999, pp. 51–59

ELECTRONICS LETTERS 25th May 2000 Vol. 36 No. 11

# Coreless printed circuit board (PCB) transformers with high power density and high efficiency

S.C. Tang, S.Y.R. Hui and H. Chung

The authors report the use of a coreless printed circuit board transformer for power conversion with very high power density and efficiency. A coreless PCB transformer with an outermost radius of ~1cm and 19 turns for both the primary and secondary windings can transfer 19W at an efficiency of 90%, resulting in a record power density of 24W/cm<sup>2</sup>. The power density and energy efficiency of a coreless PCB transformer are higher than those of core-based microtransformers. Coreless transformers are simpler in structure, easier to implement on a silicon wafer and cheaper than core-based planar transformers.

*Introduction:* As the size of electronic circuits decreases, planar inductors and transformers are becoming more important because of their size advantage. Active research has been carried out in recent years into microtransformers [1, 2] and planar transformers [3]. So far, most efforts have been devoted to transformer designs with magnetic cores. It is well known that as the operating frequency increases, the required size of the magnetic core will decrease. This has led to the design of very small magnetic structures [1, 2]. However, little research has addressed the limiting case in which the magnetic core shrinks towards zero and is removed altogether, although the operating frequencies in many applications have well exceeded 1MHz. Recently, we demonstrated that it is practically feasible and economically sound to use coreless PCB transformers [3–6]. Several misunderstandings about the use of planar transformers without magnetic cores have been clarified. Coreless PCB transformers with a few turns in the primary and secondary windings do not behave like short circuits when operated at the appropriate frequency [3–6]. The voltage gain can be higher than unity using a resonant technique. The use of coreless transformers does not lead to electromagnetic interference (EMI) problems in electronic circuits [7]. In this Letter, we present a further investigation into the use of coreless PCB transformers for power conversion applications. It is found that high power density and energy efficiency can be achieved in coreless PCB transformers.



Fig. 1 Dimension of primary and secondary windings of coreless PCB transformer

In [1], it has been reported that a core-based microtransformer achieved a transformer power density of 22.4W/cm<sup>2</sup> and a transformer efficiency of 61%. In this Letter, we study a coreless PCB transformer as shown in Fig. 1. The transformer has two identical coils printed on opposite sides of a double-sided PCB. Each winding has 19 turns and a diameter of ~1cm. The thickness of the PCB that separates the primary and secondary windings is 0.4mm. The equivalent circuit of the transformer is shown in Fig. 2. The external capacitor has a value of 100pF. The circuit parameters are $L_{lk1} = 0.35595\,\mu\text{H}$; $L_{lk2} = 0.35595\,\mu\text{H}$; $L_M = 1.4936\,\mu\text{H}$. Owing to skin effects, the winding resistance changes with frequency. The measured AC winding resistance of both windings is