#### Outline:

MANCHESTER 1824

- memory hierarchy basics
- on-chip RAM and caches
- memory management
- operating systems
- hands-on: C and assembly code interworking

#### Memory hierarchy basics


- A typical system has several different memory subsystems:
  - processor registers: ~100 bytes, 1 ns
    - access is a small part of a clock cycle
  - on-chip cache or RAM: ~10 Kbytes, 5 ns
    - accessed at the processor clock rate
  - off-chip ROM and RAM: ~Mbytes, 50 ns
    - access costs several processor cycles
  - backup store: ~Gbytes, 5 ms
- There may be more or fewer (or no) levels of cache
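To make the gap between the levels concrete, the following minimal sketch estimates the average access time seen by the processor for a single level of cache in front of off-chip memory. It uses the nominal 5 ns and 50 ns figures from the list above; the hit rates and the function name `average_access_ns` are illustrative assumptions, and the model ignores any extra miss penalty.

```python
# Nominal latencies in nanoseconds, taken from the list of memory subsystems.
CACHE_NS = 5.0
OFF_CHIP_NS = 50.0


def average_access_ns(hit_rate, hit_ns=CACHE_NS, miss_ns=OFF_CHIP_NS):
    """Average access time for one cache level in front of off-chip memory.

    Simplified model (an assumption): a hit costs hit_ns, a miss costs
    miss_ns, and nothing else is charged.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns


# Illustrative hit rates only; real values depend on the program and cache.
for rate in (0.0, 0.5, 0.9, 0.99):
    print(f"hit rate {rate:.2f}: {average_access_ns(rate):.2f} ns")
```

Under these assumptions the average only approaches the on-chip figure when almost every access hits, which is why placement matters so much.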


- Efficient operation depends on:
  - the right things
    - the code and data
  - being in the right place
    - the on-chip memory or processor registers
  - at the right time
    - when they are in use


- Processor registers
  - are managed directly by the compiler
- Cache
  - is managed automatically by the hardware
- On-chip RAM
  - is managed by the programmer
- Off-chip RAM
  - is managed by the operating system


- The objective is to approach:
  - the performance of the fastest memory ...
  - ... at the cost/bit of the slowest memory
- Feasible because programs display:
  - temporal locality
    - accesses to a location are clustered in time
  - spatial locality
    - accesses are clustered in the address space
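The effect of the two kinds of locality can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not a description of any real cache: it keeps the 8 most recently used 16-byte lines (both sizes are made up) and reports the hit rate for three hypothetical address streams.

```python
from collections import OrderedDict


def hit_rate(addresses, num_lines=8, line_bytes=16):
    """Hit rate of a tiny cache holding the most recently used lines.

    num_lines and line_bytes are illustrative values, not from the slides.
    """
    cache = OrderedDict()  # line number -> True, least recently used first
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # discard the least recently used
    return hits / len(addresses)


sequential = list(range(0, 1024, 4))         # word-by-word walk: spatial locality
loop = [0, 4, 8, 12] * 64                    # same few words reused: temporal locality
scattered = list(range(0, 1024 * 64, 256))   # one word per line, never reused

print("sequential:", hit_rate(sequential))
print("loop:      ", hit_rate(loop))
print("scattered: ", hit_rate(scattered))
```

The scattered stream has neither kind of locality, so in this model no amount of fast memory helps it.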




#### **On-chip RAM**

- System benefits of on-chip memory:
  - increased performance (no wait states)
  - reduced power consumption
  - improved EMC (electromagnetic compatibility)
- On-chip RAM ("Tightly Coupled Memory") is used in preference to a cache in some embedded systems:
  - it is simpler, cheaper and uses less power
  - its behaviour is more deterministic
  - however it requires explicit management



#### Caches

- A cache is a small on-chip memory which automatically:
  - keeps copies of recently used memory values
  - supplies these to the processor when it asks for them again
    - thereby avoiding an off-chip memory access
  - decides which values to over-write when it is full



### **Cache organization**

- There are many ways to arrange a cache:
  - separate or mixed instructions and data?
  - how much memory should be loaded on a cache miss?
  - how flexible should the allocation of cache space be?
  - how should writes be handled?

#### Unified instruction and data cache

#### Separate data and instruction caches





#### **Direct-mapped cache**
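In a direct-mapped cache each memory line can live in exactly one cache line, chosen by the middle bits of the address. The sketch below shows that address split and the resulting hit or miss behaviour. The 16-byte line and 256-line (4 Kbyte) cache are assumed sizes for illustration, and the names `split_address` and `DirectMappedCache` are hypothetical.

```python
LINE_BYTES = 16   # assumed line size
NUM_LINES = 256   # assumed number of lines (a 4 Kbyte cache)


def split_address(addr):
    """Split an address into (tag, index, byte offset within the line)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_LINES
    tag = addr // (LINE_BYTES * NUM_LINES)
    return tag, index, offset


class DirectMappedCache:
    """Tag store only; the data RAM is omitted for brevity."""

    def __init__(self):
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        """Return True on a hit; on a miss, replace the line and return False."""
        tag, index, _ = split_address(addr)
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag
        return hit


tag, index, offset = split_address(0x12345)
print(hex(tag), hex(index), hex(offset))
```

Two addresses that share an index but differ in tag evict each other every time, which is the 'thrashing' case mentioned in the comparison of cache architectures.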



### 2-way set associative cache

- two (smaller) cache blocks
- two chances to store any line
- better hit rate
- more expensive
- can extend to 4-way, etc.
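The "two chances to store any line" point can be sketched with a small model. The code below is a hedged illustration: the class name, the sizes, and the least-recently-used replacement policy are all assumptions (the slides do not say how a victim is chosen). With `ways=1` it behaves as a direct-mapped cache of the same capacity.

```python
class SetAssociativeCache:
    """Tag-only model of an N-way set-associative cache with LRU replacement.

    LRU is an assumed policy chosen for simplicity.
    """

    def __init__(self, num_sets, ways, line_bytes=16):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One list of tags per set, most recently used last.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        tags = self.sets[index]
        if tag in tags:
            tags.remove(tag)
            tags.append(tag)   # now the most recently used
            return True
        if len(tags) == self.ways:
            tags.pop(0)        # evict the least recently used
        tags.append(tag)
        return False


def count_hits(cache, addresses):
    return sum(cache.access(a) for a in addresses)


# Two addresses that compete for the same line when the cache is direct mapped.
ping_pong = [0x1000, 0x2000] * 8

direct = SetAssociativeCache(num_sets=256, ways=1)    # 4 Kbytes, direct mapped
two_way = SetAssociativeCache(num_sets=128, ways=2)   # 4 Kbytes, 2-way

print("direct-mapped hits:", count_hits(direct, ping_pong))
print("2-way hits:        ", count_hits(two_way, ping_pong))
```

In this model the direct-mapped configuration misses on every access of the alternating pattern, while the 2-way configuration misses only on the first access to each address.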





#### Fully associative cache

- more places to store a given line
- even better hit rate
- even more expensive
- (potentially) slower
- requires CAM (Content Addressable Memory)



#### **Cache architecture comparison**

- Direct mapped
  - simple, cheap, fast
  - subject to 'thrashing'
  - choice for large caches
- Set associative
  - compromise
  - may be 2-, 4-, 8-, etc. way
  - often preferred
- Fully associative
  - best hit rate
  - slow, expensive
  - choice for small caches



#### **Cache write strategies**

- Write-through
  - all data is written to memory; matching cache locations are updated
- Write-through with write buffer
  - all data is written to memory, but the write is performed through a buffer
- Copy-back
  - the processor writes to the cache; main memory is updated only when a modified line is evicted or flushed
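The difference in off-chip traffic between the strategies can be sketched by counting memory writes for a stream of processor writes. This is a deliberately simplified model with hypothetical function names: it assumes every written line stays in the cache until a single final flush, and it counts word writes for write-through but whole-line writes for copy-back.

```python
def memory_writes_write_through(write_addresses):
    """Write-through: every processor write also goes to main memory."""
    return len(write_addresses)


def memory_writes_copy_back(write_addresses, line_bytes=16):
    """Copy-back: one line write-back per modified line at the final flush.

    Assumes no line is evicted before the flush (an idealisation).
    """
    dirty_lines = {addr // line_bytes for addr in write_addresses}
    return len(dirty_lines)


# Hypothetical write stream: repeated updates to a few nearby words.
writes = [0x100, 0x104, 0x100, 0x108, 0x200, 0x100]

print("write-through:", memory_writes_write_through(writes), "word writes")
print("copy-back:    ", memory_writes_copy_back(writes), "line write-backs")
```

A write buffer does not change the write-through count in this model; it only lets the processor continue while the writes drain.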



### **Cache power-efficiency**

- What is the influence of organization on power-efficiency?
  - a high hit rate minimizes off-chip activity
    - hit rate increases with associativity (up to 4-way)
  - set-associative caches burn more power
    - due to the increased number of active sense amplifiers
  - CAM (in fully associative caches) is also power-hungry
- How can cache power-efficiency be improved?
  - use serial tag and data accesses in a set-associative cache
    - enable only the relevant data RAM
  - segment the CAM in a fully associative cache
  - exploit sequential address sequences




#### Memory management

- allows each program to run in its own address space
  - using *address translation*
    - greatly simplifying programming
- protects programs from other programs
  - using memory protection
    - improving system reliability
- supports the memory hierarchy
  - between the off-chip RAM and the disc

#### **Address translation**

- Translates
  - the processor's *logical* (or '*virtual*') address ...
  - ... into the *physical* memory address
- There are two main schemes:
  - segmented memory management
    - variable-size (usually large) segments
  - paged memory management
    - fixed-size pages, usually around 4 Kbytes
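With fixed-size pages, translation amounts to replacing the page-number bits of the address and keeping the offset bits. The sketch below assumes 4 Kbyte pages as mentioned above; the single flat `page_table` dictionary and its contents are hypothetical, chosen only to keep the example short.

```python
PAGE_BYTES = 4096  # 4 Kbyte pages

# Hypothetical mapping: virtual page number -> physical page number.
page_table = {
    0x00000: 0x00300,
    0x00001: 0x0012A,
}


def translate(vaddr):
    """Translate a virtual address to a physical address."""
    vpn, offset = divmod(vaddr, PAGE_BYTES)
    if vpn not in page_table:
        raise LookupError(f"page fault at {vaddr:#x}")
    return page_table[vpn] * PAGE_BYTES + offset


print(hex(translate(0x1ABC)))
```

An unmapped page raises an error here; in a real system that event would be handed to the operating system.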

#### Segmented memory management
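A common way to realise segmentation is a base and a limit per segment: the offset is checked against the limit and then added to the base. The sketch below assumes that scheme; the segment table contents and the names `SegmentFault` and `translate_segmented` are hypothetical.

```python
class SegmentFault(Exception):
    """Raised when an offset falls outside its segment."""


# Hypothetical segment table: segment number -> (base address, size in bytes).
segments = {
    0: (0x08000, 0x2000),
    1: (0x20000, 0x0400),
}


def translate_segmented(segment, offset):
    """Return base + offset after checking the offset against the limit."""
    base, limit = segments[segment]
    if not 0 <= offset < limit:
        raise SegmentFault(f"offset {offset:#x} outside segment {segment}")
    return base + offset


print(hex(translate_segmented(1, 0x10)))
```

Because there are only a few segments, a table like this is small enough to hold in on-chip registers.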



### Paged memory management


#### Three stage look-up
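One reading of a three-stage look-up is a walk through three levels of tables, each indexed by a field of the virtual address. The sketch below assumes a 32-bit address with a 12-bit page offset and an 8/6/6-bit split of the page number; that split is a made-up example, not taken from the slides, and nested dictionaries stand in for tables held in memory.

```python
OFFSET_BITS = 12          # 4 Kbyte pages
INDEX_BITS = (8, 6, 6)    # hypothetical split of the 20-bit virtual page number


def split_virtual_address(vaddr):
    """Return ((level-1, level-2, level-3 indices), page offset)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    vpn = vaddr >> OFFSET_BITS
    indices = []
    for bits in reversed(INDEX_BITS):      # peel fields off from the low end
        indices.append(vpn & ((1 << bits) - 1))
        vpn >>= bits
    return tuple(reversed(indices)), offset


def walk(root, vaddr):
    """Follow the three levels; return (physical address, table look-ups)."""
    indices, offset = split_virtual_address(vaddr)
    node = root
    lookups = 0
    for i in indices:
        node = node[i]                     # a KeyError here models a fault
        lookups += 1
    return (node << OFFSET_BITS) | offset, lookups


# Hypothetical tables mapping one virtual page to physical page 0xABCDE.
root = {0x12: {0x0D: {0x05: 0xABCDE}}}

paddr, lookups = walk(root, 0x12345678)
print(hex(paddr), "after", lookups, "table look-ups")
```

Each level of the walk is a separate memory access when the tables live in off-chip RAM.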





#### **Address translation**

- Segmentation schemes generally keep all relevant registers on chip
- Paging schemes have too much translation data to keep on chip
  - three memory accesses per translation would be too slow!
  - therefore a cache of recently-used translations is kept on-chip
    - a translation look-aside buffer (TLB)
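The benefit of caching translations can be sketched by counting how often the full table look-up is needed. The model below is illustrative only: the 4-entry size, the least-recently-used replacement, the flat page table and the class name `TLB` are all assumptions.

```python
from collections import OrderedDict

PAGE_BYTES = 4096


class TLB:
    """Small cache of recent virtual-to-physical page translations."""

    def __init__(self, page_table, entries=4):
        self.page_table = page_table   # stands in for the tables in memory
        self.entries = entries
        self.cache = OrderedDict()     # virtual page -> physical page
        self.walks = 0                 # number of slow table look-ups

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_BYTES)
        if vpn in self.cache:
            self.cache.move_to_end(vpn)          # most recently used
        else:
            self.walks += 1                      # TLB miss: consult the tables
            self.cache[vpn] = self.page_table[vpn]
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)   # drop least recently used
        return self.cache[vpn] * PAGE_BYTES + offset


# Hypothetical mapping: virtual page n -> physical page n + 100.
tlb = TLB({n: n + 100 for n in range(16)})
for addr in range(0, PAGE_BYTES, 4):             # 1024 accesses within one page
    tlb.translate(addr)
print("table look-ups needed:", tlb.walks)
```

Because accesses cluster within pages, a handful of entries is enough to remove most table look-ups in this model.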

#### **Translation look-aside buffer**




### Virtual and physical caches

- A cache may use virtual (pre-MMU) or physical addresses
  - physical caches
    - have fewer coherency problems
    - require a translation on every access
  - virtual caches
    - must be flushed whenever the translation tables change (e.g. on a process switch)
    - do not support synonyms
      - when two virtual addresses map to the same physical address




### **Operating systems**

- Scheduling
  - an operating system allows a machine to run multiple programs concurrently
    - sometimes owned by different users
- Protection
  - each program is protected from errors in others (to a greater or lesser extent)
- Resource allocation
  - limited resources, conflicting demands

### **Operating systems**

- Hardware support is needed:
  - memory management hardware
  - a protected operating system mode
    - to prevent unauthorized access to the MMU
- Single-user systems
  - do not need protection from malicious programmers!
    - except, possibly, virus and network attacks
  - use protection to improve reliability


### **Operating systems**

- Embedded systems
  - usually run a fixed set of programs
    - these can run in a single memory space
    - MMU hardware is often not needed
  - sometimes use a small Real Time Operating System (RTOS) for:
    - scheduling
    - hardware resource management
  - sometimes run a single program
    - no MMU; simple monitor operating system




#### Hands-on: C and assembly code interworking

- Take a closer look at the APCS (ARM Procedure Call Standard)
  - see how assembly programs may be called from a C program
  - look at various ways the stack may be used
- Follow the 'Hands-on' instructions