PoC.cache.mem

This unit provides a cache (PoC.cache.par2) together with a cache controller that reads/writes cache lines from/to memory. It has two PoC.Mem interfaces:

  • one for the “CPU” side (ports with prefix cpu_), and

  • one for the memory side (ports with prefix mem_).

Thus, this unit can be inserted into an existing memory path between the CPU and the memory (controller). If you want to plug a cache into a CPU pipeline, see PoC.cache.cpu.

Configuration

  • REPLACEMENT_POLICY: Replacement policy of the embedded cache (default: "LRU"). For supported values, see PoC.cache_replacement_policy.

  • CACHE_LINES: Number of cache lines.

  • ASSOCIATIVITY: Associativity of the embedded cache.

  • CPU_ADDR_BITS: Number of address bits on the CPU side. Each address identifies one CPU data word (CPU_DATA_BITS wide). This is not a generic; it is calculated from the other parameters as described below.

  • CPU_DATA_BITS: Width of the data bus (in bits) on the CPU side. CPU_DATA_BITS must be divisible by 8.

  • MEM_ADDR_BITS: Number of address bits on the memory side. Each address identifies one word in the memory.

  • MEM_DATA_BITS: Width of a memory word and of a cache line, in bits. MEM_DATA_BITS must be divisible by CPU_DATA_BITS.

  • OUTSTANDING_REQ: Number of outstanding requests (default: 2); see the notes below.

If the CPU data-bus width is smaller than the memory data-bus width, then the CPU needs additional address bits to identify one CPU data word inside a memory word. Thus, the CPU address-bus width is calculated from:

CPU_ADDR_BITS=log2ceil(MEM_DATA_BITS/CPU_DATA_BITS)+MEM_ADDR_BITS
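As a worked example, take MEM_DATA_BITS=128 and CPU_DATA_BITS=32 (the values used in the Operation section) together with an illustrative MEM_ADDR_BITS=24, and assume log2ceil(x) computes the ceiling of log2(x):

```latex
\begin{aligned}
\mathit{CPU\_ADDR\_BITS}
  &= \lceil \log_2(\mathit{MEM\_DATA\_BITS}/\mathit{CPU\_DATA\_BITS}) \rceil + \mathit{MEM\_ADDR\_BITS} \\
  &= \lceil \log_2(128/32) \rceil + 24 = \lceil \log_2 4 \rceil + 24 = 2 + 24 = 26
\end{aligned}
```

If MEM_DATA_BITS equals CPU_DATA_BITS, the first term is the ceiling of log2(1), which is 0, so CPU_ADDR_BITS reduces to MEM_ADDR_BITS.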

The write policy is: write-through, no-write-allocate.

The maximum throughput is one request per clock cycle, except for OUTSTANDING_REQ = 1.

If OUTSTANDING_REQ is:

  • 1: then 1 request is buffered by a single register. To keep the critical path (clock-to-output delay) of cpu_rdy short, the throughput is limited to at most one request every 2 clock cycles.

  • 2: then 2 requests are buffered by PoC.fifo.glue. This is the setting with the lowest area requirements that does not degrade performance.

  • >2: then the requests are buffered by PoC.fifo.cc_got. The number of outstanding requests is rounded up to the next suitable value. This setting is useful in applications with out-of-order execution (of other operations). Requests from the CPU to the cache are always processed in order.

Operation

Memory accesses are always aligned to a word boundary. Each memory word (and each cache line) consists of MEM_DATA_BITS bits. For example, if MEM_DATA_BITS=128:

  • memory address 0 selects the bits 0..127 in memory,

  • memory address 1 selects the bits 128..255 in memory, and so on.
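Generalizing the example, memory address m selects the following bit range:

```latex
\text{bits } m \cdot \mathit{MEM\_DATA\_BITS} \;\ldots\; (m+1) \cdot \mathit{MEM\_DATA\_BITS} - 1,
\qquad \text{e.g. } \mathit{MEM\_DATA\_BITS} = 128,\; m = 1:\; 128 \ldots 255
```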

Cache accesses are always aligned to a CPU word boundary. Each CPU word consists of CPU_DATA_BITS bits. For example, if CPU_DATA_BITS=32:

  • CPU address 0 selects the bits 0..31 in memory word 0,

  • CPU address 1 selects the bits 32..63 in memory word 0,

  • CPU address 2 selects the bits 64..95 in memory word 0,

  • CPU address 3 selects the bits 96..127 in memory word 0,

  • CPU address 4 selects the bits 0..31 in memory word 1,

  • CPU address 5 selects the bits 32..63 in memory word 1, and so on.
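The mapping above amounts to splitting the CPU address: the upper MEM_ADDR_BITS bits select the memory word, and the lower log2ceil(MEM_DATA_BITS/CPU_DATA_BITS) bits select the CPU word inside it. The stand-alone sketch below illustrates this decomposition only. The entity cpu_word_select and its generic SEL_BITS are hypothetical (not part of PoC), the sketch makes no claim about how PoC.cache.mem is implemented internally, and it assumes MEM_DATA_BITS/CPU_DATA_BITS is a power of two of at least 2.

```vhdl
-- Hypothetical illustration, NOT part of PoC: decompose a CPU address into
-- the memory-word address and the index of the CPU word inside that word.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cpu_word_select is
  generic (
    CPU_DATA_BITS : positive := 32;
    MEM_DATA_BITS : positive := 128;
    MEM_ADDR_BITS : positive := 24;
    SEL_BITS      : positive := 2  -- assumed: log2ceil(MEM_DATA_BITS/CPU_DATA_BITS)
  );
  port (
    cpu_addr  : in  unsigned(SEL_BITS+MEM_ADDR_BITS-1 downto 0);
    mem_rdata : in  std_logic_vector(MEM_DATA_BITS-1 downto 0);  -- one memory word
    mem_addr  : out unsigned(MEM_ADDR_BITS-1 downto 0);          -- memory word address
    cpu_rdata : out std_logic_vector(CPU_DATA_BITS-1 downto 0)   -- selected CPU word
  );
end entity;

architecture rtl of cpu_word_select is
  -- Index of the CPU word inside the memory word.
  signal sel : natural range 0 to 2**SEL_BITS-1;
begin
  -- Guard the power-of-two assumption at elaboration time.
  assert MEM_DATA_BITS = CPU_DATA_BITS * 2**SEL_BITS
    report "MEM_DATA_BITS must equal CPU_DATA_BITS * 2**SEL_BITS"
    severity failure;

  -- Upper address bits: which memory word (e.g. CPU address 5 -> memory word 1).
  mem_addr <= cpu_addr(cpu_addr'left downto SEL_BITS);

  -- Lower address bits: which CPU word inside that memory word (5 -> CPU word 1).
  sel <= to_integer(cpu_addr(SEL_BITS-1 downto 0));

  -- CPU word k occupies bits k*CPU_DATA_BITS .. (k+1)*CPU_DATA_BITS-1.
  cpu_rdata <= mem_rdata((sel+1)*CPU_DATA_BITS-1 downto sel*CPU_DATA_BITS);
end architecture;
```

With the default generics, CPU address 5 yields memory word address 1 and bits 32..63 of that word, matching the list above.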

A synchronous reset must be applied, even on an FPGA.

The interface is documented in detail in the PoC.Mem Interface documentation.

Warning

If the design is synthesized with Xilinx ISE / XST, then the synthesis option “Keep Hierarchy” must be set to SOFT or TRUE.

Entity Declaration:

generic (
  REPLACEMENT_POLICY : string   := "LRU";
  CACHE_LINES        : positive;
  ASSOCIATIVITY      : positive;
  CPU_DATA_BITS      : positive;
  MEM_ADDR_BITS      : positive;
  MEM_DATA_BITS      : positive;
  OUTSTANDING_REQ    : positive := 2
);
port (
  clk : in std_logic; -- clock
  rst : in std_logic; -- reset

  -- "CPU" side
  cpu_req   : in  std_logic;
  cpu_write : in  std_logic;
  cpu_addr  : in  unsigned(log2ceil(MEM_DATA_BITS/CPU_DATA_BITS)+MEM_ADDR_BITS-1 downto 0);
  cpu_wdata : in  std_logic_vector(CPU_DATA_BITS-1 downto 0);
  cpu_wmask : in  std_logic_vector(CPU_DATA_BITS/8-1 downto 0) := (others => '0');
  cpu_rdy   : out std_logic;
  cpu_rstb  : out std_logic;
  cpu_rdata : out std_logic_vector(CPU_DATA_BITS-1 downto 0);

  -- Memory side
  mem_req   : out std_logic;
  mem_write : out std_logic;
  mem_addr  : out unsigned(MEM_ADDR_BITS-1 downto 0);
  mem_wdata : out std_logic_vector(MEM_DATA_BITS-1 downto 0);
  mem_wmask : out std_logic_vector(MEM_DATA_BITS/8-1 downto 0);
  mem_rdy   : in  std_logic;
  mem_rstb  : in  std_logic;
  mem_rdata : in  std_logic_vector(MEM_DATA_BITS-1 downto 0)
);
end entity;
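A minimal instantiation sketch (a fragment for an architecture body, not a complete design unit) follows. It assumes the entity is compiled into library PoC under the name cache_mem, which is inferred from the unit name PoC.cache.mem and not stated above; the enclosing file therefore needs a "library PoC;" clause. All connected signal names are hypothetical and must be declared by the surrounding architecture with matching widths. The generic values are illustrative only.

```vhdl
-- Sketch only: the entity name PoC.cache_mem and all signal names are assumptions.
cache_inst : entity PoC.cache_mem
  generic map (
    REPLACEMENT_POLICY => "LRU",
    CACHE_LINES        => 256,  -- illustrative value
    ASSOCIATIVITY      => 4,    -- illustrative value
    CPU_DATA_BITS      => 32,
    MEM_ADDR_BITS      => 24,   -- cpu_addr is then 2+24 = 26 bits wide
    MEM_DATA_BITS      => 128,
    OUTSTANDING_REQ    => 2     -- lowest-area setting without throughput penalty
  )
  port map (
    clk       => clk,
    rst       => rst,
    -- "CPU" side
    cpu_req   => cpu_req,
    cpu_write => cpu_write,
    cpu_addr  => cpu_addr,      -- unsigned(25 downto 0)
    cpu_wdata => cpu_wdata,     -- 32 bits
    cpu_wmask => cpu_wmask,     -- 32/8 = 4 bits, one per byte
    cpu_rdy   => cpu_rdy,
    cpu_rstb  => cpu_rstb,
    cpu_rdata => cpu_rdata,
    -- Memory side
    mem_req   => mem_req,
    mem_write => mem_write,
    mem_addr  => mem_addr,      -- unsigned(23 downto 0)
    mem_wdata => mem_wdata,     -- 128 bits
    mem_wmask => mem_wmask,     -- 128/8 = 16 bits
    mem_rdy   => mem_rdy,
    mem_rstb  => mem_rstb,
    mem_rdata => mem_rdata
  );
```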

See also

PoC.cache.cpu