PoC.cache.mem
This unit provides a cache (PoC.cache.par2) together with a cache controller which reads / writes cache lines from / to memory. It has two PoC.Mem Interface interfaces:
one for the “CPU” side (ports with prefix cpu_), and
one for the memory side (ports with prefix mem_).
Thus, this unit can be placed into an already available memory path between the CPU and the memory (controller). If you want to plug a cache into a CPU pipeline, see PoC.cache.cpu.
Configuration
Parameter | Description |
---|---|
REPLACEMENT_POLICY | Replacement policy of embedded cache. For supported values see PoC.cache_replacement_policy. |
CACHE_LINES | Number of cache lines. |
ASSOCIATIVITY | Associativity of embedded cache. |
CPU_ADDR_BITS | Number of address bits on the CPU side. Each address identifies one memory word as seen from the CPU. Calculated from other parameters as described below. |
CPU_DATA_BITS | Width of the data bus (in bits) on the CPU side. CPU_DATA_BITS must be divisible by 8. |
MEM_ADDR_BITS | Number of address bits on the memory side. Each address identifies one word in the memory. |
MEM_DATA_BITS | Width of a memory word and of a cache line in bits. MEM_DATA_BITS must be divisible by CPU_DATA_BITS. |
OUTSTANDING_REQ | Number of outstanding requests, see notes below. |
If the CPU data-bus width is smaller than the memory data-bus width, then the CPU needs additional address bits to identify one CPU data word inside a memory word. Thus, the CPU address-bus width is calculated from:
CPU_ADDR_BITS=log2ceil(MEM_DATA_BITS/CPU_DATA_BITS)+MEM_ADDR_BITS
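As a worked instance of the formula above, the following self-contained sketch evaluates it for one illustrative parameter set (CPU_DATA_BITS=32, MEM_DATA_BITS=128, MEM_ADDR_BITS=24; these values are assumptions chosen for the example). The function log2ceil_local is a hypothetical local stand-in, assumed to behave like a ceiling of the base-2 logarithm; it is not the PoC implementation of log2ceil.

```vhdl
entity cpu_addr_bits_example is
end entity;

architecture sim of cpu_addr_bits_example is
  -- Hypothetical local stand-in for log2ceil: smallest n with 2**n >= arg.
  function log2ceil_local(arg : positive) return natural is
    variable tmp : positive := 1;
    variable log : natural  := 0;
  begin
    while tmp < arg loop
      tmp := tmp * 2;
      log := log + 1;
    end loop;
    return log;
  end function;

  -- Illustrative parameter values (assumptions for this example).
  constant CPU_DATA_BITS : positive := 32;
  constant MEM_DATA_BITS : positive := 128;
  constant MEM_ADDR_BITS : positive := 24;

  -- Four CPU words per memory word need 2 extra address bits: 2 + 24 = 26.
  constant CPU_ADDR_BITS : positive :=
    log2ceil_local(MEM_DATA_BITS / CPU_DATA_BITS) + MEM_ADDR_BITS;
begin
  assert CPU_ADDR_BITS = 26
    report "unexpected CPU_ADDR_BITS"
    severity failure;
end architecture;
```

If both data buses have the same width, the logarithm term is zero and CPU_ADDR_BITS equals MEM_ADDR_BITS.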
The write policy is: write-through, no-write-allocate.
The maximum throughput is one request per clock cycle, except for OUTSTANDING_REQ = 1.
If OUTSTANDING_REQ is:
1: then 1 request is buffered by a single register. To give a short critical path (clock-to-output delay) for cpu_rdy, the throughput is degraded to at most one request per 2 clock cycles.
2: then 2 requests are buffered by PoC.fifo.glue. This setting has the lowest area requirements without degrading the performance.
>2: then the requests are buffered by PoC.fifo.cc_got. The number of outstanding requests is rounded up to the next suitable value. This setting is useful in applications with out-of-order execution (of other operations). The CPU requests to the cache are always processed in-order.
Operation
Memory accesses are always aligned to a word boundary. Each memory word (and each cache line) consists of MEM_DATA_BITS bits. For example if MEM_DATA_BITS=128:
memory address 0 selects the bits 0..127 in memory,
memory address 1 selects the bits 128..255 in memory, and so on.
Cache accesses are always aligned to a CPU word boundary. Each CPU word consists of CPU_DATA_BITS bits. For example if CPU_DATA_BITS=32:
CPU address 0 selects the bits 0.. 31 in memory word 0,
CPU address 1 selects the bits 32.. 63 in memory word 0,
CPU address 2 selects the bits 64.. 95 in memory word 0,
CPU address 3 selects the bits 96..127 in memory word 0,
CPU address 4 selects the bits 0.. 31 in memory word 1,
CPU address 5 selects the bits 32.. 63 in memory word 1, and so on.
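The mapping listed above can be read as a split of the CPU address into a memory word address (upper bits) and a CPU word index inside that memory word (lower bits). The simulation-only sketch below checks this reading for CPU address 5. It assumes that MEM_DATA_BITS/CPU_DATA_BITS is a power of two; the split is inferred from the examples above, not taken from the unit's implementation, and all names and parameter values are illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cpu_addr_split_example is
end entity;

architecture sim of cpu_addr_split_example is
  -- Illustrative parameter values (assumptions for this example).
  constant CPU_DATA_BITS : positive := 32;
  constant MEM_ADDR_BITS : positive := 24;
  -- log2ceil(MEM_DATA_BITS/CPU_DATA_BITS) for MEM_DATA_BITS = 128.
  constant RATIO_BITS    : natural  := 2;
  constant CPU_ADDR_BITS : positive := RATIO_BITS + MEM_ADDR_BITS;
begin
  process
    variable cpu_addr : unsigned(CPU_ADDR_BITS-1 downto 0);
    variable mem_word : unsigned(MEM_ADDR_BITS-1 downto 0);
    variable word_idx : natural;
    variable low_bit  : natural;
  begin
    cpu_addr := to_unsigned(5, CPU_ADDR_BITS);

    -- Upper bits select the memory word, lower bits the CPU word inside it.
    mem_word := cpu_addr(CPU_ADDR_BITS-1 downto RATIO_BITS);
    word_idx := to_integer(cpu_addr(RATIO_BITS-1 downto 0));
    low_bit  := word_idx * CPU_DATA_BITS;

    -- CPU address 5 selects the bits 32..63 in memory word 1.
    assert mem_word = 1 report "wrong memory word" severity failure;
    assert low_bit = 32 report "wrong low bit"     severity failure;
    assert low_bit + CPU_DATA_BITS - 1 = 63
      report "wrong high bit" severity failure;
    wait;
  end process;
end architecture;
```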
A synchronous reset must be applied even on an FPGA.
The interface is documented in detail here.
Warning
If the design is synthesized with Xilinx ISE / XST, then the synthesis option “Keep Hierarchy” must be set to SOFT or TRUE.
Entity Declaration:
  generic (
    REPLACEMENT_POLICY : string   := "LRU";
    CACHE_LINES        : positive;
    ASSOCIATIVITY      : positive;
    CPU_DATA_BITS      : positive;
    MEM_ADDR_BITS      : positive;
    MEM_DATA_BITS      : positive;
    OUTSTANDING_REQ    : positive := 2
  );
  port (
    clk : in std_logic; -- clock
    rst : in std_logic; -- reset

    -- "CPU" side
    cpu_req   : in  std_logic;
    cpu_write : in  std_logic;
    cpu_addr  : in  unsigned(log2ceil(MEM_DATA_BITS/CPU_DATA_BITS)+MEM_ADDR_BITS-1 downto 0);
    cpu_wdata : in  std_logic_vector(CPU_DATA_BITS-1 downto 0);
    cpu_wmask : in  std_logic_vector(CPU_DATA_BITS/8-1 downto 0) := (others => '0');
    cpu_rdy   : out std_logic;
    cpu_rstb  : out std_logic;
    cpu_rdata : out std_logic_vector(CPU_DATA_BITS-1 downto 0);

    -- Memory side
    mem_req   : out std_logic;
    mem_write : out std_logic;
    mem_addr  : out unsigned(MEM_ADDR_BITS-1 downto 0);
    mem_wdata : out std_logic_vector(MEM_DATA_BITS-1 downto 0);
    mem_wmask : out std_logic_vector(MEM_DATA_BITS/8-1 downto 0);
    mem_rdy   : in  std_logic;
    mem_rstb  : in  std_logic;
    mem_rdata : in  std_logic_vector(MEM_DATA_BITS-1 downto 0)
  );
end entity;
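For orientation, an instantiation of the entity declared above might look like the following sketch. Several things here are assumptions and not confirmed by this page: the entity name cache_mem, the library name PoC, the wrapper entity cache_mem_example, and all generic values. The block compiles only with the PoC sources available, and the internal signals are placeholders to be connected to the actual CPU and memory controller.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library PoC;  -- assumption: PoC sources are compiled into library "PoC"

entity cache_mem_example is  -- hypothetical wrapper, for illustration only
  port (
    clk : in std_logic;
    rst : in std_logic       -- synchronous reset
  );
end entity;

architecture rtl of cache_mem_example is
  -- Illustrative parameter values (assumptions for this example).
  constant CPU_DATA_BITS : positive := 32;
  constant MEM_ADDR_BITS : positive := 24;
  constant MEM_DATA_BITS : positive := 128;
  -- log2ceil(128/32) + 24 = 26
  constant CPU_ADDR_BITS : positive := 26;

  -- Placeholder signals; connect them to the CPU and the memory controller.
  signal cpu_req, cpu_write, cpu_rdy, cpu_rstb : std_logic;
  signal cpu_addr  : unsigned(CPU_ADDR_BITS-1 downto 0);
  signal cpu_wdata : std_logic_vector(CPU_DATA_BITS-1 downto 0);
  signal cpu_wmask : std_logic_vector(CPU_DATA_BITS/8-1 downto 0);
  signal cpu_rdata : std_logic_vector(CPU_DATA_BITS-1 downto 0);

  signal mem_req, mem_write, mem_rdy, mem_rstb : std_logic;
  signal mem_addr  : unsigned(MEM_ADDR_BITS-1 downto 0);
  signal mem_wdata : std_logic_vector(MEM_DATA_BITS-1 downto 0);
  signal mem_wmask : std_logic_vector(MEM_DATA_BITS/8-1 downto 0);
  signal mem_rdata : std_logic_vector(MEM_DATA_BITS-1 downto 0);
begin
  cache : entity PoC.cache_mem  -- assumption: entity name
    generic map (
      REPLACEMENT_POLICY => "LRU",
      CACHE_LINES        => 64,   -- illustrative
      ASSOCIATIVITY      => 4,    -- illustrative
      CPU_DATA_BITS      => CPU_DATA_BITS,
      MEM_ADDR_BITS      => MEM_ADDR_BITS,
      MEM_DATA_BITS      => MEM_DATA_BITS,
      OUTSTANDING_REQ    => 2
    )
    port map (
      clk       => clk,
      rst       => rst,
      cpu_req   => cpu_req,
      cpu_write => cpu_write,
      cpu_addr  => cpu_addr,
      cpu_wdata => cpu_wdata,
      cpu_wmask => cpu_wmask,
      cpu_rdy   => cpu_rdy,
      cpu_rstb  => cpu_rstb,
      cpu_rdata => cpu_rdata,
      mem_req   => mem_req,
      mem_write => mem_write,
      mem_addr  => mem_addr,
      mem_wdata => mem_wdata,
      mem_wmask => mem_wmask,
      mem_rdy   => mem_rdy,
      mem_rstb  => mem_rstb,
      mem_rdata => mem_rdata
    );
end architecture;
```

OUTSTANDING_REQ is set to 2 here because the notes above describe it as the setting with the lowest area that does not degrade throughput.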
See also