If you're storing your program data in slow flash, then caching is a very good idea.
I can't understand the following sentence at all: "cache read the memory also need 33ns, after go to cache then CORE read from cache again... it is seems like meaningless". Can you rephrase that?
Ideally, a cache should work transparently. The first time you access memory that isn't cached (a miss), a whole page (of some implementation-defined length) is retrieved from flash and copied into the BRAM cache. If your requirements are modest, you could cache just a single page at a time. If you want to be fancy, you could cache a number of different non-contiguous pages.
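To make the idea concrete, here is a minimal sketch in Python of the single-page scheme just described (the class and variable names are my own, and the page size is arbitrary): on a miss, the whole page containing the address is copied into a BRAM-like buffer, and later accesses to the same page are served locally.

```python
PAGE_SIZE = 128  # words per cached page (arbitrary choice)

class SinglePageCache:
    """Hypothetical single-page cache in front of slow flash."""
    def __init__(self, flash):
        self.flash = flash      # backing store (flash image as a list)
        self.page = None        # currently cached page number, or None
        self.data = []          # local (BRAM-like) copy of that page
        self.misses = 0

    def read(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        if page != self.page:   # miss: refill the whole page from flash
            self.misses += 1
            base = page * PAGE_SIZE
            self.data = self.flash[base:base + PAGE_SIZE]
            self.page = page
        return self.data[offset]  # hit path: fast local read

flash = list(range(1024))       # stand-in flash contents
cache = SinglePageCache(flash)
cache.read(0)                   # miss: fills page 0
cache.read(5)                   # hit: same page
cache.read(500)                 # miss: fills page 3
print(cache.misses)             # -> 2
```

In hardware the refill would take many flash cycles while hits come straight from BRAM, which is exactly why the miss/hit distinction matters.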
For a working implementation, you could have a look at the aeMB2 soft processor, though the code is not at all well documented. There should be plenty of other references and books that deal with the topic, though. This page was one of the first Google hits I found.
Let's say FlashMEM --33ns--> cache --0.x ns--> processor. If we didn't implement a cache, FlashMEM --33ns--> processor would be faster, right? Since the cache also needs to read from FlashMEM before the data reaches the processor, why not skip the cache and read directly from FlashMEM? (That's what I meant by "it seems meaningless".)
So is a cache really necessary here?
Besides using faster FlashMEM, is there any other way to make the system run at 100MHz? (30MHz is too slow ><)
Not quite. You can run your processor at a different speed from the program memory - if the processor is faster than the memory, you will need to introduce wait states.
So let's say your processor is running with a 10ns clock period and that fetching a page of, say, 128 instructions from flash and storing it in a cache takes 50ns.
Clock  Time     State
0      0 ns     Fetch instruction 0 - detect cache miss - fetch from flash
1      10 ns    Wait state
2      20 ns    Wait state
3      30 ns    Wait state
4      40 ns    Wait state
5      50 ns    Execute instruction 0. Fetch instruction 1 - hit from cache
6      60 ns    Execute instruction 1. Fetch instruction 2 - hit from cache
7      70 ns    Execute instruction 2. Fetch instruction 3 - hit from cache
8      80 ns    Execute instruction 3, a branch to instruction 500. Fetch instruction 500 - detect cache miss - fetch from flash
9      90 ns    Wait state
10     100 ns   Wait state
11     110 ns   Wait state
12     120 ns   Wait state
13     130 ns   Execute instruction 500, etc.
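The timeline above can be reproduced with a few lines of Python. This is only a sketch under the stated assumptions (10 ns clock, 128-instruction pages, a miss costing 5 clocks from fetch to execute, and the fetch of the next instruction overlapping the execute of the current one); the function name is my own.

```python
CLOCK_NS = 10
MISS_PENALTY = 5   # clocks from miss detection to execute (50 ns page fill)
PAGE_SIZE = 128

def execute_times(trace):
    """Return the time (ns) at which each instruction in `trace` executes."""
    t, cached_page, times = 0, None, []
    for addr in trace:
        page = addr // PAGE_SIZE
        if page != cached_page:   # miss: stall while the page is loaded
            t += MISS_PENALTY
            cached_page = page
        else:                     # hit: execute on the very next clock
            t += 1
        times.append(t * CLOCK_NS)
    return times

# Instructions 0..3, then the branch target 500, as in the table:
print(execute_times([0, 1, 2, 3, 500]))  # -> [50, 60, 70, 80, 130]
```

The two misses (instruction 0 and instruction 500) each cost four wait states, matching the table; everything in between runs at the full 100 MHz clock.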
So the processor always runs at a high clock speed, but stalls for a number of cycles while waiting for the cache to be filled. As the Wikipedia article states, you can improve performance by trying to predict when you'll need to load data into the cache, and do it in advance.
A simple example - if you're executing a long sequence of instructions, from 0 to 127, the processor might anticipate that you'll fetch 128 and 129 next, so it might start loading the next page in advance. This is possible if your cache consists of multiple independent sections.
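As a hedged illustration of that prefetch idea, here is a toy Python model (names and the two-slot organization are my own assumptions, not a real implementation): while the processor runs from one cached page, the sequentially next page is fetched into a second slot, so straight-line code stalls only once.

```python
PAGE_SIZE = 128

class PrefetchingCache:
    """Toy two-slot cache that prefetches the next sequential page."""
    def __init__(self, flash):
        self.flash = flash
        self.slots = {}     # page number -> cached data (two entries max)
        self.stalls = 0     # demand misses the processor would wait on

    def _load(self, page):
        """Load a page into a slot; return True if it wasn't cached."""
        if page not in self.slots:
            base = page * PAGE_SIZE
            if len(self.slots) >= 2:          # evict lowest-numbered page
                self.slots.pop(min(self.slots))
            self.slots[page] = self.flash[base:base + PAGE_SIZE]
            return True
        return False

    def read(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        if self._load(page):                  # demand miss: processor stalls
            self.stalls += 1
        self._load(page + 1)                  # prefetch the next page
        return self.slots[page][offset]

flash = list(range(1024))
cache = PrefetchingCache(flash)
for addr in range(256):       # straight-line run through two pages
    cache.read(addr)
print(cache.stalls)           # -> 1: only the very first access stalls
```

In a software model the prefetch is instantaneous; in hardware it would run in the background, hiding the flash latency whenever the code keeps running forward. Backward branches would still miss with this simple eviction policy.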
By the way, the XC3S1200E has 504 Kbits of BRAM, which is enough for a modest cache (even the 80486 had at most 16 KByte (128 Kbit) of on-chip cache).
To run Linux, you will need to use some external memory. I don't think it'd be possible to get away with less than 8 MB of RAM unless perhaps you used ancient versions of the kernel and userland.