
Welcome to EDAboard.com

EDAboard.com is an international Electronics Discussion Forum focused on EDA software, circuits, schematics, books, theory, papers, asic, pld, 8051, DSP, Network, RF, Analog Design, PCB, Service Manuals... and a whole lot more!

An ARM7 (ARMv4) IP Core. Does anyone need it?

Status
Not open for further replies.

mathswork


Hi all,
I have developed an ARMv4 IP core. It does not have a standard ARM bus interface, but it works well. I would like help from anyone who can wrap it as a standard IP core or provide test cases for it. The core is very small: a single .v file of fewer than 2000 lines. Anyone can use it; I just need your suggestions to develop it further.

I wanted to put it on opencores.org, but it was rejected.

This IP core is an ARM clone implementing the ARMv4 architecture. Its main features are:

--No support for coprocessor instructions.

--No support for the Thumb instruction set.

--All interrupts are supported.

--The following instructions are supported (everything except the coprocessor instructions):
ldr; ldrb; str; strb; ldrh; strh; ldrsb; ldrsh; swp; swpb; ldm; stm; b; bx; dp (data processing); mult; multl (long multiply); swi; mrs; msr

--Little-endian format.

--Very compact: the ASIC area is less than 30,000 gates (2-input NAND equivalents).

--The critical path runs through a 32x32-bit multiplier followed by a 64-bit adder, both used by the multiply-accumulate long instructions.

--The whole IP core is a single .v file of fewer than 2000 lines.

--The core runs when the input port "cpu_en" is high and is frozen when it is low, which helps reduce power. Moreover, if a RAM read needs more cycles, the core can be frozen until the data from RAM are ready; if there is only one bus, it can also be frozen until the data or instructions are ready.

--A three-stage pipeline is used: fetch, decode, execute. A RAM read needs one extra cycle, because after the address is sent to RAM the data only arrive in the next cycle. If the loaded data are needed as an operand at that point, the next instruction has to be abandoned and one spare cycle occurs.
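The stall rule in the last item can be illustrated with a generic load-use check. This is a hypothetical sketch, not code from the core: the module `load_use_hazard` and all of its port names are made up, and only the Rn and Rm operand fields are compared (a full ARM decoder would also have to check Rs for register-specified shifts and Rd for stores).

```verilog
// Hypothetical load-use hazard check for a fetch/decode/execute pipeline.
// Port names are illustrative only; they are not taken from the core.
module load_use_hazard (
    input  wire       ex_is_load,  // execute stage holds a load (LDR, LDRB, LDRH, ...)
    input  wire [3:0] ex_rd,       // destination register of that load
    input  wire       id_uses_rn,  // decoded instruction reads Rn
    input  wire [3:0] id_rn,
    input  wire       id_uses_rm,  // decoded instruction reads Rm
    input  wire [3:0] id_rm,
    output wire       hazard       // 1 = load data not ready yet: abandon the decoded
                                   //     instruction and spend one spare cycle
);
    // The loaded word only arrives one cycle after the address is sent to RAM,
    // so a dependent instruction directly behind the load must wait.
    assign hazard = ex_is_load &&
                    ((id_uses_rn && (id_rn == ex_rd)) ||
                     (id_uses_rm && (id_rm == ex_rd)));
endmodule
```

An instruction that does not read the loaded register can proceed without a spare cycle, which is why the check compares register numbers instead of stalling after every load.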
 


thanks for posting.
 


Very interesting. mathswork, do you have any testbenches or test ROM files to go with it? What compiler do you use to compile C code for it?
 


Hi, kel8157,
Of course I have many testbenches for it, each aimed at a single instruction; that was the only way to check it before. But now I have a "Hex" file produced by "Keil for ARM". I know little about embedded programming, and I have to learn C programming to make it work.

Recently I had a great success: I compiled the "Blinky" example from "Keil for ARM" and simulated it in ModelSim, and it works perfectly. The next step is to download it to an FPGA.

If you have "Keil for ARM", you can easily find this example in "D:\Keil\ARM\RV30\Examples\Blinky". Before compiling, set "PLL_SETUP EQU 0" and "MAM_SETUP EQU 0" in lines 117 and 37 of "startup.s".

I have a version for you to simulate, if you are interested.

The internal registers, which I named "reg_r0" to "reg_rf", can be dragged into the wave window. You can also find "reg_re_usr", "cpsr_i", and "cpsr_m".
 


Hi mathswork,

The diagram file in JPG format that you supplied seems to be truncated. Or maybe only I am seeing this problem?
 


OK, I have fixed it; please re-download it. There is some text explaining it on my blog: h**p://free-arm.blog.163.com
 



Hi all,

I couldn't find it because I don't understand Chinese.
Could somebody upload the code here?

Thanks in advance.
 



I read through the code; it's very interesting, and thank you for the effort. I believe that by pulling cpu_en low for a few cycles I could implement that 32x32 multiplier as a serial multiplier, which would boost the operating frequency. Am I right? I am fitting it into an FPGA.

Which version of Keil are you using? I tried to download the evaluation software from Keil/ARM, but it seems very large and it cannot generate assembly listings. Are you able to find the armcpp, assembler and linker scripts in the Keil software you are using? I want to see whether I can modify them to work with the free GNU-ARM toolchain.


As for AdvaRes: I believe the picture is all right now. I can see the full picture and uploaded it to ImageShack for you; I hope mathswork won't mind.
**broken link removed**
**broken link removed**:D
 


Hi,
I am glad to hear that you are paying attention to my ARM IP core. As far as I know, you may be only the second person to simulate it.

As for the 32x32 multiplier, it is the key component of my core; almost every ARM instruction needs it. It is used not only to compute Rm*Rs+Rn for the MUL/MLA instructions, but also to compute the shifted operand (Rm shifted by Rs, then combined with Rn) for most instructions that use a shifted register, because there is no barrel shifter in the core. When I want to shift Rm left, e.g. LSL #5, I assign Rs = 2^5 = 32'b10_0000, and the lower 32 bits of Rm*Rs are the result. For a logical shift right, e.g. LSR #5, I assign Rs = 2^27 (27 = ~5+1 = 32-5 in 5-bit arithmetic), and the upper 32 bits of Rm*Rs are the result.
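The shift-by-multiplication trick can be sketched as a small stand-alone module. This is a hypothetical illustration (`shift_by_mult` and its ports are made up, not taken from the core): for LSL #n the multiplier operand is 2^n and the low word of the 64-bit product is the result; for LSR #n it is 2^(32-n) and the high word is the result. Only LSL and LSR are covered; ASR, ROR and ARM's special shift-by-0/32 encodings are left out.

```verilog
// Hypothetical module: LSL/LSR computed with a 32x32 multiplier instead of a
// barrel shifter. Valid shift amounts: 0..31 for LSL, 1..31 for LSR
// (LSR by 0 gives 0 here, because k wraps to 0 and the high word of Rm*1 is 0).
module shift_by_mult (
    input  wire [31:0] rm,      // value to shift
    input  wire [4:0]  amount,  // shift amount n
    input  wire        right,   // 0 = LSL #n, 1 = LSR #n
    output wire [31:0] result
);
    // Exponent of the power-of-two multiplier operand:
    // n for LSL, 32 - n for LSR (~n + 1 wraps to 32 - n in 5 bits).
    wire [4:0]  k    = right ? (~amount + 5'd1) : amount;
    wire [31:0] rs   = 32'd1 << k;   // Rs = 2^k
    wire [63:0] prod = rm * rs;      // evaluated at 64 bits because of the LHS width
    // LSL: low word of Rm*2^n.  LSR: high word of Rm*2^(32-n).
    assign result = right ? prod[63:32] : prod[31:0];
endmodule
```

The design choice is that the shifter reuses hardware the core already needs for MUL, at the cost of putting the multiplier on the path of ordinary data-processing instructions.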

So this long critical path is not easy to remove. In the next version, I will replace it with a 32x8 multiplier, so a MUL or UMLAL instruction will take four cycles to execute. Some muxes are needed to implement shift operations with this 32x8 multiplier.
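The four-cycle idea can be sketched as an iterative multiplier. This is a hypothetical stand-alone module (`mul32x8_iter`, with a made-up `start`/`done` handshake), not the author's next version: each cycle it multiplies Rm by one byte of Rs and adds that partial product, shifted by 8*step bits, into a 64-bit accumulator, so only a 32x8 multiplier and a 64-bit adder sit between registers. In the real core the pipeline would presumably have to be held while `busy` is high.

```verilog
// Hypothetical 32x32 -> 64-bit unsigned multiply using one 32x8 multiplier
// over four cycles (one byte of rs per cycle).
module mul32x8_iter (
    input  wire        clk,
    input  wire        rst,
    input  wire        start,    // pulse to begin a multiply
    input  wire [31:0] rm,
    input  wire [31:0] rs,
    output reg  [63:0] product,  // valid while done is high
    output reg         done
);
    reg [31:0] a;      // latched multiplicand
    reg [31:0] b;      // multiplier, shifted right 8 bits per cycle
    reg [1:0]  step;   // which byte of rs is being processed (0..3)
    reg        busy;
    wire [39:0] partial = a * b[7:0];  // the single 32x8 multiplier

    always @(posedge clk) begin
        if (rst) begin
            busy <= 1'b0; done <= 1'b0; product <= 64'd0; step <= 2'd0;
        end else if (start && !busy) begin
            a <= rm; b <= rs; product <= 64'd0;
            step <= 2'd0; busy <= 1'b1; done <= 1'b0;
        end else if (busy) begin
            // accumulate a * byte[step] * 2^(8*step)
            product <= product + ({24'd0, partial} << (8 * step));
            b <= b >> 8;
            step <= step + 2'd1;
            if (step == 2'd3) begin
                busy <= 1'b0; done <= 1'b1;
            end
        end else
            done <= 1'b0;
    end
endmodule
```

For UMLAL, the accumulator could be preloaded with {RdHi, RdLo} instead of zero, since UMLAL adds Rm*Rs to that 64-bit value.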

the keil I use is downloaded from :**broken link removed**. It is only 60 MB. If you get you CID through "File"->"License Management"-> CID, input it to "keygen_edge.exe". This version is easy to use.

I am interested in your FPGA fitting.
 

Cool, thank you for the effort, mathswork. So far I see two potential improvements. The first is the SRAM: I observe the core needs a dual-port RAM. Is it difficult to recode the core to use a single-port SRAM? The second is the multiplier issue: it limits the critical path to 40 ns on the Spartan-3E.
 

Hi, kel8157,
It is very easy to connect this core to a single-port RAM. I am familiar with dual-port RAM, so this version was made for that, but I can modify a few lines. It is easy.
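For the single-port question, here is a minimal sketch of a word-wide synchronous single-port RAM that reads and writes through one address port, using the same polarity as the testbench posted later in this thread (access when `ram_en` is high, write when `ram_wr_en` is low). The module name `sp_ram` and its parameters are hypothetical; it only handles aligned word accesses, so byte and halfword stores (STRB/STRH) would need byte-lane write enables, which the interface shown in that testbench does not include.

```verilog
// Hypothetical word-wide single-port RAM: one address port shared by reads
// and writes, one-cycle registered read (like the testbench's RAM model).
module sp_ram #(parameter ADDR_W = 9) (   // 2^ADDR_W words
    input  wire        clk,
    input  wire        en,     // access enable, active high (as ram_en is used)
    input  wire        wr_n,   // 0 = write, 1 = read (as ram_wr_en is used)
    input  wire [31:0] addr,   // byte address from the core; low 2 bits ignored
    input  wire [31:0] wdata,
    output reg  [31:0] rdata
);
    reg [31:0] mem [0:(1<<ADDR_W)-1];
    wire [ADDR_W-1:0] word = addr[ADDR_W+1:2];  // byte address -> word index

    always @(posedge clk) begin
        if (en) begin
            if (!wr_n) mem[word] <= wdata;      // write cycle
            else       rdata     <= mem[word];  // read cycle
        end
    end
endmodule
```

Because reads and writes share one port, the core can never read and write RAM in the same cycle, which is the main difference from the dual-port version.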

The critical path is a big problem for me. I think that is why ARM calls its core a low-power core: they must have had no way to increase its frequency. That is why ARM9 has a five-stage pipeline, two stages more than ARM7. Following that idea, if 40 ns is divided by 3, my future "arm9" would have a critical path of 40/3 ≈ 13 ns.

I have another version based on the former one. The former uses a 32x32 multiplier; I replaced it with an 8x8 multiplier, which means a MUL/SMLAL instruction needs more cycles to execute than before. For ASIC, this version is very successful: in SMIC 0.18 µm, the former has a critical path of more than 15 ns, but this version runs at 6~7 ns. For FPGA, however, it fails to reduce the critical path significantly. Because the FPGA implements multipliers with dedicated hardware blocks, an 8x8 multiplier plus some muxes is not much faster than a 32x32 multiplier.
 


Yeah, true, FPGAs are slow. I compiled it, and it can run at 20 MHz on a Spartan-3E 1200.
And I think I have to manually convert my own compiled HEX file into the BIN format used in the testbench (my first HEX now works! :)). Do you use scripts or programs to do this conversion? I know it's trivial to do manually, but it's slow.

I want to add some peripherals (AMBA, external ROM and RAM, UART, DMA, PC communication, etc.) to learn some skills on Digilent's Nexys2 board. :D
 

How much effort can you put into the ARM core design? To implement the core with standard components such as ROM and SRAM, the testbench needs to be modified. If I change the interface for ROM/RAM access, do you think you could modify the core to fit? :D



Code:
`define DEL 1
`timescale 1 ns/1 ns
module tb_test;

reg            clk;
reg            rst;
reg            cpu_en;
reg            cpu_restart;
reg            irq;
reg            fiq;

wire		   rom_en;
wire    [31:0] rom_addr;
reg     [31:0] rom_data;
reg            rom_abort;

wire           ram_en;
wire           ram_wr_en;
wire    [31:0] ram_addr;
wire    [31:0] ram_wr_data;
reg     [31:0] ram_rd_data;
reg            ram_rd_abort;

reg [127:0] rom_tmp [2047:0];
reg [7:0] rom_all [32767:0];

arm u_arm (
            .clk           (    clk          ),          //System clock
			.rst           (    rst          ),          //System reset pins, high valid
			.cpu_en        (    cpu_en       ),          //Cpu enable signal, high valid, low level suspends cpu.
			.cpu_restart   (    cpu_restart  ),          //To restart cpu, high valid.
			.irq           (    irq          ),          //IRQ interrupt enable signal, high valid
			.fiq           (    fiq          ),          //FIQ interrupt enable signal, high valid

			.rom_en        (    rom_en       ),          //Instruction ROM read enable signal
			.rom_addr      (    rom_addr     ),          //Instruction ROM's 32-bit address
			.rom_data      (    rom_data     ),          //Instruction stored in ROM
			.rom_abort     (    rom_abort    ),          //This instruction is invalid if this signal keeps high.

			.ram_en        (    ram_en       ),          //Ram access enable signal (treated as active-high below)
			.ram_wr_en     (    ram_wr_en    ),          //Ram write enable signal, low=write, high=read
			.ram_addr      (    ram_addr     ),          //Ram read/write address
			.ram_wr_data   (    ram_wr_data  ),          //Ram write data signals
			.ram_rd_data   (    ram_rd_data  ),          //Ram read data signals
			.ram_rd_abort  (    ram_rd_abort )           //Data on "ram_rd_data" is invalid if it keeps high
             );

initial begin
clk = 1'b0;
cpu_en = 1'b0;
cpu_restart = 1'b0;
rom_abort = 1'b0;
irq = 1'b0;
fiq = 1'b0;
rst = 1'b0;
#10 rst = 1'b1;
#20 rst = 1'b0;
cpu_en = 1'b1;
cpu_restart = 1'b1;
#10 cpu_restart = 1'b0;

end

always clk = #5 ~clk;

// ROM section, using an enable signal, which makes it easier to work with flash or PROM.
// Instruction fetches are served here while rom_en is high.
always @ (posedge clk) begin
	if (rom_en)
	    rom_data <= #`DEL { rom_all[rom_addr+2'd3],rom_all[rom_addr+2'd2],rom_all[rom_addr+2'd1],rom_all[rom_addr]};
end


// RAM section, using standard single port RAM.
reg [7:0] ram_data [2047:0];
integer i;
initial begin
ram_rd_abort = 1'b0;
for ( i=0; i<2048;i=i+1 )
    ram_data[i] = 8'h0;
end

always @ ( posedge clk ) begin
	if ( ram_en & (ram_wr_en == 1'b1)) begin
	    if ( ram_addr[31:28]==4'h4 )
	    	ram_rd_data <= #`DEL { ram_data[ram_addr[10:0]+3],ram_data[ram_addr[10:0]+2],ram_data[ram_addr[10:0]+1],ram_data[ram_addr[10:0]]};
		else if ( ram_addr[31:28]==4'h0 )
			// Data reads from the ROM region (ram_addr[31:28]==4'h0) are served directly
			// from rom_all with the same one-cycle latency. On real hardware this needs
			// a second ROM port or arbitration with instruction fetch.
			ram_rd_data <= #`DEL { rom_all[ram_addr+2'd3],rom_all[ram_addr+2'd2],rom_all[ram_addr+2'd1],rom_all[ram_addr]};
	end

	if ( ram_en & (ram_wr_en == 1'b0) & ( ram_addr[31:28]==4'h4 ) ) begin
	    ram_data[ram_addr[10:0]+3] <= #`DEL ram_wr_data[31:24];
	    ram_data[ram_addr[10:0]+2] <= #`DEL ram_wr_data[23:16];
	    ram_data[ram_addr[10:0]+1] <= #`DEL ram_wr_data[15:8];
	    ram_data[ram_addr[10:0]]   <= #`DEL ram_wr_data[7:0];
	end
end


/**************************************************************/




parameter  memLoadFile = "./data_test/keil_03.bin";

integer n, j;
reg [127:0] tmp;
initial begin
if (memLoadFile != "") begin
	$readmemh(memLoadFile, rom_tmp);	// To use this, copy the HEX section and fill vacant bytes in last row with xx
	for (n=0; n<2048;n=n+1) begin
		tmp = rom_tmp[n];
    	rom_all[n*16+15] = tmp[07:00];
    	rom_all[n*16+14] = tmp[15:08];
    	rom_all[n*16+13] = tmp[23:16];
    	rom_all[n*16+12] = tmp[31:24];
    	rom_all[n*16+11] = tmp[39:32];
    	rom_all[n*16+10] = tmp[47:40];
    	rom_all[n*16+9 ] = tmp[55:48];
    	rom_all[n*16+8 ] = tmp[63:56];
    	rom_all[n*16+7 ] = tmp[71:64];
    	rom_all[n*16+6 ] = tmp[79:72];
    	rom_all[n*16+5 ] = tmp[87:80];
    	rom_all[n*16+4 ] = tmp[95:88];
    	rom_all[n*16+3 ] = tmp[103:96];
    	rom_all[n*16+2 ] = tmp[111:104];
    	rom_all[n*16+1 ] = tmp[119:112];
    	rom_all[n*16+0 ] = tmp[127:120];
    	$display("IN  %h", tmp);
    	$display("OUT %h%h%h%h%h%h%h%h%h%h%h%h%h%h%h%h",
    	rom_all[n*16+0 ],
    	rom_all[n*16+1 ],
    	rom_all[n*16+2 ],
    	rom_all[n*16+3 ],
    	rom_all[n*16+4 ],
    	rom_all[n*16+5 ],
    	rom_all[n*16+6 ],
    	rom_all[n*16+7 ],
    	rom_all[n*16+8 ],
    	rom_all[n*16+9 ],
    	rom_all[n*16+10],
    	rom_all[n*16+11],
    	rom_all[n*16+12],
    	rom_all[n*16+13],
    	rom_all[n*16+14],
    	rom_all[n*16+15]);
	end
//    $readmemh(memLoadFile, rom_all);
end
end



endmodule
 

kel8157,
Hi, I can do that.
Attached are some versions I made recently.

If you have any problems, please mail me.
 


Cool, that's very good work. arm_sp.v and little_arm_sp.v fit my needs nicely. I will try them out when I am back from work.
 

Hi, kel8157,

I have downloaded it into my FPGA board: Digilent's Spartan-3E Starter Kit.

I use the UART to download a bin file to the ROM, and I could see it work immediately. That means the FPGA board becomes an ARM development board.

The new core has improved further: the critical path is only 26 ns (it was 40 ns before).

How about you? Need help?
 


Cool. I am in the middle of building a small system on Digilent's S-3E 1200, though progress has been slow for me.
 