
help req for 16bit fpgacpu, organization and arch problems


umairsiddiqui

syscon wishbone design

While trying to build a 16-bit CPU core, I'm facing a dead-on-arrival situation!
The CPU is required to drive the mini-UART (by Mr. Ovidiu Lupas, OpenCores.org) on a Spartan-3 starter kit
(to do something, e.g. printing "hello world" :-( ).
For this system, I was thinking that on power-up/reset the CPU would fetch the user data/code from
platform flash and load it into SRAM.

Alas! My comp-arch and digital design knowledge proved to be worse than I thought.
Please help me simplify the design further and remove any organizational/architectural flaws, so that I end up with something simple which
could be called a "general purpose CPU": able to do some data processing and control, and to drive a system like
System6801 (by Mr. John Kent).

After wasting a lot of time, I came up with:
-16 bit datapath
-64k flat addressable memory (for both code and data) without paging and segmentation
-using word(16bit) addresses, sequential memory words differ by 1 rather than 2.
-16 interrupts
-separate I/O address space (256 locs)
-4k stack
-user visible registers:
*Accumulator (A) - 16bit
*Base/Index (B) - 16bit
*loop Counter(C) - 16bit
*Stack Pointer(SP) - 12bit
*Stack Frame Pointer(FP) - 12bit
-overall memory map:
*first 16 locs for interrupts - int 0 for reset
*last 4k for stack (first 4bits of SP and FP are '1111')
*user code/data and memory mapped I/O in between
-addressing modes:
*register (example: mov A,C)
*direct (example: mov A,[16bit absolute address])
*indirect (example: mov A,[B])
*index/base (example: mov A,[16bit displacement])
*immediate
*within stack (example, accessing args and local variables: mov A,[fp + 1]; there is no "segment overriding",
sp and fp are just for stack management)

-jumps, conditional jumps and call are all absolute and supporting all addressing modes (except stack!)
-no multiplication, division and barrel shifting
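
The fixed 4k stack window described in the list can be sketched in a few lines of Python (a behavioural model only, not HDL). The pre-decrement push convention, the reset value of SP and the helper names are assumptions made for illustration; the post only states that the top 4 address bits of SP and FP are '1111'.

```python
STACK_BASE = 0xF000  # top 4 address bits forced to '1111' (last 4k words)
SP_MASK = 0x0FFF     # SP and FP are 12 bits wide

def stack_address(sp12: int) -> int:
    """Form the 16-bit word address from a 12-bit SP or FP value."""
    return STACK_BASE | (sp12 & SP_MASK)

def push(memory: dict, sp12: int, value: int) -> int:
    """Pre-decrement push (assumed convention); returns the new 12-bit SP."""
    sp12 = (sp12 - 1) & SP_MASK  # wraps inside the 4k window, never into code
    memory[stack_address(sp12)] = value & 0xFFFF
    return sp12

mem = {}
sp = 0x000  # assumed reset value, so the first push lands on the last word
sp = push(mem, sp, 0x1234)
print(hex(stack_address(sp)))  # 0xffff
```

Because the upper bits are hard-wired, the stack can never collide with code, but it also can never grow beyond 4k words, which is the trade-off raised in the concerns below.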


Although the whole processor seems to be a problem, my main concerns are:
-a flat addressable 64k*16 RAM
-"using word(16bit) addresses, sequential memory words differ by 1 rather than 2"
-fixed stack (I'm so nervous: I could not find datasheets, arch and asm docs for the 6502, 6800, 6809 and 68k)
-stack size and the size of SP and FP; I was trying to avoid stack and code overlap
-index/base addressing mode is necessary for array/LUT processing (but complicates the design); any alternative?


I like the KISS methodology, but my experience is limited to the 8088 and 8051...

Reducing the datapath to 8 bits may beautify the design... wide datapaths require elegant solutions, which I can't afford!
 

Re: help req for 16bit fpgacpu, organization and arch proble

That seems a cool architecture, but I couldn't find the question, so I suspect you want us to pray for you... :)

Look, these are my three cents:
- I'd simply use a flat addressable 64K x 8-bit memory space
- I'd hate any CPU without base/index addressing capabilities
- Maybe some day you'll need a bigger stack, so let it overlap the code

good luck
 

I need comments on the *modified* instruction set given below (borrowed from many places, including the Hennessy & Patterson comp arch book, i.e. after a lot of learning! :sad: ), aimed at datapath *simplification* and fewer clock cycles:

-number of registers, r0 utilization

-size of addressable memory

-dealing with unaligned bytes?

-addressing modes, especially the implementation of register-indirect addressing and
absolute addressing

-recommended "displacement" (no. of bits) for data (base-displacement loading) and code (jumps)? Should both be equal (keeping *simplification* in mind)?

-"good algorithm(s)" for assigning opcodes

-any (additional) instructions for bus and CPU control (DMA not required)

-any other "highly recommended" instruction/feature...

Thank you...

///////////////////////////////////////////////////////////////////////////////////////

1) Total 64k addressable memory: 2^16 (0 to 2^16 - 1) linear and byte-addressable
memory locations
2) 16 general purpose registers (r0, r1 ... r15) and separate sp

3) Memory mapped I/O
4) 16-bit PC
5) FLAGS[00000000000,CF,OF,SF,ZF,IF]
6) 32 vectored interrupts (0: arithmetic overflow, 1: invalid instruction, 2: alignment error, rest masked)
7) support 8-bit operands for loading/storing
8) 16-bit operands aligned on even boundaries
9) Addressing modes: register addressing, base displacement addressing (could be implemented by r0 as offset in base displacement addressing),
immediate addressing, register-indirect addressing and absolute addressing
10) All instructions should be either 16 or 32 bits (16-bit preferred)
11) Caching, Pipelining, Paging and DMA not required
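
As a rough illustration of points 5, 6 and 8, here is a small Python model of a 16-bit add that produces the flags, plus the even-boundary check for word accesses. The bit positions assume the FLAGS field list is written MSB-first (so IF is bit 0); that ordering, and the function names, are assumptions rather than anything fixed by the spec above.

```python
# Assumed bit positions: FLAGS[...,CF,OF,SF,ZF,IF] read MSB-first, IF = bit 0.
IF, ZF, SF, OF, CF = (1 << n for n in range(5))

ALIGNMENT_VECTOR = 2  # interrupt 2 is the alignment error in the list above

def add16(a: int, b: int, carry_in: int = 0):
    """Return (result, flags) for a 16-bit Add (or Adc when carry_in is 1)."""
    a, b = a & 0xFFFF, b & 0xFFFF
    total = a + b + carry_in
    result = total & 0xFFFF
    flags = 0
    if total > 0xFFFF:
        flags |= CF                      # unsigned carry out of bit 15
    if result == 0:
        flags |= ZF
    if result & 0x8000:
        flags |= SF
    if (~(a ^ b) & (a ^ result)) & 0x8000:
        flags |= OF                      # same-sign operands, different-sign result
    return result, flags

def word_access_ok(address: int) -> bool:
    """16-bit operands must sit on an even byte address."""
    return address & 1 == 0

print(add16(0x7FFF, 1))      # (32768, 12): SF and OF set
print(word_access_ok(0x1001))  # False, would raise interrupt ALIGNMENT_VECTOR
```

A model like this is also handy later as a reference when checking the ALU in simulation.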


Instruction

Mov reg16,reg16

Lw / Mov reg16,[reg16]

Lw / Mov reg16,[reg16+r0]

Sw / Mov [reg16],reg16

Sw / Mov [reg + r0], reg16


Lwi / Mov reg16,imm16 -- imm16 is only loaded in reg16

Lbu / Movzx reg16,[reg16]

Lbu / Movzx reg16, [reg16 + r0]

Sbu / Movzx [reg16],reg16

Sbu / Movzx [reg16+r0],reg16

Lb / Movsx reg16,[reg16]

Lb / Movsx reg16, [reg16 + r0]

Sb / Movsx [reg16],reg16

Sb / Movsx [reg16+r0],reg16



Push reg16

Pop reg16

Pushf

Popf

Add reg16,reg16

Add reg16,imm16

Sub reg16,reg16

Sub reg16,imm16


Adc reg16,reg16

Adc reg16,imm16


Sbb reg16,reg16

Sbb reg16,imm16


Inc reg16

Dec reg16


Cmp reg16,reg16

Cmp reg16,imm16


Test reg16,reg16

Test reg16,imm16


And reg16,reg16

And reg16,imm16


Or reg16,reg16

Or reg16,imm16


Xor reg16,reg16

Xor reg16,imm16


Not reg16


Shl/Sal reg16,imm4

Shr reg16,imm4

Sar reg16,imm4

Rol reg16,imm4

Ror reg16,imm4

Rcl reg16,imm4

Rcr reg16,imm4



Call [reg16]

Call [reg16 + r0]



Ret



Int imm5

Into - overflow int will be called if OF=1



Iret



Jmp [reg16]



Je/Jz disp -- ZF=1

Jl/Jnge disp -- SF<>OF

Jle/Jng disp -- ZF=1 or SF<>OF

Jb/Jnae disp -- CF=1

Jbe/Jna disp -- CF=1 or ZF=1

Jo disp -- OF=1

Js disp -- SF=1

Jne/Jnz disp -- ZF=0

Jnl/Jge disp -- SF=OF

Jnle/Jg disp -- ZF=0 and SF=OF

Jnb/Jae disp -- CF=0

Jnbe/Ja disp -- CF=0 and ZF=0

Jno disp -- OF=0

Jns disp -- SF=0
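
The jump conditions listed above can be written down as a small truth function, which may help when designing (or testing) the branch logic in the control unit. This is only a Python sketch; the function name and the lower-case mnemonic keys are made up for the example, and the two-flag conditions are read as "or" for Jle/Jbe and "and" for Jg/Ja, as on the 8086.

```python
def condition_table(cf: bool, of: bool, sf: bool, zf: bool) -> dict:
    """Map each conditional jump to whether it is taken for the given flags."""
    return {
        "je": zf,                        "jne": not zf,
        "jl": sf != of,                  "jge": sf == of,
        "jle": zf or sf != of,           "jg": (not zf) and sf == of,
        "jb": cf,                        "jae": not cf,
        "jbe": cf or zf,                 "ja": (not cf) and (not zf),
        "jo": of,                        "jno": not of,
        "js": sf,                        "jns": not sf,
    }

# Flags after "Cmp" of 3 with 5 (3 - 5 = 0xFFFE): borrow, negative, no overflow.
taken = condition_table(cf=True, of=False, sf=True, zf=False)
print(sorted(name for name, t in taken.items() if t))
```

Note that every condition in the second half of the list is the exact complement of one in the first half, so a single opcode bit can select "invert the condition".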



Clc

Cmc

Stc


Sti
Cli


Nop

Hlt

////////////////////////////////////////////////////////////////////////////////////
 

Re: help req for 16bit fpgacpu, organization and arch proble

Push reg16

Hi,
The instruction set looks pretty good.
But I don't understand the need for the following push and pop operations.

Push [reg16]
Push (addr)
Push [reg16+disp]

Pop [reg16]
Pop (addr)
Pop [reg16+disp]

Instead of these I think

push reg16
pop reg16
push all_reg16
pop all_reg16
pushf
popf

are sufficient!
 

Re: help req for 16bit fpgacpu, organization and arch proble

Push and pop commands could be avoided and the entire architecture could be much enhanced just by adding the pre-decrement and post-increment addressing modes.
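
To make the pre-decrement/post-increment idea concrete, here is a minimal Python model in which push and pop are nothing more than a store with pre-decrement and a load with post-increment on an ordinary register. The method names, the 2-byte step and the use of r15 as stack pointer are assumptions chosen for the example.

```python
WORD_BYTES = 2  # assumed: byte-addressable memory, 16-bit words

class Machine:
    def __init__(self):
        self.reg = [0] * 16
        self.mem = {}  # byte address -> 16-bit word (sparse model)

    def store_predec(self, base: int, src: int) -> None:
        """Store with pre-decrement: decrement the base register, then store."""
        self.reg[base] = (self.reg[base] - WORD_BYTES) & 0xFFFF
        self.mem[self.reg[base]] = self.reg[src]

    def load_postinc(self, dst: int, base: int) -> None:
        """Load with post-increment: load, then increment the base register."""
        value = self.mem[self.reg[base]]
        self.reg[base] = (self.reg[base] + WORD_BYTES) & 0xFFFF
        self.reg[dst] = value

SP = 15  # hypothetical choice: any general register can act as a stack pointer
m = Machine()
m.reg[1], m.reg[2] = 0x1111, 0x2222
m.store_predec(SP, 1)  # behaves like "push r1"
m.store_predec(SP, 2)  # behaves like "push r2"
m.load_postinc(3, SP)  # behaves like "pop r3"
m.load_postinc(4, SP)  # behaves like "pop r4"
print(hex(m.reg[3]), hex(m.reg[4]))  # 0x2222 0x1111
```

The same two modes also give cheap array walking, since any register can be stepped through a buffer without a separate Inc.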
Also specialized register functions like counters, base address registers and so on, don't simplify your task. I usually prefer RISC architecture as it simplifies hardware and software concepts.
Have you ever seen the programmers model and the instruction set of the ARM processor? Take a look at it. It has some neat concepts like conditional execution within an opcode, skip commands, and so on.
Rgds /yego
 

First of all, thanks for taking an interest… it's really boosting my morale!

The goal of my project was to design a CPU soft IP and simulate it. Later on, my supervisor added the constraints that it should be (at least) 16-bit and that it should also be placed and routed on an FPGA and use the starter kit resources. For the application he also requires some "interactivity", so that it is apparent that there is a CPU inside. He is not satisfied with the idea of displaying "hello world" on HyperTerminal. The idea came from Mr. John Kent's System09 (available on OpenCores): enter instructions interactively, then press 'g' to execute them…

All this stuff is new and overwhelming… I need to complete it by the end of April 2005.

Kindly suggest a lean and mean app.

Keeping this story in mind: embedded RISCs do use 32-bit flat addressing, which is nice, but additional logic for the application is required (I have not yet studied Nios II or MicroBlaze, but they are definitely big).

If you simply scale a RISC architecture down to 16 bits (with "aspect ratio" 100%), you end up with a 64K address space and fewer registers.

Yes, it depends on the application, but in general RISC instructions are simple yet do not provide good code density (consider 64K shared with memory-mapped I/O), and variable-size instructions add complexity. Two-operand instructions rather than three-operand ones reduce code size somewhat. Implementing an operation as multiple instructions increases code size. Thumb and MIPS16 are not complete architectures but special modes… Is there a 16-bit RISC architecture or core with the operations stated in my previous EDAboard posts? (Additional addressing modes are not required.)

Therefore a simple and compact instruction encoding is required, optionally with some technique to increase the address space (for example code banking).

Kindly suggest a technique for rapid prototyping. Specifications are written with much labor and then thrown away without testing (for example, several instruction encodings are coming to mind, but I'm just speculating); the bus and timing have not been specified yet.

Guide me about a testing strategy for such a design… (DST and BIST ;-) ). OK, at least enough to design a working IP core and system…

What would you write, in each of these reports (15-20 pages)?
1) Analysis & Design Report ("detail analysis of problem", "High level detail design", "research methodology", "implementation and testing strategy")
2) Initial Design and Implementation Document
3) Final Design Document
4) Final Project Report

There are 2 ARM books on EDAboard, and I am reading "ARM SoC Arch", but I could not find the official ARM instruction set and programming model
(downloading big books is also a problem on a slow connection).
 

Re: help req for 16bit fpgacpu, organization and arch proble

Download McGraw.Hill.VHDL.Programming.by.Example.4th.Ed.pdf from here, i.e. EDAboard. This book is all about designing a 16-bit CPU in VHDL, simulating it and implementing it on an Altera FPGA. I used this base 16-bit CPU to implement my own 32-bit CPU with VGA and keyboard.
 

For the processor bus architecture, I am using the OpenCores WISHBONE spec (B.3), WB classic bus cycles, to get
peripheral support from the open-source cores at OpenCores.

The datapath of the CPU is complete; the problem I'm facing is in the controller design, mainly due to the reset and interrupt mechanisms.

3.1.1 Reset Operation
All hardware interfaces are initialized to a pre-defined state. This is accomplished with the reset
signal [RST_O] that can be asserted at any time. It is also used for test simulation purposes by
initializing all self-starting state machines and counters which may be used in the design. The
reset signal [RST_O] is driven by the SYSCON module. It is connected to the [RST_I] signal on
all MASTER and SLAVE interfaces. Figure 3-1 shows the reset cycle.

RULE 3.00
All WISHBONE interfaces MUST initialize themselves at the rising [CLK_I] edge following the
assertion of [RST_I]. They MUST stay in the initialized state until the rising [CLK_I] edge that
follows the negation of [RST_I].
RULE 3.05
[RST_I] MUST be asserted for at least one complete clock cycle on all WISHBONE interfaces.
PERMISSION 3.00
[RST_I] MAY be asserted for more than one clock cycle, and MAY be asserted indefinitely.
RULE 3.10
All WISHBONE interfaces MUST be capable of reacting to [RST_I] at any time.
RULE 3.15
All self-starting state machines and counters in WISHBONE interfaces MUST initialize themselves
at the rising [CLK_I] edge following the assertion of [RST_I]. They MUST stay in the
initialized state until the rising [CLK_I] edge that follows the negation of [RST_I].

RECOMMENDATION 3.00
Design SYSCON modules so that they assert [RST_O] during a power-up condition. [RST_O]
should remain asserted until all voltage levels and clock frequencies in the system are stabilized.
When negating [RST_O], do so in a synchronous manner that conforms to this specification.


SUGGESTION 3.00
Some circuits require an asynchronous reset capability. If an IP core or other SoC component
requires an asynchronous reset, then define it as a non-WISHBONE signal. This prevents confusion
with the WISHBONE reset [RST_I] signal that uses a purely synchronous protocol, and
needs to be applied to the WISHBONE interface only.
OBSERVATION 3.20
All WISHBONE interfaces respond to the reset signal. However, the IP Core connected to a
WISHBONE interface does not necessarily need to respond to the reset signal.
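
For the controller design, the reset rules quoted above boil down to "sample RST_I only at the rising clock edge, and hold the initialized state until the first edge at which RST_I is seen low". A toy Python model of a master control FSM along those lines is sketched below. The state names, the `request` input and the class name are invented for the example, and negating CYC_O and STB_O during reset is an assumption based on the master's reset behaviour in the specification, not on the excerpt quoted here.

```python
class WbMaster:
    """Toy WISHBONE master control FSM with a purely synchronous reset."""

    def __init__(self):
        # Arbitrary power-up values, to show that reset overrides them.
        self.state, self.cyc_o, self.stb_o = "BUSY", 1, 1

    def rising_edge(self, rst_i, request=False, ack_i=False):
        """Advance one CLK_I rising edge; inputs are sampled at the edge."""
        if rst_i:
            self.state, self.cyc_o, self.stb_o = "RESET", 0, 0
        elif self.state in ("RESET", "IDLE"):
            if request:
                self.state, self.cyc_o, self.stb_o = "BUSY", 1, 1
            else:
                self.state = "IDLE"
        elif ack_i:  # state is BUSY and the slave has acknowledged
            self.state, self.cyc_o, self.stb_o = "IDLE", 0, 0
        return self.state

master = WbMaster()
trace = [
    master.rising_edge(rst_i=1, request=True),  # reset wins over the request
    master.rising_edge(rst_i=1, request=True),  # may be held for many clocks
    master.rising_edge(rst_i=0, request=True),  # first edge with RST_I low
    master.rising_edge(rst_i=0, ack_i=True),    # bus cycle terminates
]
print(trace)  # ['RESET', 'RESET', 'BUSY', 'IDLE']
```

In VHDL the same structure is a clocked process whose first branch tests the reset input, with no reset in the sensitivity list.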


....


3.1.5 Use of TAG TYPES
The WISHBONE interface can be modified with user defined signals. This is done with a technique
known as tagging. Tags are a well known concept in the microcomputer bus industry.
They allow user defined information to be associated with an address, a data word or a bus cycle.
All tag signals must conform to set of guidelines known as TAG TYPEs. Table 3-1 lists all of
the defined TAG TYPEs along with their associated data set and signal waveform. When a tag is
added to an interface it is assigned a TAG TYPE from the table. This explicitly defines how the
tag operates. This information must also be included in the WISHBONE DATASHEET.

Table 3-1. TAG TYPEs.
+------------------+----------+-----------------+
| MASTER                                        |
+------------------+----------+-----------------+
| Description      | TAG TYPE | Associated With |
+------------------+----------+-----------------+
| Address tag      | TGA_O()  | ADR_O()         |
+------------------+----------+-----------------+
| Data tag, input  | TGD_I()  | DAT_I()         |
+------------------+----------+-----------------+
| Data tag, output | TGD_O()  | DAT_O()         |
+------------------+----------+-----------------+
| Cycle tag        | TGC_O()  | Bus Cycle       |
+------------------+----------+-----------------+

RULE 3.70
All user defined tags MUST be assigned a TAG TYPE. Furthermore, they MUST adhere to the
timing specifications given in this document for the indicated TAG TYPE.
PERMISSION 3.45
While all TAG TYPES are specified as arrays (with parenthesis ‘()’), the actual tag MAY be a
non-arrayed signal.

....

Q: Should I use two reset pins: (1) an async reset and (2) RST_I (complying with WB B.3)?
And what would be the optimal requirement for the async reset (e.g. the reset pin should be held high for at least 50us (8086), etc.)?

Q: "All WISHBONE interfaces respond to the reset signal. However, the IP Core connected to a
WISHBONE interface does not necessarily need to respond to the reset signal." ???? (Surely at least the processor
in an SoC needs to respond to RST_I.)

Q: Should the interrupt pin be level sensitive or edge sensitive? (Note that the interrupt is not an NMI.)

Q: For a 25MHz design, should a synchronizer of 3 cascaded DFFs be sufficient???

Note: the interrupt cycle is similar to that of the Intel 8088, and the interrupt acknowledge is of type TGC_O().
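
On the synchronizer question, the cost side is easy to see with a small behavioural model: an N-stage chain of D flip-flops delays the input by N clock edges. The Python sketch below models only that latency. It cannot say anything about metastability or about whether 3 stages are enough at 25MHz, and the class name is made up for the example.

```python
class Synchronizer:
    """Chain of D flip-flops clocked together (latency model only)."""

    def __init__(self, stages: int = 3):
        self.ff = [0] * stages

    def rising_edge(self, async_in: int) -> int:
        # Every flip-flop samples its neighbour's old value at the same edge.
        self.ff = [async_in] + self.ff[:-1]
        return self.ff[-1]  # output of the last stage after this edge

sync = Synchronizer(stages=3)
outputs = [sync.rising_edge(1) for _ in range(4)]
print(outputs)  # [0, 0, 1, 1]
```

So a level-sensitive interrupt request has to stay asserted for at least the synchronizer depth before the control unit can even see it, which is one practical argument for keeping the request active until it is acknowledged.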
 

Sir, my simple microprocessor is nearly complete. For simulation in ModelSim, I'm
thinking of a test bench containing the processor and an SRAM (behavioural model) that is able to read text files.
The SRAM model loads the code in binary form, and after reset the processor executes the code...


(First of all, is this type of test bench `environment' enough?????)


Sir, my processor does not contain any pipelining, cache or complex memory management.

I mainly need to test: correct instruction functioning (control unit), read/write and interrupt cycles, and number-crunching
capabilities....

Please suggest some algorithms/test code (e.g. quick sort, block transfer)
which I can load into the SRAM model. Please note that the processor's addressable
memory is 64k and I'm using the free version of ModelSim.
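
One way to keep such a test bench honest is to build the same memory image outside the simulator and compare against it afterwards. The Python sketch below parses a text image into a 64k-word array. The file format (one 16-bit word per line written as binary digits, with '#' starting a comment) and the function name are assumptions, since the thread does not specify the format the SRAM model reads.

```python
def load_image(lines, size=0x10000):
    """Parse text lines into a word-indexed memory image.

    Returns (memory, word_count). Raises ValueError on a malformed word.
    """
    memory = [0] * size
    address = 0
    for raw in lines:
        text = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not text:
            continue
        word = int(text, 2)                  # ValueError if not binary digits
        if word > 0xFFFF:
            raise ValueError(f"word wider than 16 bits: {text}")
        memory[address] = word
        address += 1
    return memory, address

image = [
    "# two test words",
    "0101010101010101",
    "",
    "1111000011110000  # second word",
]
memory, count = load_image(image)
print(count, hex(memory[0]), hex(memory[1]))  # 2 0x5555 0xf0f0
```

With the image available as a plain list, expected results for a block transfer or a sort can be computed in a few lines and then checked against a memory dump from the simulation.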
 

Sir,
I'm confused about how to present the ModelSim simulation results in the poster session.
Other contents like the introduction, instruction set, bus cycles, programming conventions,
top level, datapath and control unit, along with their figures, will be inserted in the poster
as presented in the report. But simulation results can be better presented on a computer screen
than by drawing/pasting them on the poster.

Sir, please provide your suggestions. I'm also searching for student posters on
related projects; if you find anything related to this issue, please send its link.

Note: the CPU code and relevant documentation of the "HPC-16 project" are posted at **broken link removed**
 

Kindly help with this simulation issue...............
:cry::cry::cry:
I converted this program into binary and loaded it into the
RAM model. For this simulation I used "test2.vhd" (see code at **broken link removed**),

################################
## this program test:
## mov, li and hlt instructions
################################
0:li r0, 5555h
4:mov r1, r0
6:mov sp, r1
8:mov r5, sp
10:hlt
##############################

all instructions are executed correctly in simulation...
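
For reference, the register values expected from this little program can be worked out with a few lines of Python acting as a golden model. Only the three instructions used in the listing are modelled; the register names beyond those in the listing and the function name are assumptions for the sketch, and the leading addresses from the listing are left out.

```python
def run(program):
    """Execute li / mov / hlt on a register-only model and return the registers."""
    regs = {f"r{n}": 0 for n in range(16)}
    regs["sp"] = 0
    for line in program:
        op, _, rest = line.partition(" ")
        args = [a.strip() for a in rest.split(",")] if rest else []
        if op == "li":
            regs[args[0]] = int(args[1].rstrip("h"), 16) & 0xFFFF
        elif op == "mov":
            regs[args[0]] = regs[args[1]]
        elif op == "hlt":
            break
        else:
            raise ValueError(f"instruction not modelled: {line}")
    return regs

program = ["li r0, 5555h", "mov r1, r0", "mov sp, r1", "mov r5, sp", "hlt"]
regs = run(program)
print({name: hex(value) for name, value in regs.items() if value})
```

All four registers touched by the program should end up holding 5555h, which is what to look for in the waveform.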

In "test2.vhd", the testbench for this simulation
contains only the RAM (ramNx16.vhd) and the CPU.

Since there is no need for wait states, I have just connected
the CPU "stb_o" port to the CPU "ack_i" port...

In the figure, consider the signals
"/test2/cpu/control/rst_i"
"/test2/cpu/control/rst_sync"
"/test2/cpu/control/ack_i"
"/test2/cpu/control/ack_sync"

I'm using the same synchronizer for both inputs...
but you can see that "ack_sync" is 2 clks late
while "rst_sync" is 1 clk late...

Also,
you can see the glitch in "ack_sync"...

How can this be corrected???

Please help with this issue...
 
