[moved] Internal Tri-States In FPGA

Status
Not open for further replies.

asdf44

What's the conventional wisdom on this these days?

Of course I know there are no internal Z's; instead, I want the tools to infer a mux.

Specifically, I'd like to implement a large register file by sending around a shared memory bus and hanging 100 or so modules off it that each implement one register. The modules will 'Z' the shared data_out bus when their address isn't selected.

From a code point of view it's compact and flexible - I'll have different modules for read-only, read/write, latches with clear functionality etc and new registers can be plugged in or removed with the bare minimum of modifications.
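To make the intended behaviour concrete, here is a minimal behavioural sketch in Python (not HDL, and not what any synthesis tool actually emits). The names `Z`, `resolve_bus`, `register_out` and the register contents are hypothetical, chosen only for illustration. It models each module releasing the bus unless its address matches, and shows that with at most one driver the resolved bus behaves like a mux.

```python
from typing import Optional, Sequence

Z = None  # stands in for the high-impedance value 'Z'


def resolve_bus(drivers: Sequence[Optional[int]]) -> Optional[int]:
    """Resolve a shared bus: the single driven value, or Z if nobody drives."""
    active = [d for d in drivers if d is not Z]
    if len(active) > 1:
        raise ValueError("bus contention: more than one driver is active")
    return active[0] if active else Z


def register_out(my_addr: int, addr: int, value: int) -> Optional[int]:
    """A register module drives its value only when its address is selected."""
    return value if addr == my_addr else Z


# Hypothetical register contents, keyed by address.
contents = {0: 0x11, 1: 0x22, 2: 0x33}


def read(addr: int) -> Optional[int]:
    """Read by letting every module drive or release the shared bus."""
    return resolve_bus([register_out(a, addr, v) for a, v in contents.items()])
```

In this model `read(1)` returns the value at address 1, and an unmapped address leaves the bus at Z, which is the case a real implementation has to resolve to some defined value.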

Although I've used this on a small scale in Xilinx ISE I've shied away from it on past major designs citing the general wisdom that tri-states are to be avoided. But I'd like to re-evaluate that for an upcoming design.

EDIT: Sorry, meant to put in FPGA section.
 

I would suggest not doing this.

It would be better to share an OR'd bus and zero out every module output that is not selected to drive it. Then the tools don't have to convert hundreds of buses into a multiplexer with a huge logic cone. Instead you end up with a large OR network, which is a lot more efficient for timing because it requires fewer levels of logic to implement.

I steer away from writing code that does not represent the hardware I want to implement. In this case tri-states cannot be implemented inside any current FPGA device, so the code has to be transformed into something that can be, and what gets implemented is then something different from the original code.
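As a rough behavioural sketch of the OR'd bus (plain Python, not HDL): every module gates its output to zero unless selected, and the bus is the bitwise OR of all outputs. The function names and the 100 dummy register values are assumptions made up for the example. With exactly one module selected, the result matches an explicit mux.

```python
from functools import reduce
from operator import or_


def gated_out(my_addr: int, addr: int, value: int) -> int:
    """Module output: the register value when selected, otherwise all zeros."""
    return value if addr == my_addr else 0


def or_bus(outputs) -> int:
    """Shared bus: bitwise OR of every module's gated output."""
    return reduce(or_, outputs, 0)


def mux(addr: int, contents: dict) -> int:
    """Reference behaviour: an explicit address-indexed mux (0 if unmapped)."""
    return contents.get(addr, 0)


# 100 hypothetical 8-bit registers with arbitrary dummy contents.
contents = {addr: (addr * 37 + 5) & 0xFF for addr in range(100)}


def read_or_bus(addr: int) -> int:
    """Read by OR-ing together the gated outputs of all modules."""
    return or_bus(gated_out(a, addr, v) for a, v in contents.items())
```

Comparing `read_or_bus(a)` with `mux(a, contents)` over the whole address range is a quick way to check the equivalence, including unmapped addresses, which read as zero.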
 

If you want a mux, write a mux.
There have been no internal tri-states for 10+ years.
 


First, I'm missing what the advantage of an OR'd bus is versus a multiplexed implementation. I'm picturing that an OR'd bus still needs to funnel through layers of LUTs too (maybe it's the carry-in?).

My desire was to avoid explicitly coding the mux, because that adds more places I need to touch whenever I add, delete, or modify a register, its functionality, its name, etc.
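One way to put rough numbers on the question is a back-of-the-envelope count of logic levels. This sketch assumes 6-input LUTs, assumes one such LUT can OR six signals or implement one 4:1 mux (four data inputs plus two select inputs), and ignores dedicated mux primitives, carry chains, routing, and the per-module gating logic. The function name `tree_levels` is made up for the example, and real tools may do better or worse.

```python
def tree_levels(n_inputs: int, fan_in: int) -> int:
    """Levels needed to reduce n_inputs to one signal with fan_in-input nodes."""
    levels = 0
    while n_inputs > 1:
        n_inputs = -(-n_inputs // fan_in)  # ceiling division
        levels += 1
    return levels


N_REGS = 100  # "100 or so" modules per data bit, as in the original post

or_levels = tree_levels(N_REGS, 6)   # OR tree: six signals per LUT
mux_levels = tree_levels(N_REGS, 4)  # mux tree: one 4:1 mux per LUT

print(or_levels, mux_levels)  # 3 4
```

Under these assumptions the OR tree is one level shallower per data bit, and the address decode is spread across the modules instead of sitting in the central mux.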

So what were you picturing for an implementation of the OR'd bus? Connect my data_out ports to an array and then use a for loop to perform the OR operation?
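The array-plus-loop idea could look roughly like this, again as a Python behavioural model and not HDL. The class names, addresses, and initial values are hypothetical. Each register type lives in its own class, and adding or removing a register only touches the `modules` list.

```python
class ReadOnlyReg:
    """Register that ignores writes and drives zero when not addressed."""

    def __init__(self, addr: int, value: int):
        self.addr, self.value = addr, value

    def write(self, addr: int, data: int) -> None:
        pass  # read-only: writes are ignored

    def data_out(self, addr: int) -> int:
        return self.value if addr == self.addr else 0


class ReadWriteReg(ReadOnlyReg):
    """Register that stores writes to its own address."""

    def write(self, addr: int, data: int) -> None:
        if addr == self.addr:
            self.value = data


# Plugging a register in or out means editing only this list.
modules = [
    ReadOnlyReg(0x00, 0xA5),
    ReadWriteReg(0x01, 0x00),
    ReadWriteReg(0x02, 0x00),
]


def bus_write(addr: int, data: int) -> None:
    for m in modules:
        m.write(addr, data)


def bus_read(addr: int) -> int:
    data = 0
    for m in modules:  # the "for loop to perform the OR operation"
        data |= m.data_out(addr)
    return data
```

In an HDL the same shape would be an array of data_out signals reduced in a generate or for loop; the Python version only shows the structure.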

That seems like a solution.
 

IIRC, Xilinx's tools will do this in some cases, but not all cases. I think you can do it within an entity if the structure ensures at most one net is non-Z. I think Z then resolves to '0' in the HW implementation for the case that all are 'Z'.



For register interfaces, everyone has their preference. Some of it is also based on lack of experience with alternatives and specific design concerns. For example, if latency isn't an issue you can pipeline more effectively.

IIRC, the OR based version might be smaller and higher performance as all actual muxes are local to modules. The key point is that all modules must output 0 when not selected, otherwise a read will be corrupted.


I've seen three approaches in the past. The first did all register accesses in a single file and had all signals routed to it. I didn't like this, as it meant complicated routing of signals through the hierarchy as well as a giant file that always had merge conflicts. The second was what you describe: a bus is passed around and a large number of register/RAM modules are created. The implementation was a bit annoying due to the features needed. The last one passed an address bus around, but added a "chip-select". Each module would do full or partial address decoding. Registers/RAMs were mostly inferred in a single process within each module.

There was a ridiculous address decoder I wrote to make the chip-select decoding easier and safer, as it would do error checking and optimization. I also had some other procedures that would create a register map from the synthesis report, at least for any code I wrote. The register map would be stored in the .bit file and then read by my Python script, allowing me to script my tests in Python with named registers/RAMs/bitfields. It also made on-the-fly debugging easier, as you always had the register map (and the build date, git revision, build computer, and timing score).
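One possible reading of the chip-select approach, sketched as a Python behavioural model: a top-level decoder compares the upper address bits against each block's base to produce a chip-select, and each block decodes only the lower bits locally. The class `RegBlock`, the base addresses, and the block sizes are assumptions for illustration, not the poster's actual design.

```python
class RegBlock:
    """A block of registers that decodes only the low address bits."""

    def __init__(self, size: int):
        assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
        self.size = size
        self.regs = [0] * size

    def access(self, cs: bool, addr: int, write: bool = False, data: int = 0) -> int:
        if not cs:
            return 0  # unselected blocks drive zero onto the OR'd read bus
        offset = addr & (self.size - 1)  # partial decode: low bits only
        if write:
            self.regs[offset] = data
        return self.regs[offset]


# Hypothetical address map: base address -> block (bases aligned to block size).
blocks = {0x00: RegBlock(4), 0x10: RegBlock(16)}


def chip_select(base: int, block: RegBlock, addr: int) -> bool:
    """Top-level decode: upper address bits must match the block's base."""
    return (addr & ~(block.size - 1)) == base


def bus_access(addr: int, write: bool = False, data: int = 0) -> int:
    result = 0
    for base, block in blocks.items():
        result |= block.access(chip_select(base, block, addr), addr, write, data)
    return result
```

A write such as `bus_access(0x13, write=True, data=0x7E)` lands at offset 3 of the second block, and a later `bus_access(0x13)` reads it back.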
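A test script built on such a register map might look something like the sketch below. The map layout, the register and field names, and the `Device` class are all invented for the example; the actual format stored in the .bit file is not described in the post. A plain dict stands in for the hardware.

```python
# Hypothetical register map: name -> address and bitfields as (lsb, width).
REGISTER_MAP = {
    "CTRL": {"addr": 0x00, "fields": {"ENABLE": (0, 1), "MODE": (1, 3)}},
    "STATUS": {"addr": 0x01, "fields": {"READY": (0, 1)}},
}


class Device:
    """Named register/bitfield access on top of raw read/write callables."""

    def __init__(self, read, write, regmap):
        self._read, self._write, self._map = read, write, regmap

    def get(self, reg: str, field: str = None) -> int:
        entry = self._map[reg]
        word = self._read(entry["addr"])
        if field is None:
            return word
        lsb, width = entry["fields"][field]
        return (word >> lsb) & ((1 << width) - 1)

    def set(self, reg: str, field: str, value: int) -> None:
        entry = self._map[reg]
        lsb, width = entry["fields"][field]
        mask = ((1 << width) - 1) << lsb
        word = self._read(entry["addr"])  # read-modify-write
        self._write(entry["addr"], (word & ~mask) | ((value << lsb) & mask))


# Fake hardware: a dict keyed by address, defaulting to zero.
memory = {}
dev = Device(lambda a: memory.get(a, 0), memory.__setitem__, REGISTER_MAP)

dev.set("CTRL", "MODE", 5)
dev.set("CTRL", "ENABLE", 1)
```

After these two calls the CTRL word holds MODE in bits 3:1 and ENABLE in bit 0, and `dev.get("CTRL", "MODE")` reads the field back by name.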
 
