Verilog: assign register outputs using wire

Status
Not open for further replies.

Buriedcode

Hi,

I'm relatively new to Verilog (avoided VHDL like the plague) but decided to use it for a complicated logic block I need for an LCD controller. It takes in three single-bit inputs, R, G and B, and outputs three bytes sequentially: it reads the RGB inputs 8 times, then outputs the 3 bytes (8 x 3 inputs = 24 bits = 3 bytes).
A byte is sent out after 4 input reads, another after another 4 input reads, then, sometime later, the last byte.

This is essentially a state machine, so I've included a 4-bit counter to count from 0 to 11 (12 states). I've used a 'case' statement for each of the states 0-7, but I only need one thing done in states 8-11, so I picked state 10 for it. In the states where nothing needs doing, I have simply put

b[15:0] <= b[15:0]; // Hoping the fitter will realise this doesn't actually do anything.

I'm using 16 registers as temporary storage, plus 4 for the counter, plus inevitably some more for combinatorial logic. However, Quartus has generated something with 28 macrocells. I understand that it's not an easy problem, but I fear these extra macrocells are generated because of my poor coding style, so I've provided the module in the hope that someone can point out flaws. I am not asking for confirmation of functionality (it works exactly as I intended), but for optimization, as every macrocell counts in this design:

Code:
module LCDcolfmat(clkin,cntout,dout,R,G,B,reset);

	reg [15:0] b;
	reg [3:0] counter;
	wire cntmax = (counter==11);
	
	input clkin;
	input reset;
	input R;
	input G;
	input B;
	output [7:0] dout;
	output [3:0] cntout;
	
	wire [7:0] dout;
	
	always @(posedge clkin)
	begin
		if(reset|cntmax)
		counter <= 0;
		else
		counter <= counter + 1;
	end
	
	always @(posedge clkin)
	case(counter)
		0: b[15:13] <= {R,G,B}; // fill up buffer with pixel 1
		1: b[12:10] <= {R,G,B}; // pixel 2
		2: b[9:7] <= {R,G,B};    // pixel 3
		3: b[6:4] <= {R,G,B};    // pixel 4 - b[15:8] read out here from external latch
		4: begin
			b[15:12] <= b[7:4]; // move lower byte to upper, last 4 bits don't care.
			b[11:9] <= {R,G,B}; // pixel 5
		   end
		5: b[8:6] <= {R,G,B}; // pixel 6
		6: b[5:3] <= {R,G,B}; // pixel 7
		7: b[2:0] <= {R,G,B}; // pixel 8 - b[15:8] read out here from external latch
		8: b[15:0] <= b[15:0]; // do nothing
		9: b[15:0] <= b[15:0]; // do nothing
		10: b[15:8] <= b[7:0]; // send lower byte of buffer to upper byte, can this be optimised?
		11: b[15:0] <= b[15:0]; // do nothing  - b[15:8] read out here from external latch
	endcase
	
	assign dout = b[15:8];
	assign cntout = counter;
	
	endmodule
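
A note on the hold states: in a clocked always block, a register that is not assigned in a given branch simply keeps its value, so the 'b[15:0] <= b[15:0];' lines should not be needed. Below is a minimal sketch of the same buffer logic with those states folded into an empty default. The module name LCDcolfmat_hold is hypothetical and the cntout port is dropped for brevity; since it describes the same hardware, it may well fit into exactly the same number of macrocells.

```verilog
// Sketch only: same behaviour as the module above, with the
// explicit "do nothing" states replaced by an empty default.
// The name LCDcolfmat_hold is an assumption for illustration.
module LCDcolfmat_hold (
    input  wire       clkin,
    input  wire       reset,
    input  wire       R,
    input  wire       G,
    input  wire       B,
    output wire [7:0] dout
);
    reg [15:0] b;
    reg [3:0]  counter;
    wire       cntmax = (counter == 4'd11);

    // 0..11 state counter with synchronous reset
    always @(posedge clkin)
        if (reset | cntmax) counter <= 4'd0;
        else                counter <= counter + 4'd1;

    // Bits of b that are not assigned in a state keep their value
    always @(posedge clkin)
        case (counter)
            4'd0:  b[15:13] <= {R, G, B};   // pixel 1
            4'd1:  b[12:10] <= {R, G, B};   // pixel 2
            4'd2:  b[9:7]   <= {R, G, B};   // pixel 3
            4'd3:  b[6:4]   <= {R, G, B};   // pixel 4
            4'd4:  begin
                       b[15:12] <= b[7:4];  // move leftover bits up
                       b[11:9]  <= {R, G, B}; // pixel 5
                   end
            4'd5:  b[8:6]   <= {R, G, B};   // pixel 6
            4'd6:  b[5:3]   <= {R, G, B};   // pixel 7
            4'd7:  b[2:0]   <= {R, G, B};   // pixel 8
            4'd10: b[15:8]  <= b[7:0];      // present the third byte
            default: ;                      // states 8, 9, 11: hold
        endcase

    assign dout = b[15:8];
endmodule
```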

The counter is sent to an output purely for simulation, so I can see what value the counter has at a particular time. It is quite a complicated idea, but it's the only way I can format the input data for a colour-STN LCD.

Does adding 'assign dout = b[15:8]' add more macrocells? Or perhaps using cntout as an output instead of explicitly using 'counter' as an output? You guys with FPGAs probably have the resources to not bother, but this is a 64-macrocell CPLD, so I'm trying my best to make this as efficient as possible :)

Thanks, Buriedcode
 

Have a play with the synthesis translate_off directive around the cntout assignment. You say it's only for simulation, so to keep it out of the implementation on a real CPLD, put the directives around it:

Code:
module LCDcolfmat(clkin,

// synthesis translate_off
cntout,
// synthesis translate_on

dout,R,G,B,reset);


// synthesis translate_off
assign cntout = counter;
// synthesis translate_on

I'm a big fat FPGA person, but given it's only a wire, I don't see why it should take anything more than routing resource (and pins!). The above removal should get rid of it.
Any reason you need to route it out to the top anyway for simulation? Simulators can monitor internal signals. You shouldn't need the output at all (and should just be able to see counter).
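
As a rough illustration of monitoring an internal signal, here is a minimal testbench sketch. The counter12 module is a hypothetical stand-in for the real design (just the 0 to 11 counter, with no debug port), and the names tb and dut are assumptions. The point is only the hierarchical reference dut.counter, which lets the simulator see the register without routing it to a pin.

```verilog
`timescale 1ns/1ps

// Hypothetical stand-in for the design: a 0..11 counter, no debug output.
module counter12 (
    input wire clkin,
    input wire reset
);
    reg [3:0] counter;

    always @(posedge clkin)
        if (reset | (counter == 4'd11)) counter <= 4'd0;
        else                            counter <= counter + 4'd1;
endmodule

// Testbench: observes the internal register through its hierarchical name.
module tb;
    reg clkin = 1'b0;
    reg reset = 1'b1;

    counter12 dut (.clkin(clkin), .reset(reset));

    always #5 clkin = ~clkin;   // 10 ns clock period

    initial begin
        // dut.counter is internal to the design; no port is needed for this
        $monitor("t=%0t counter=%0d", $time, dut.counter);
        #12  reset = 1'b0;      // release reset between clock edges
        #200 $finish;
    end
endmodule
```

Most waveform viewers can also add dut.counter to the trace directly, so even the $monitor call is optional.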

PS. I'm a VHDL person, but the directive is the same for both languages.
 

Hi there, thanks for the reply :)

I tried the '// synthesis translate_off' as suggested, and interestingly... Quartus generated *more* macrocells: from 28 to 32 this time. As it's an extra 4, I'm guessing that it somehow adds output registers (or just uses macrocells) for the counter.

I freely admit I am out of my depth here. I am getting along well with Verilog (used to do a lot of ABEL, and still mostly use schematic entry because I'm comfortable doing everything in pure logic) but the software/fitters that I use are slowly melting my brain. What makes matters worse is that I have Altera, Xilinx, *and* Lattice CPLDs to work with, all used for old projects for work, so I have to be familiar with all three. Currently using the Altera MAX7000S series, which are old.

Interestingly, using the above Verilog module, Quartus fitted using 32 macrocells for the Altera CPLD - but Lattice only used 20 for its device, which is much older. 12 macrocells in a 64-macrocell device is nearly 20% of the resources. I would think this is purely the difference in architecture between the two devices - but both devices have very similar structure (!!). So it seems either it's my implementation that Quartus isn't too happy with, or I haven't optimized the settings in the software properly.

Whilst this *is* for a work project, it's much more of a learning experience.

Well, I'll play around with that translate directive around the outputs, and see if it can reduce the resource usage. Thanks for the input. Now I have to try that with ispLEVER (Lattice) and ISEwebpack (Xilinx).... Advice to anyone else reading this: stick to one manufacturer to save confusion :(
 

If reduction of PLD resources is the main goal, try this approach; Quartus reported slightly lower usage than your numbers.
I believe the idea is quite simple; if not, I can add an explanation.

Code:
module lcd
(
  input        clk,
  input        reset,
  input        R, G, B,
  output [7:0] dout
);    

reg [11:0] rgb;  // shift-in register: R,G,B data is shifted in below a '1' marker bit;
                 // the marker reaching rgb[9], rgb[10] or rgb[11] signals a ready byte

 always @(posedge clk)
   if ( reset ) rgb <= 12'h1;
   else         
     if ( rgb[11] )   //last byte of three ready
          rgb <= {5'h0,1'b1,rgb[2:0],R,G,B};
     else
       if ( rgb[10] ) //second byte ready
          rgb <= {6'h0,1'b1,rgb[1:0],R,G,B};
     else
       if ( rgb[9] )  //first byte ready
          rgb <= {7'h0, 1'b1,rgb[0], R,G,B};
     else             // shifting data in
          rgb <= {rgb[8:0],R,G,B};

 assign dout = rgb[11] ? rgb[10:3] :
               rgb[10] ? rgb[ 9:2] :
                         rgb[ 8:1]; 
endmodule
---
have fun, J.A
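
For anyone who wants to watch the idea at work, here is a minimal testbench sketch, assuming the lcd module above is compiled alongside it. The testbench name, the pixel pattern and the byte_ready signal are made up for illustration; byte_ready simply ORs rgb[11:9], the three positions the module tests for its marker bit.

```verilog
`timescale 1ns/1ps

// Sketch of a testbench for the lcd module posted above.
// Assumes that module is compiled together with this file.
module lcd_tb;
    reg        clk   = 1'b0;
    reg        reset = 1'b1;
    reg  [2:0] pixel = 3'b000;   // {R,G,B}, an arbitrary counting pattern
    wire [7:0] dout;

    lcd dut (
        .clk  (clk),
        .reset(reset),
        .R    (pixel[2]),
        .G    (pixel[1]),
        .B    (pixel[0]),
        .dout (dout)
    );

    // Illustrative only: a byte is ready whenever the marker bit
    // sits in one of the three tested positions.
    wire byte_ready = |dut.rgb[11:9];

    always #5 clk = ~clk;        // 10 ns clock period

    always @(posedge clk) begin
        if (!reset)
            pixel <= pixel + 3'd1;
        if (byte_ready)
            $display("t=%0t rgb=%b dout=%h", $time, dut.rgb, dout);
    end

    initial begin
        #12  reset = 1'b0;       // release reset between clock edges
        #300 $finish;
    end
endmodule
```

In steady state the marker should reach a tested position three times in every 8 clocks, which matches 8 pixels x 3 bits = 3 bytes.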
 

Comparing macrocell utilization without looking at the respective logic capabilities, e.g. number of available terms and input connectivity, is like comparing apples to oranges. One thing is to reduce the macrocell requirements by finding an optimal problem description, as discussed by j_andr; the other is to find out which logic family is best suited for the application.
 

Hi guys,

Well, following your suggestions I tried various incarnations of the idea, eventually merging this module with a timing generation module and hoping the compiler/fitter would manage to work out the optimisations better than I can.


Turns out it was optimised for 'speed' rather than 'area'. With that changed, the max clock dropped from 120MHz to 60 - which is fine since I'm looking at 16-24MHz :) Macrocells went from 32 down to 23 (!!) so I'm not sure if that means Quartus is very good at optimising for speed, or very good for area. Either way, it's 3 more than Lattice. I realise comparing different compilers is meaningless because it's all down to the device used - and different manufacturers have different macrocell structures. But comparing the two, they looked painfully similar in terms of register configuration and product-terms.

Anyways, seems the software I use (Quartus, ispLEVER, and ISEwebpack) are all pretty sensitive to Verilog coding style. For example, clocking an 8-bit counter from a combinatorial output or from another register adds 4 macrocells to the design. But clocking it from the main clock (which is hardware routed throughout the chip as it's global) and using 'if <something>' doesn't increase the resource usage at all. I imagine that's the input AND array kicking in.
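
The clock-enable style described above can be sketched as follows. This is only an illustration: the module name, the 'tick' enable input and the synchronous reset are assumptions, not part of the actual design. The counter stays on the global clock, and the condition becomes an enable term rather than a second clock.

```verilog
// Sketch only: 8-bit counter on the global clock with a clock enable.
// 'tick' is a hypothetical enable, e.g. a decoded state or a pulse
// from another part of the design.
module enable_counter (
    input  wire       clk,    // global clock
    input  wire       reset,  // synchronous reset (assumed)
    input  wire       tick,   // count only when this is high
    output reg  [7:0] count
);
    always @(posedge clk)
        if (reset)
            count <= 8'd0;
        else if (tick)
            count <= count + 8'd1;
        // otherwise count holds its value
endmodule
```

The alternative, 'always @(posedge tick)', describes a register clocked by a logic signal instead of the global clock net, which is different hardware and not just a different way of writing the same thing.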

All in all, very steep learning curve for me over the past few days but I'm extremely impressed with the versatility of verilog. As I said, I started with ABEL, which is simple and *can* be used for complicated designs, but verilog seems much more modular (even modular within a single module..).

As for the actual application: currently using 53 macrocells for a 320x240 256-colour STN display controller with 1Mbit RAM and no write buffer. With a 128-macrocell device, that leaves plenty of space for the write buffer (data and address), as well as a couple of configuration registers - something I didn't think a small CPLD could do, as it's really FPGA territory. Once my website is up, I'll post the designs on there, in case anything might be helpful to those who want to see just what small CPLDs can do instead of the usual 'counters and buffers'.

j_andr.

Thanks for the code! I'll pop it in Quartus tomorrow and see what happens. I believe you've gone about it differently than myself, perhaps with fewer resources! I shall post my modified version (without all the timing) and we can compare speed/resource usage.

Interesting that there is very little info on controlling colour STN displays. I guess TFTs, with their faster response, better contrast and *much* easier driving characteristics, took over some years ago. At the bottom end, there's monochrome STN; at the top, there are 24-bit TFTs. The middle ground generally isn't covered by DIY folk.

Thanks again people.

Buriedcode
 

"Anyways, seems the software I use (Quartus, ispLEVER, and ISEwebpack) are all pretty sensitive to verilog coding style."

I won't deny that coding style matters in some cases. It may even be the case that a design compiler has problems finding an obvious optimal solution for a problem, although it's not so likely with very simple CPLD structures.

What you report sounds like describing actually different hardware rather than just a matter of coding style. If your code commands a particular hardware structure, the compiler doesn't have the option to ignore it.

I think the consequence is to look very sharply at the relation between behavioral description and synthesized hardware, which basically follows well-defined rules.
 
