
Welcome to EDAboard.com

EDAboard.com is an international Electronics Discussion Forum focused on EDA software, circuits, schematics, books, theory, papers, ASIC, PLD, 8051, DSP, networking, RF, analog design, PCB, service manuals and more.

Video overlay in VHDL

Status
Not open for further replies.

filip.amator

Hi All!

This is my first post on this forum ;)
In my project, I want to overlay content stored in ROM (or dual-port RAM) onto the incoming video stream. The picture should pass through this piece of VHDL code to another overlay unit or to VGA memory. The input data are encoded in a very simple way: each 4-bit value at a given address corresponds to one pixel. Although the monitor shows the expected picture (background with the overlay in the proper place), I see spurious pixels flashing within the overlaid area. So the question is: do you see any mistakes in the code?
I ran simulations in ModelSim, and everything looks fine there.
The project is built in Quartus Prime Lite Edition for the Terasic C5G board with HDMI output.

Code VHDL:

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.numeric_std.all;
 
entity overlay is
    generic 
    (
        HRES        : natural := 640;       -- size of the screen
        VRES        : natural := 480;
        
        SIZE_X  : natural := 100;       -- width of the overlay
        SIZE_Y  : natural := 100;       -- height of the overlay
        OVER_POS_X      : natural := 100;   -- x coordinate of the top-left corner of the overlay
        OVER_POS_Y      : natural := 100    -- y coordinate of the top-left corner of the overlay
    );
 
 
    port
    (
        i_clk       : in std_logic;                                 -- pixel clock
        i_clk_en        : in std_logic;
        i_reset_n   : in std_logic;
 
        i_addr      : in std_logic_vector(18 downto 0);         -- pixel address: address 0 corresponds to column 0, row 0; address 307199 to column 639, row 479 
        i_data      : in std_logic_vector(3 downto 0);          -- 4 bits per pixel, 16 colours
        
        o_addr      : out std_logic_vector(18 downto 0);        -- output stream passed to another overlay unit or video memory
        o_data      : out std_logic_vector(3 downto 0)
 
    );
end overlay;
 
 
architecture behavioral of overlay is
 
 
-- ROM holding the 100x100 bitmap to be placed on top of the incoming picture.
COMPONENT over1 IS
    PORT
    (
        address     : IN STD_LOGIC_VECTOR (13 DOWNTO 0);
        clock           : IN STD_LOGIC  := '1';
        q               : OUT STD_LOGIC_VECTOR (3 DOWNTO 0)
    );
END COMPONENT over1;
 
                        
            
            
signal s_addr       : std_logic_vector(18 DOWNTO 0);
signal s_data       : std_logic_vector(3 DOWNTO 0);
 
 
signal s_posx       : integer range 0 to HRES-1 := 0;       -- pixel coordinates
signal s_posy       : integer range 0 to VRES-1 := 0;       -- pixel coordinates
 
 
signal s_overaddr : STD_LOGIC_VECTOR (13 DOWNTO 0);     -- address of the pixel to be replaced
signal s_overdata       : std_logic_vector(3 DOWNTO 0);
 
 
begin
 
 
-- no flipflops at the output of the ROM
over1ram : over1 port map
(
    address => s_overaddr,
    clock => i_clk,
    q => s_overdata
);
 
    
    s_posx <= to_integer(unsigned(i_addr) mod HRES);
    s_posy <= to_integer((unsigned(i_addr) - (unsigned(i_addr) mod HRES))/HRES);
    
    -- outside of overlay area this value might be incorrect. 
    s_overaddr <= std_logic_vector(to_unsigned(s_posx - OVER_POS_X + SIZE_X*(s_posy - OVER_POS_Y),s_overaddr'length));
    
    
    process (i_clk,i_reset_n)
    begin
        if (i_reset_n='0') then
            null;
        else
            if (i_clk='1' and i_clk'event and i_clk_en='1') then
                    for i in 0 to 1 loop
                        case i is
                            
                            when 0 =>
                                        
                                if ((s_posx > OVER_POS_X-1) and (s_posx < OVER_POS_X+SIZE_X) and (s_posy > OVER_POS_Y-1) and (s_posy < OVER_POS_Y+SIZE_Y)) then
                                    null;
                                else    
                                    null;
                                end if;                                         
                            
                                    s_data <= i_data;
                                    s_addr <= i_addr;                                   
                            
                                                        
                            when 1 => null;
 
                                if ((s_posx > OVER_POS_X) and (s_posx < OVER_POS_X+SIZE_X+1) and (s_posy > OVER_POS_Y-1) and (s_posy < OVER_POS_Y+SIZE_Y)) then
                                    -- o_data <= s_data when s_overdata="0000" else s_overdata;
                                    
                                    if(s_overdata="0000") then
                                            o_data <= s_data;
                                    else 
                                            o_data <= s_overdata;
                                    end if;
                                    
                                    o_addr <= s_addr;
                                else    
                                    o_data <= s_data;
                                    o_addr <= s_addr;
                                end if;                                         
                            
                                                                            
                            when others => null;
                        
                        
                        end case;
                    end loop;
            end if;
        
        end if;
    end process;
 
 
 
    
end architecture;
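Before hunting for timing problems, it can help to check the address arithmetic of the code above in software. This is a minimal Python model (not VHDL) using the default generics from the entity; the function names are hypothetical. If every ROM word 0..9999 is read exactly once per frame, that suggests the formulas themselves are fine and the flashing pixels come from timing or pipeline alignment.

```python
# Hypothetical golden model of the overlay address arithmetic in the posted code.
HRES, VRES = 640, 480               # screen size (entity generics)
SIZE_X, SIZE_Y = 100, 100           # overlay width and height
OVER_POS_X, OVER_POS_Y = 100, 100   # top-left corner of the overlay

def addr_to_xy(addr):
    """Linear pixel address -> (column, row), as s_posx/s_posy compute it."""
    return addr % HRES, addr // HRES

def inside_overlay(x, y):
    """Same bounds as the first if-statement in the process."""
    return (OVER_POS_X <= x < OVER_POS_X + SIZE_X
            and OVER_POS_Y <= y < OVER_POS_Y + SIZE_Y)

def rom_addr(x, y):
    """Row-major ROM address, like s_overaddr; only meaningful inside the overlay."""
    return (x - OVER_POS_X) + SIZE_X * (y - OVER_POS_Y)

# Collect the ROM addresses read during one full frame.
used = [rom_addr(*addr_to_xy(a)) for a in range(HRES * VRES)
        if inside_overlay(*addr_to_xy(a))]
```

The largest ROM address is 9999, which fits the 14-bit s_overaddr.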

 

This could be a timing problem. The code doesn't seem to pay any attention to area or performance.

There may also be pipeline issues, perhaps depending on how i_clk relates to i_clk_en. The code doesn't appear to check that its delays are correct in samples (enabled cycles) as well as in clock cycles.
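One way to see the sample-delay vs cycle-delay concern: in the posted code the ROM is clocked by i_clk with no enable, while s_addr and s_data are loaded only when i_clk_en = '1'. The Python cycle model below assumes the ROM registers its address on every i_clk edge (as a synchronous on-chip ROM would) and that the upstream source changes i_addr right after each enabled edge; the function name is hypothetical.

```python
def simulate(inputs, enables):
    """Cycle model of the posted two-register pipeline.

    inputs[t]  : address on i_addr just before clock edge t
    enables[t] : value of i_clk_en at clock edge t
    Returns the (rom_word_addr, video_addr) pairs that the output stage
    combines at each enabled edge. Assumes the ROM latches its address on
    EVERY i_clk edge (no enable), while s_addr is loaded only when enabled.
    """
    rom_reg = None   # address held in the ROM's input register
    s_addr = None    # address held in the process register s_addr
    pairs = []
    for addr, en in zip(inputs, enables):
        if en and rom_reg is not None and s_addr is not None:
            # output stage sees the values registered at earlier edges
            pairs.append((rom_reg, s_addr))
        if en:
            s_addr = addr    # gated by i_clk_en
        rom_reg = addr       # not gated: updates on every clock edge
    return pairs
```

With i_clk_en permanently '1' the ROM word and the video pixel belong to the same address; with i_clk_en toggling, the ROM word is one sample ahead of the pixel it is mixed with.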

There are also several oddities in the coding style. For example, I would like to hear why you think the for-loop construct is needed. The same goes for the "if (x) then null; else null; end if;" structure.
 

I forgot to mention that only one column of the overlaid bitmap is affected. Please take a look at these photographs of my monitor showing the generated picture.

This is a magnified view of the overlaid bitmap (note the pixels randomly scattered just before the 29th column):

overlay.jpg

It seems that only the 29th column is affected; pixels in that column get a random 'y' coordinate.

The whole screen:

screen.jpg


The signal i_clk_en is permanently tied to '1'. I suppose it might be useful in the future, but in my test bench and in the design it is fixed to '1'. The content of the first case in the for-loop was used in a previous version; I forgot to delete it.
 

Offset of 100 + 28 = 128. This makes me think there is an error involving bit 7 (value 128) of some bus, possibly a timing error, although it sounds like the clock rate is very low.

Edit: s_posx may be the issue.
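To make the 128 observation concrete: overlay column 29 (index 28) sits at absolute x = 100 + 28 = 128, and the step from x = 127 to x = 128 is the one that flips the most bits of the column value anywhere inside the overlay. That does not prove a timing failure, but a carry rippling through bit 7 is one plausible place for a data-dependent setup violation to appear. A quick Python check with the posted generics:

```python
OVER_POS_X, SIZE_X = 100, 100   # overlay position and width from the generics

def toggled_bits(a, b):
    """Number of bit positions that differ between a and b."""
    return bin(a ^ b).count("1")

# Column step inside the overlay (x -> x + 1) that flips the most bits of x.
worst_x = max(range(OVER_POS_X, OVER_POS_X + SIZE_X - 1),
              key=lambda x: toggled_bits(x, x + 1))

print(worst_x, worst_x + 1, toggled_bits(worst_x, worst_x + 1))  # 127 128 8
print("overlay column index:", worst_x + 1 - OVER_POS_X)         # 28
```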
 

Odd coding, as already pointed out.
You mention no flip-flops on the ROM output. Why?
You are using mod; this infers a multiplier and has probably affected timing. Did you run timing analysis on the design?
 

What is the clock rate?

Hopefully the tool implements the mod and the division with a multiply. Even so, that is less of an issue than the number of math operations chained in sequence within a single clock cycle. Likewise, if the ROM has no registers, it is a fairly large distributed memory (at best). Global optimizations might convert it into a block RAM, but I wouldn't count on it.


In this case, the design could be done with no multipliers at all, IF you can assume that the input address increments by 1 per cycle.
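Under that assumption (the address starts at 0 and increments by 1 per pixel), x and y can be tracked with two counters, each needing only an incrementer and a compare. A Python model of that logic; the function name and the resync on address 0 are hypothetical extras, not something from the thread:

```python
HRES, VRES = 640, 480

def track_xy(addr_stream):
    """Return (x, y) for each pixel of a stream whose address increments by 1.

    Hardware equivalent: two counters with a wrap compare each, no multiplier.
    """
    x = y = 0
    out = []
    for addr in addr_stream:
        if addr == 0:            # start of frame: resynchronize (hypothetical extra)
            x = y = 0
        out.append((x, y))
        if x == HRES - 1:        # last column: wrap x and move to the next row
            x = 0
            y = 0 if y == VRES - 1 else y + 1
        else:
            x += 1
    return out
```

Over a full frame this produces exactly the same (x, y) pairs as the mod/divide formulas in the original code.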

The mod operation for s_posy is not needed.
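The reason: for a non-negative address a and line width H = HRES, integer division already discards the remainder, so subtracting it first changes nothing:

$$
a = H\left\lfloor \frac{a}{H} \right\rfloor + (a \bmod H)
\quad\Longrightarrow\quad
\frac{a - (a \bmod H)}{H} = \left\lfloor \frac{a}{H} \right\rfloor
$$

So `to_integer(unsigned(i_addr) / HRES)` alone gives the row. It still synthesizes a division by a constant, so this simplification removes logic but not the divider itself.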

From an architecture perspective, a better design would have one stage that converts i_addr to x,y positions. Then all processing modules can use the x,y positions as their inputs and outputs, and a final module can convert the positions back to an address. I also prefer input/output valid signals over a clock enable; they make it easier to verify pipeline correctness.
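This structure can be sketched as a small cycle model: each register stage carries a valid flag, the x,y coordinates and the pixel value together, so a later stage cannot pair one pixel's data with another pixel's position. It is a Python model, not VHDL; the `Beat` record and the stage names are hypothetical, and the ROM lookup is modelled without its clock of latency (in hardware that would be one more stage through which valid, x and y are delayed as well).

```python
from typing import NamedTuple

HRES = 640
OVER_POS_X, OVER_POS_Y = 100, 100   # overlay top-left corner
SIZE_X, SIZE_Y = 100, 100           # overlay size

class Beat(NamedTuple):
    """One pixel travelling down the pipeline with its own valid flag."""
    valid: bool
    x: int
    y: int
    data: int

IDLE = Beat(False, 0, 0, 0)

def to_xy(valid, addr, data):
    """Stage 1: address -> coordinates (counters could do this in hardware)."""
    return Beat(valid, addr % HRES, addr // HRES, data)

def overlay(b, rom):
    """Stage 2: replace the pixel with the ROM value when inside and non-zero."""
    inside = (OVER_POS_X <= b.x < OVER_POS_X + SIZE_X
              and OVER_POS_Y <= b.y < OVER_POS_Y + SIZE_Y)
    if b.valid and inside:
        value = rom[(b.y - OVER_POS_Y) * SIZE_X + (b.x - OVER_POS_X)]
        if value != 0:                   # 0 = transparent, as in the posted code
            return b._replace(data=value)
    return b

def to_addr(b):
    """Stage 3: coordinates -> address for the next unit."""
    return b.valid, b.y * HRES + b.x, b.data

def run(inputs, rom):
    """Clock three registered stages; inputs are (valid, addr, data) per cycle."""
    r1 = r2 = IDLE
    r3 = (False, 0, 0)
    out = []
    for item in list(inputs) + [(False, 0, 0)] * 3:   # extra cycles flush the pipe
        if r3[0]:
            out.append(r3[1:])                        # (addr, data) leaving the pipe
        # every register loads from the previous stage at the same clock edge
        r1, r2, r3 = to_xy(*item), overlay(r1, rom), to_addr(r2)
    return out
```

Invalid cycles simply drop out at the end of the pipe, which is the property that makes pipeline correctness easy to check compared with a shared clock enable.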
 
Hi All,

As vGoodtimes advised, I divided my code into four stages, and now everything (almost) works fine! The clock rate is 25.175 MHz, the pixel clock for 640x480.
Could you tell me how to calculate s_posy without using mod? I guess it uses hardware multipliers, which are a precious resource.
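One possible answer, sketched as a Python model with hypothetical function names: since 640 = 5 · 2^7, the row is floor(floor(addr / 128) / 5). Division of the 12-bit quotient by 5 can be replaced by multiplying by 3277 and shifting right by 14, because 3277 / 2^14 = 1/5 + 1/81920, which stays exact for quotients below 16384 (the largest here is 307199 >> 7 = 2399). The constant multiply can be written as shifted adds (product at most 23 bits), and the column follows as addr − 640·y. Whether the synthesis tool maps a constant multiply to DSP blocks or to LUT adders depends on the tool and its settings. If the address does increment by 1 per pixel, the counter approach suggested earlier in the thread avoids this arithmetic altogether.

```python
HRES = 640  # 640 = 5 * 2**7

def div640(addr):
    """floor(addr / 640) with shifts and adds only; exact for addr < 640 * 480."""
    q = addr >> 7  # floor(addr / 128); at most 2399 for a 640x480 frame
    # q * 3277, with 3277 = 2^11 + 2^10 + 2^7 + 2^6 + 2^3 + 2^2 + 2^0
    prod = (q << 11) + (q << 10) + (q << 7) + (q << 6) + (q << 3) + (q << 2) + q
    return prod >> 14  # 3277 / 2^14 = 1/5 + 1/81920, exact while q < 16384

def mod640(addr, y):
    """Column = addr - 640 * y, with 640 * y = (y << 9) + (y << 7)."""
    return addr - ((y << 9) + (y << 7))
```

An exhaustive check over all 307200 addresses of one frame confirms both functions match `addr // 640` and `addr % 640`.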
 
