
Binning + Pipeline, How to do it, please?


flote21

Hello guys!

I need to bin the pixel output of a sensor. The binning consists of adding 5 pixels together, but the sensor outputs 4 pixels/clk (see the attached picture).

PixOut.jpg

Any idea how to do this with a pipeline?

Thanks!
 

Make a pipeline with some adders in it. Should output 1 value per clock, latency 3 clocks.
 

Doing something like this would convert the 4 pixels to 5 pixels, which you can then add together.


Code (Verilog):
reg [4*8-1:0]   pixels;
  reg [4*8-1:0]   save_pixels;
  reg [2:0]       clk_cnt;
  reg [5*8-1:0]   five_pixels;
 
  //...code to generate clk_cnt, save_pixels, and pixels
 
  // pixels need to be selected according to the following table
  //               clocks
  // 0  1  2  3  -  0
  // 0           -  1
  //
  //    1  2  3
  // 0  1        -  2
  //
  //       2  3 
  // 0  1  2     -  3
  //
  //          3
  // 0  1  2  3  -  4
  always @ (posedge clk) begin
    case (clk_cnt)
      0       : five_pixels <= five_pixels;
      1       : five_pixels <= {save_pixels[0 +:4*8], pixels[32-(1*8) +:1*8]};
      2       : five_pixels <= {save_pixels[0 +:3*8], pixels[32-(2*8) +:2*8]};
      3       : five_pixels <= {save_pixels[0 +:2*8], pixels[32-(3*8) +:3*8]};
      4       : five_pixels <= {save_pixels[0 +:1*8], pixels[32-(4*8) +:4*8]};
      default : five_pixels <= {5*8{1'bx}};
    endcase
  end
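
For anyone who wants to check the regrouping before writing any HDL, here is a small Python behavioural model of the table above. It is only a sketch and not a translation of the Verilog: the function name `bin5_from_4wide` is made up for illustration, and it assumes 4 valid pixels arrive on every clock. It keeps the pixels left over from earlier clocks and emits a 5-pixel sum as soon as five are available.

```python
def bin5_from_4wide(words):
    """Behavioural model: regroup a 4-pixel-per-clock stream into 5-pixel bins.

    `words` is an iterable of 4-pixel tuples, one tuple per clock.
    Yields (clock_index, bin_sum) each time a full 5-pixel bin is available.
    """
    leftover = []  # pixels received but not yet binned (at most 4 between clocks)
    for clk, word in enumerate(words):
        if len(word) != 4:
            raise ValueError("expected 4 pixels per clock")
        leftover.extend(word)
        if len(leftover) >= 5:
            yield clk, sum(leftover[:5])
            leftover = leftover[5:]


if __name__ == "__main__":
    # Ten clocks of pixels 0..39, four per clock.
    stream = [tuple(range(4 * i, 4 * i + 4)) for i in range(10)]
    for clk, total in bin5_from_4wide(stream):
        print(f"clock {clk}: bin sum = {total}")
```

In this model there is no output on the first clock of each 5-clock cycle and one bin on each of the other four, i.e. 4 bins per 5 clocks.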

 

I am working in VHDL and I am not familiar with Verilog.

Anyway, what I had in mind was something like the attached table.

table.jpg

- - - Updated - - -

But the problem is that with that solution I can't output data on every clock after the pipeline delay... I need a solution that outputs data on every clock, without "dead times" in the output...

Any idea?
 

But the problem is that with that solution I can't output data on every clock after the pipeline delay... I need a solution that outputs data on every clock, without "dead times" in the output... Any idea?
Do you even understand what you just asked?

How are you supposed to output data on every clock when the binning involves adding 5 pixels together, when you only get 4 pixels every clock? Unless you were supposed to do some sort of running sum...(i.e. d1+d2+d3+d4+d5, d2+d3+d4+d5+d6, d3+d4+d5+d6+d7, etc) then you could compute it on every clock.

If your original algorithm is correct and you have to output bin data on every clock then you will have to output on a clock that is 4/5 the input clock frequency and use a FIFO to ensure you don't under/overflow the binned pixel output.

Regards

- - - Updated - - -

BTW, the implementation you proposed will use a lot more adders and will also be much slower as your implementation requires that the 3 additions in the first case are done in 1 clock cycle. If you implemented what I showed you and follow it by a pipelined adder you would use fewer adders and it could run at a much higher frequency.
 

Why do I need to use a FIFO to ensure I don't under/overflow the binned pixel output?? I can bin the incoming pixels on the fly! The problem with this solution is that I have an output frequency limitation. The pixels are incoming at 148.5 MHz and I need to output binned pixels at a maximum of 500 Hz, but it does not work: the maximum frequency that I can set is 350 Hz. I think my pipeline algorithm is somehow limiting the output frequency.

 

Why do I need to use a FIFO to ensure I don't under/overflow the binned pixel output??
Only if you are sticking to your requirement that you output binned pixels on each clock...the aggregate rate for output bins is 4/5 your input clock that sends you 4 pixels per clock.

I can bin the incoming pixels on the fly! The problem with this solution is that I have an output frequency limitation. The pixels are incoming at 148.5 MHz and I need to output binned pixels at a maximum of 500 Hz, but it does not work: the maximum frequency that I can set is 350 Hz. I think my pipeline algorithm is somehow limiting the output frequency.
This doesn't make any sense based on your original request.

Originally you state that you receive four pixels per clock (at 148.5 MHz?) so you need to output four bins (five summed pixels) over five clock cycles (i.e. an aggregate rate of 148.5*4/5 = 118.8 MHz) How do you end up with 350 Hz or 500 Hz?

- - - Updated - - -

If you've done enough to see that your design outputs binning pixels at 350 Hz maximum frequency maybe you should just post the VHDL code (which I expect won't be commented properly). Then we won't have to keep going around in circles with you not telling us exactly what you are trying to do.

- - - Updated - - -

I just realized you are still working on this project:
https://www.edaboard.com/threads/318579/#post1362166
 

Yes, you are right, I am still working on that project.

Unfortunately, I can't post the complete project because it is company confidential. However, I can post the pipeline algorithm that I am using to add 5 pixels and 10 pixels (see below).

I should add that the input frames are 1920x219 pixels.
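
As a rough sanity check on the rates discussed so far, the figures quoted in this thread (148.5 MHz, 4 pixels per clock, 1920x219 frames, 5x1 binning) can be put into a few lines of Python. This is only a sketch: it assumes 4 valid pixels on every clock with no blanking, which the thread does not confirm, so the frame rate it gives is an upper bound rather than a prediction.

```python
CLK_HZ = 148.5e6          # input clock quoted in the thread
PIXELS_PER_CLK = 4        # sensor delivers 4 pixels per clock
FRAME_W, FRAME_H = 1920, 219
BIN = 5                   # 5x1 horizontal binning

pixel_rate = CLK_HZ * PIXELS_PER_CLK                 # input pixels per second
bin_rate = pixel_rate / BIN                          # binned pixels per second
max_frame_rate = pixel_rate / (FRAME_W * FRAME_H)    # ignores any blanking

print(f"input pixel rate : {pixel_rate / 1e6:.1f} Mpixel/s")
print(f"binned pixel rate: {bin_rate / 1e6:.1f} Mpixel/s")
print(f"frame rate bound : {max_frame_rate:.1f} frames/s")
```

Under those assumptions the binning stage itself sits well above the 350 Hz to 500 Hz frame rates mentioned above.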


Code (VHDL):
------------------------
entity BINNING_4PIX is
------------------------
  generic (
    DATA_BITS : positive := 12    -- Data Bus Width
  );
  port (
    BIN_MODE    : in  std_logic_vector(2 downto 0);             -- Binning Mode : "000" = 6x2, "001" = 3x1, "111" => no Binning (for Full Frame)
    -- Video Input Parallel Interface, sync'ed on VID_I_CLK
    VID_I_CLK   : in  std_logic;                                -- Video  Input Clock
    VID_I_RST   : in  std_logic;                                -- Video  Input Reset (Async Active High)
    VID_I_SOI   : in  std_logic;                                -- Video  Input Start Of Image
    VID_I_SOL   : in  std_logic;                                -- Video  Input Start Of Line
    VID_I_EOI   : in  std_logic;                                -- Video  Input   End Of Image
    VID_I_DARK  : in  std_logic;                                -- Video  Input Dark Image '1', Bright '0'
    VID_I_DAV   : in  std_logic;                                -- Video  Input Pixel Data Valid Flag
    VID_I_DATA  : in  std_logic_vector(4*DATA_BITS-1 downto 0); -- Video  Input Pixel Data : 4 pixels
    -- Video Output Parallel Interface, sync'ed on VID_O_CLK
    VID_O_CLK   : in  std_logic;                                -- Video Output Clock
    VID_O_RST   : in  std_logic;                                -- Video Output Reset (Async Active High)
    VID_O_XSIZE : in  std_logic_vector(11 downto 0);            -- Video Output X Size
    VID_O_YSIZE : in  std_logic_vector(11 downto 0);            -- Video Output Y Size
    VID_O_SOI   : out std_logic;                                -- Video Output Start of Image
    VID_O_SOL   : out std_logic;                                -- Video Output Start of Line
    VID_O_EOI   : out std_logic;                                -- Video Output   End of Image
    VID_O_DARK  : out std_logic;                                -- Video Output Dark Image '1', Bright '0'
    VID_O_DAV   : out std_logic;                                -- Video Output Pixels Data Valid Flag
    VID_O_DATA  : out std_logic_vector(63 downto 0);            -- Video Output Pixels Data (4 pixels)
    VID_O_XCNT  : out std_logic_vector(11 downto 0);            -- Video Output Pixels Counter (0 for 1st pixel)
    VID_O_YCNT  : out std_logic_vector(11 downto 0)             -- Video Output  Lines Counter (0 for 1st line )
  );
-------------------------------------
end entity BINNING_4PIX;
-------------------------------------
 
-------------------------------------
architecture RTL of BINNING_4PIX is
-------------------------------------
-------
begin
-------
 
  -- ---------------------------------
  --  Horizontal Binning process
  -- ---------------------------------
  -- only done for Binning modes 5x1 and 10x2
  BINNING_H_proc : process(VID_I_CLK, VID_I_RST)
  begin
    if VID_I_RST = '1' then
      BIN_MODE_l  <= (others => '0');
      BIN_ADD1    <= (others => '0');
      BIN_ADD2    <= (others => '0');
      BIN_ADD3    <= (others => '0');
      BIN_ADD4    <= (others => '0');
      BIN_LINE    <= '0';
      BIN_DAV     <= '0';
      BIN_DATA    <= (others => '0');
      BIN_DAV2    <= '0';
      BIN_DATA2   <= (others => '0');
      BIN_SEL     <= (others => '0');
      BIN_SELr    <= (others => '0');
      VID_F_SOI   <= '0';
      VID_F_DARK  <= '0';
      VID_F_SOL   <= '0';
    elsif rising_edge(VID_I_CLK) then
 
      -- Latch Dropping Factors + Reset Counters
      VID_F_SOI  <= '0';
      if VID_I_SOI = '1' then
        BIN_MODE_l <= BIN_MODEs;  -- Latching Binning Mode  
        VID_F_SOI  <= '1';
        VID_F_DARK <= VID_I_DARK; -- Latch Dark Info
        BIN_LINE   <= '1';  -- will be '0' for Even Lines, '1' for Odd lines
      end if;
 
      -- Valid Lines Management
      VID_F_SOL <= '0';
 
      if VID_I_SOL = '1' then   -- New line
        BIN_LINE  <= not BIN_LINE;
        BIN_ADD1  <= (others => '0');
        BIN_ADD2  <= (others => '0');
        BIN_ADD3  <= (others => '0');
        BIN_ADD4  <= (others => '0');
       if BIN_MODE_l = "010" then -- Binning 5x1
          BIN_SELr  <= (others => '0');
          BIN_SEL   <= (others => '0');
          VID_F_SOL <= '1';           -- Output every line          
        elsif BIN_MODE_l = "011" then    -- Binning 10x2
          BIN_SEL   <= (others =>'0');
          VID_F_SOL <= not BIN_LINE;  -- Output 1 line out of 2          
        end if;
      end if;
-- Pipeline Stage 1
      -- Valid Pixels Management :
      -- Receiving 4 pixels on every clock cycle
      VID_I_DAVr <= '0';
      if VID_I_DAV = '1' then  -- New "4pixels" 
        VID_I_DAVr <= '1';  -- for the pipeline
        BIN_SEL    <= BIN_SEL + 1;
        -- Binning 5x1
        if BIN_MODE_l = "010" then  
          if BIN_SELr = 3 then 
            BIN_SEL <= (others => '0');
          end if;
          if BIN_SEL = 0 then
            BIN_ADD1 <= resize(unsigned(VID_I_DATA(11 downto 00)), BIN_ADD1'length) + unsigned(VID_I_DATA(23 downto 12));
                BIN_ADD2 <= resize(unsigned(VID_I_DATA(35 downto 24)), BIN_ADD2'length) + unsigned(VID_I_DATA(47 downto 36));
          elsif BIN_SEL = 1 then
            BIN_ADD3 <= resize(unsigned(VID_I_DATA(23 downto 12)), BIN_ADD3'length) + unsigned(VID_I_DATA(35 downto 24));
            BIN_ADD4 <= resize(unsigned(VID_I_DATA(47 downto 36)), BIN_ADD4'length);
          elsif BIN_SEL = 2 then
            BIN_ADD1 <= resize(unsigned(VID_I_DATA(35 downto 24)), BIN_ADD1'length) + unsigned(VID_I_DATA(47 downto 36));
          elsif BIN_SEL = 3 then
            BIN_ADD2 <= resize(unsigned(VID_I_DATA(47 downto 36)), BIN_ADD2'length);                      
          end if; 
        -- Binning 10x2
        elsif BIN_MODE_l = "011" then  
          if BIN_SELr = 3 then 
            BIN_SEL <= (others => '0');
          end if;
          if BIN_SEL = 0 then
                BIN_ADD1 <= resize(unsigned(VID_I_DATA(11 downto 00)), BIN_ADD1'length) + unsigned(VID_I_DATA(23 downto 12));
                BIN_ADD2 <= resize(unsigned(VID_I_DATA(35 downto 24)), BIN_ADD2'length) + unsigned(VID_I_DATA(47 downto 36));
          elsif BIN_SEL = 1 then
            BIN_ADD3 <= resize(unsigned(VID_I_DATA(11 downto 00)), BIN_ADD3'length) + unsigned(VID_I_DATA(23 downto 12));
                BIN_ADD4 <= resize(unsigned(VID_I_DATA(35 downto 24)), BIN_ADD4'length) + unsigned(VID_I_DATA(47 downto 36));
             elsif BIN_SEL = 2 then
                BIN_ADD1 <= resize(unsigned(VID_I_DATA(35 downto 24)), BIN_ADD1'length) + unsigned(VID_I_DATA(47 downto 36));                                                                                 
          elsif BIN_SEL = 3 then            
            BIN_ADD3 <= resize(unsigned(VID_I_DATA(11 downto 00)), BIN_ADD3'length) + unsigned(VID_I_DATA(23 downto 12));
                BIN_ADD4 <= resize(unsigned(VID_I_DATA(35 downto 24)), BIN_ADD4'length) + unsigned(VID_I_DATA(47 downto 36));                        
             end if;
        end if;       
      end if;
 
      -- Pipeline Stage 2
      BIN_DAV  <= '0';
      BIN_DAV2 <= '0';
      if VID_I_DAVr = '1' then
       if BIN_MODE_l = "010" then
          if BIN_SEL = 3 then
            BIN_SELr <= BIN_SEL;
          else
            BIN_SELr <= "000";
          end if;
          if BIN_SEL = 1 then
            -- Valid Output
            BIN_DAV  <= '1';
            BIN_DATA <= std_logic_vector(resize(BIN_ADD1, BIN_DATA'length) + BIN_ADD2 +
                                                resize(unsigned(VID_I_DATA(11 downto 00)), BIN_DATA'length));
          elsif BIN_SEL = 2 then                                                                              
            -- Valid Output
            BIN_DAV  <= '1';
            BIN_DATA <= std_logic_vector(resize(BIN_ADD3, BIN_DATA'length) + BIN_ADD4 +
                                         resize(unsigned(VID_I_DATA(11 downto 00)), BIN_DATA'length) +
                                                resize(unsigned(VID_I_DATA(23 downto 12)), BIN_DATA'length));
            
          elsif BIN_SEL = 3 then                                                                              
            -- Valid Output
            BIN_DAV  <= '1';
            BIN_DATA <= std_logic_vector(resize(BIN_ADD1, BIN_DATA'length) + 
                                         resize(unsigned(VID_I_DATA(11 downto 00)), BIN_DATA'length) +
                                                resize(unsigned(VID_I_DATA(23 downto 12)), BIN_DATA'length) + 
                                                resize(unsigned(VID_I_DATA(35 downto 24)), BIN_DATA'length));
            
          elsif BIN_SELr = 3 then                                                                              
            -- Valid Output
            BIN_DAV  <= '1';
            BIN_DATA <= std_logic_vector(resize(BIN_ADD2, BIN_DATA'length) + 
                                         resize(unsigned(VID_I_DATA(11 downto 00)), BIN_DATA'length) +
                                                      resize(unsigned(VID_I_DATA(23 downto 12)), BIN_DATA'length) + 
                                                resize(unsigned(VID_I_DATA(35 downto 24)), BIN_DATA'length) +
                                                resize(unsigned(VID_I_DATA(47 downto 36)), BIN_DATA'length));
 
          end if;                                                                                                         
        -- Binning 10x2
        elsif BIN_MODE_l = "011" then
          if BIN_SEL = 3 then
            BIN_SELr <= BIN_SEL;
          else
            BIN_SELr <= "000";
          end if;
 
          if BIN_SEL = 2 then
            BIN_DAV  <= '1';
            BIN_DATA <= std_logic_vector(resize(BIN_ADD1, BIN_DATA'length) + BIN_ADD2 + BIN_ADD3 + BIN_ADD4 + 
                                                      resize(unsigned(VID_I_DATA(11 downto 00)), BIN_DATA'length) +
                                                    resize(unsigned(VID_I_DATA(23 downto 12)), BIN_DATA'length));                           
          elsif BIN_SELr = 3 then
            BIN_DAV  <= '1';
            BIN_DATA <= std_logic_vector(resize(BIN_ADD1, BIN_DATA'length) + BIN_ADD3 + BIN_ADD4 + 
                                         resize(unsigned(VID_I_DATA(11 downto 00)), BIN_DATA'length) +
                                                      resize(unsigned(VID_I_DATA(23 downto 12)), BIN_DATA'length) + 
                                                      resize(unsigned(VID_I_DATA(35 downto 24)), BIN_DATA'length) +
                                                      resize(unsigned(VID_I_DATA(47 downto 36)), BIN_DATA'length));
             end if;
       end if;       
     end if;
      
   end if;
  end process BINNING_H_proc;
 
  -- ---------------------------------
  --  Vertical Binning process
  -- ---------------------------------
  BINNING_V_proc : process(VID_I_CLK, VID_I_RST)
  begin
    if VID_I_RST = '1' then
      BIN_PACK_SEL   <= '0';
      BIN_PACK_WR    <= '0';
      BIN_PACK_DATA  <= (others => '0');
      VID_F_SEL      <= '0';
      VID_F_DAV      <= '0';
      VID_F_DATA     <= (others => '0');
    elsif rising_edge(VID_I_CLK) then
 
      BIN_PACK_WR   <= '0';
            
      -- Packing the 16bits pixels in 32bits 
      if BIN_DAV = '1'  then
      
          if BIN_MODE_l = "010" then
          BIN_PACK_DATA(47 downto 32) <= x"0000";  -- not used in this mode
          --
          BIN_PACK_SEL <= not BIN_PACK_SEL;
          if BIN_PACK_SEL = '0' then
            BIN_PACK_DATA(15 downto 0) <= BIN_DATA;
          else     
            BIN_PACK_WR <= '1';
            BIN_PACK_DATA(31 downto 16) <= BIN_DATA;
          end if;
        elsif BIN_MODE_l = "011" then
          BIN_PACK_DATA(47 downto 32) <= x"0000";  -- not used in this mode
          --
          BIN_PACK_SEL <= not BIN_PACK_SEL;
          if BIN_PACK_SEL = '0' then
            BIN_PACK_DATA(15 downto 0) <= BIN_DATA;
          else     
            BIN_PACK_WR <= '1';
            BIN_PACK_DATA(31 downto 16) <= BIN_DATA;
          end if;          
        end if;
      end if;
      
      VID_F_DAV <= '0';
        if BIN_MODE_l = "010" then
        if BIN_PACK_WR = '1' then
          if VID_F_SEL = '0' then 
            VID_F_SEL <= '1';
            VID_F_DATA(31 downto 00) <= BIN_PACK_DATA(31 downto 0);
          else
            VID_F_SEL <= '0';
            VID_F_DAV <= '1';
            VID_F_DATA(63 downto 32) <= BIN_PACK_DATA(31 downto 0);
          end if;          
        end if;
      -- Binning 10x2 : For Odd lines, read from Fifo and compute the Final results
      elsif BIN_MODE_l = "011" then
        if BIN_FIFO_RD = '1' then  -- New Data 
          if VID_F_SEL = '0' then 
            VID_F_SEL <= '1';
            VID_F_DATA(15 downto 00) <= std_logic_vector(resize(unsigned(BIN_FIFO_OUT(15 downto 00)), 16) + unsigned(BIN_PACK_DATA(15 downto 00)));
            VID_F_DATA(31 downto 16) <= std_logic_vector(resize(unsigned(BIN_FIFO_OUT(31 downto 16)), 16) + unsigned(BIN_PACK_DATA(31 downto 16)));
          else
            VID_F_SEL <= '0';
            VID_F_DAV <= '1';
            VID_F_DATA(47 downto 32) <= std_logic_vector(resize(unsigned(BIN_FIFO_OUT(15 downto 00)), 16) + unsigned(BIN_PACK_DATA(15 downto 00)));
            VID_F_DATA(63 downto 48) <= std_logic_vector(resize(unsigned(BIN_FIFO_OUT(31 downto 16)), 16) + unsigned(BIN_PACK_DATA(31 downto 16)));
          end if;
        end if;       
      end if;
 
      -- Initialize things on Start of Line
      if VID_I_SOL = '1' then
        BIN_PACK_SEL <= '0';
      end if;
      
    end if;
  end process BINNING_V_proc;
 
  -- FIFO to store the Binning Results of Even Lines
  -- Will be read when Odd lines arrive, to compute the final results
  -- of each "couple of lines" (10x2 binning)
  -- Show Ahead Mode, Output Registered
 
-- Storing the Binning H results in FIFO for Even lines : only for Binning 10x2 
  BIN_FIFO_WR <= BIN_PACK_WR and not BIN_LINE when BIN_MODE_l = "011" else '0'; 
 
  BIN_FIFO_IN <= BIN_PACK_DATA;
  
  -- The FIFO (synchronous)
  i_BIN_FIFO : SCFIFO
    generic map (
      ADD_RAM_OUTPUT_REGISTER => "ON",
      INTENDED_DEVICE_FAMILY  => "Cyclone IV",
      LPM_NUMWORDS            => 2**BIN_FIFO_DEPTH,
      LPM_SHOWAHEAD           => "ON",
      LPM_TYPE                => "SCFIFO",
      LPM_WIDTH               => BIN_FIFO_WIDTH,
      LPM_WIDTHU              => BIN_FIFO_DEPTH,
      OVERFLOW_CHECKING       => "ON",
      UNDERFLOW_CHECKING      => "ON",
      USE_EAB                 => "ON"
    )
    port map (
      ACLR  => VID_I_RST   ,
      CLOCK => VID_I_CLK   ,
      SCLR  => '0'         ,
      WRREQ => BIN_FIFO_WR ,
      DATA  => BIN_FIFO_IN ,
      FULL  => BIN_FIFO_FUL,
      USEDW => open        ,
      EMPTY => BIN_FIFO_EMP,
      RDREQ => BIN_FIFO_RD ,
      Q     => BIN_FIFO_OUT
    );
    
    
  -- Extracting FIFO (Binning 10x2 only)
  -- when Odd Lines received and new data computed (Binning 10x2)  
  BIN_FIFO_RD <= not BIN_FIFO_EMP and BIN_PACK_WR and BIN_LINE when BIN_MODE_l = "011" else '0'; -- when Odd Lines received and new data computed (Binning 6x2 and 10x2)  
 
  -- -------------------------------------
  -- Clock Domain Crossing to VID_O_CLK
  -- -------------------------------------
    
  -- FIFO to store the 64bits Binning Results
  -- and to perform the Clock Domain Crossing to VID_O_CLK
  -- Show Ahead Mode, Output Registered
  
  -- Writing in FIFO
  FIFO_WR <= VID_F_SOI or VID_F_SOL or VID_F_DAV;
  FIFO_IN <= VID_F_SOI &  VID_F_SOL &  VID_F_DAV & VID_F_DATA & VID_F_DARK;
  
  -- Clock Domain Crossing FIFO
  -- Show Ahead Mode, Output Registered
  -- Data Width = FIFO_WIDTH (SOI + SOL + DAV + 64 data bits + DARK), Fifo Depth = 2**FIFO_DEPTH
  i_O_FIFO : DCFIFO
    generic map (                               
      CLOCKS_ARE_SYNCHRONIZED => "FALSE",       
      INTENDED_DEVICE_FAMILY  => "Cyclone V",   
      LPM_NUMWORDS            => 2**FIFO_DEPTH, 
      LPM_SHOWAHEAD           => "ON",          
      LPM_TYPE                => "dcfifo",      
      LPM_WIDTH               => FIFO_WIDTH,    
      LPM_WIDTHU              => FIFO_DEPTH,    
      OVERFLOW_CHECKING       => "ON",          
      UNDERFLOW_CHECKING      => "ON",          
      USE_EAB                 => "ON",            
      WRSYNC_DELAYPIPE        => 3,             
      RDSYNC_DELAYPIPE        => 3,             
      READ_ACLR_SYNCH         => "ON",         
      WRITE_ACLR_SYNCH        => "ON"          
    )                                           
    port map (                                  
      ACLR    => VID_I_RST,                     
      WRCLK   => VID_I_CLK,                     
      WRREQ   => FIFO_WR  ,                     
      DATA    => FIFO_IN  ,                     
      WRUSEDW => open     ,                     
      WRFULL  => FIFO_FUL ,                     
      RDCLK   => VID_O_CLK,                     
      RDEMPTY => FIFO_EMP ,                    
      RDREQ   => FIFO_RD  ,                     
      Q       => FIFO_OUT ,                      
      RDUSEDW => open  
    );                  
            
  -- Checking FIFO Flags  
  process(VID_I_CLK)
  begin
    if rising_edge(VID_I_CLK) then
      assert not ( FIFO_WR = '1' and FIFO_FUL = '1' )
      report "[BINNING_4PIX] WRITE while i_O_FIFO Full !!!" severity failure;
    end if;
  end process;
  process(VID_O_CLK)
  begin
    if rising_edge(VID_O_CLK) then
      assert not ( FIFO_RD = '1' and FIFO_EMP = '1' )
      report "[BINNING_4PIX] READ while i_O_FIFO Empty !!!" severity failure;
    end if;
  end process;
 
  -- Reading the FIFO
  FIFO_RD <= not FIFO_EMP;
  
  -- Video Outputs
  VID_O_SOIi <= FIFO_RD and FIFO_OUT(67) when rising_edge(VID_O_CLK);
  VID_O_SOLi <= FIFO_RD and FIFO_OUT(66) when rising_edge(VID_O_CLK); 
  VID_O_DAVi <= FIFO_RD and FIFO_OUT(65) when rising_edge(VID_O_CLK);   
  VID_O_DATA <= FIFO_OUT(64 downto 1)    when rising_edge(VID_O_CLK); 
 
  -- -----------------------------------------------------
  --  Outputting Data on VID_O_CLK clock domain
  -- -----------------------------------------------------
  OUT_proc : process(VID_O_CLK, VID_O_RST)
  begin
    if VID_O_RST = '1' then
      VID_O_DARKi <= '0';
      VID_O_EOIi  <= '0';
      VID_O_XCNTi <= (others => '0');
      VID_O_YCNTi <= (others => '0');
    elsif rising_edge(VID_O_CLK) then
 
      -- Dark/Bright Info 
      if FIFO_RD = '1' and FIFO_OUT(67) = '1' then  -- SOI 
        VID_O_DARKi <= FIFO_OUT(0);
      end if;
 
      -- Output Pixel Counter 
      if VID_O_DAVi = '1' then
        VID_O_XCNTi <= VID_O_XCNTi + 4;  -- 4 pixels per 64-bit output word
      end if;
 
      -- Output Line Counter
      if VID_O_SOIi = '1' then
        VID_O_YCNTi <= (others => '1');
      elsif VID_O_SOLi = '1' then
        VID_O_XCNTi <= (others => '0');
        VID_O_YCNTi <= VID_O_YCNTi + 1;
      end if;
 
      -- End of Image
      VID_O_EOIi  <= '0';
      if VID_O_DAVi = '1' and 
         VID_O_XCNTi = unsigned(VID_O_XSIZE)-4 and 
         VID_O_YCNTi = unsigned(VID_O_YSIZE)-1 then 
        VID_O_EOIi <= '1';
      end if;
      
    end if;
  end process OUT_proc;
 
  VID_O_SOI  <= VID_O_SOIi ;
  VID_O_SOL  <= VID_O_SOLi ;
  VID_O_EOI  <= VID_O_EOIi ;
  VID_O_DARK <= VID_O_DARKi;
  VID_O_DAV  <= VID_O_DAVi ;
  VID_O_XCNT <= std_logic_vector(VID_O_XCNTi);
  VID_O_YCNT <= std_logic_vector(VID_O_YCNTi);

 

Bug number 1: this code only works when DATA_BITS = 12, as your adders expect 12-bit data words.
On this note:
Why all the inline data type conversions? Why not do one of the following:
1. Have 4 separate unsigned Dword inputs (for the 4 pixels)
2. Separate the 4 pixels into temporary unsigned variables in the code?

Are you sure you're getting pixels at 148.5 MHz? That's the pixel clock for SDI 1080p50/60. With that you only get 1 luma/chroma pair per clock. So surely the data rate you have now is actually dvalid high once every 4 clocks? Or is there some buffering mechanism upstream that stores the whole picture and then splats it out in 1/4 of the frame time? With this, you'd actually be quite inefficient, as you'd have a dead period of 3/4 of the frame time idling, doing nothing.

It's quite hard to really understand what you're trying to do without a bigger picture of the algorithm or the system it's implemented in. What's the purpose of the output? It's clearly not going back out of an SDI feed (otherwise you'd get overflow or saturation of video). Or is this some kind of low-pass filter (in which case you'd actually need a sliding window, not just 5 pixels and then the next 5)?
 

I think you need to supply the testbench that you are using to test this entity/architecture. It's hard to determine what your input protocol would look like.

Like I said before your implementation choice uses a lot of adders, which would reduce to a single 5 pixel adder tree, if you performed the translation of 4 pixels to 5 pixels as I originally suggested.

It shouldn't be hard to translate what I posted to VHDL as there isn't anything complicated in the case statement. The case statement just translates the table above it.
The pixels are pipeline-delayed, so you have access to both the most recent set of pixels and the previous set. That gives you access to all 5 pixels you need for each bin.
 

Hi people!!!

Thanks for your answers. The algorithm works perfectly; I tested it with a test bench and it works OK. I also made some optimizations to reduce the number of multipliers and now it looks good. I still have the frequency limitation problem: for frame rates over 700 Hz I am losing pixels or frames on the PC side, and the training frame that I have programmed looks corrupted.

You are right, after the binning process I have a DDS algorithm, where the Bright frame is subtracted from the Dark frame. The sensor outputs frames of 1920x219 pixels in this order: Dark-Bright-Dark-Bright... So I save a complete binned Dark frame into a DDR2 and I compute the DDS with the Bright frame on the fly. Finally, I transfer the DDS frames over Ethernet.

At this point I have 2 bandwidth limitations:

1) DDR2 => DDR2_MaxRate = 160 MHz x 16 bits x 2 = 5.12 Gbps
2) Ethernet => ETH_MaxRate = 700 Mbps

After binning, the resulting frame is 1920/5 x 219 = 384 x 219 pixels. For example, if I configure a frame rate of 760 Hz, I get corrupted images on the PC side. But at that frame rate I am under both bandwidth limits:

1) DDR2_Rate = FrameRate x 16 bits x 384 x 219 = 1.022 Gbps < 5.12 Gbps => OK
2) ETH_Rate = (FrameRate/2) x 16 bits x 384 x 219 = 511 Mbps < 700 Mbps => OK

Everything looks OK with respect to the two limits; however, I still have this frame rate limitation problem.
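
The two rate checks above can be reproduced with a short Python script. This is a sketch using only the figures quoted in this post; the variable names are made up for illustration. It gives average rates only and says nothing about burst behaviour or buffering between the stages.

```python
FRAME_RATE_HZ = 760
BITS_PER_PIXEL = 16
BINNED_W, BINNED_H = 1920 // 5, 219          # 384 x 219 after 5x1 binning

frame_bits = BITS_PER_PIXEL * BINNED_W * BINNED_H

ddr2_max = 160e6 * 16 * 2                    # 160 MHz x 16 bits x 2 (DDR)
eth_max = 700e6                              # usable Ethernet rate quoted above

ddr2_rate = FRAME_RATE_HZ * frame_bits       # every binned frame
eth_rate = (FRAME_RATE_HZ / 2) * frame_bits  # one DDS frame per Dark/Bright pair

print(f"DDR2: {ddr2_rate / 1e9:.3f} Gbps of {ddr2_max / 1e9:.2f} Gbps")
print(f"ETH : {eth_rate / 1e6:.1f} Mbps of {eth_max / 1e6:.0f} Mbps")
```

With these inputs the Ethernet figure comes out near 511 Mbps, so both averages stay under their limits.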

Can anybody tell me if I have to take anything else into account, or where the bottleneck could be?

Finally, I need to do this operation in the FPGA:

DataOut = (DataIn*206)/256

I need to do it in as few clock cycles as possible. Any idea?

Thanks anyway for everything.





 

Do your calculations include the Ethernet framing overhead? It doesn't look like it, as you aren't adding anything else to the raw pixel data.
 

Yes, it is already included in the maximum ETH rate (700 Mbps).

Anyway, I need to do this operation inside the FPGA:

Data_out = Data_in x (206/256). I don't have any idea how to do it. If this operation is not possible, then it does not make sense to go on with the project...

Thanks anyway.

 

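data_in x 206/256 is just a multiply followed by a bit shift. In VHDL, a multiply is simply:

Code:
-- assumes d_in and d_out are unsigned(w downto 0), using ieee.numeric_std
variable temp : unsigned(w+8 downto 0);

-- an 8-bit constant makes the product exactly w+9 bits wide, matching temp
temp := d_in * to_unsigned(206, 8);
d_out <= temp(w+8 downto 8);  -- dropping the 8 LSBs divides by 256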
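
For anyone who wants to check the arithmetic before simulating, here is a bit-accurate Python sketch of the same operation. It assumes 16-bit unsigned input data (an assumption, the thread does not fix the width) and also shows an equivalent shift-and-add form, since 206 = 128 + 64 + 8 + 4 + 2. The function names are hypothetical.

```python
def scale_mul_shift(x):
    """(x * 206) >> 8: multiply, then drop the 8 LSBs (truncating divide by 256)."""
    return (x * 206) >> 8


def scale_shift_add(x):
    """Same result using only shifts and adds: 206 = 128 + 64 + 8 + 4 + 2."""
    return ((x << 7) + (x << 6) + (x << 3) + (x << 2) + (x << 1)) >> 8


# Exhaustive check over all 16-bit unsigned inputs
for x in range(1 << 16):
    assert scale_mul_shift(x) == scale_shift_add(x) == (x * 206) // 256

print(scale_mul_shift(1000))   # 804
```

Both forms are purely combinational, so in hardware either one can in principle be registered in a single clock cycle, timing permitting. Note that the result is truncated, not rounded.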
 
Okay, I see the 700 Mbps is the maximum Ethernet payload rate, that wasn't clearly stated. 700Mbps is kind of low for Gigabit Ethernet, you must have rather short Ethernet packets.

Have you run simulations on the design and verified that the buffering is adequate for the throughput? I suggest checking the design for buffer over/under flows, which can show up when increasing the data throughput.
 

TrickyDicky, thanks for your idea. I will run some simulations and tell you if it works.

On the other hand, ads-ee, I now understand your question. The maximum packet size is 1.4 kB. I ran some simulations and everything is OK. But I have already located the problem: something is wrong in the VHDL module that builds the packets and hands them to the Ethernet IP. I checked with SignalTap that if I enlarge the FIFO that buffers the video data, the maximum frame rate I can set without losing frames on the PC side increases from 350 Hz to 370 Hz. The problem is that the maximum FIFO size is 2^11 words of 64 bits, and I can't make it bigger because I have no more space in the FPGA.

Just to clarify:
- With a FIFO = 2^10 x 64 bits => Frame Rate Max = 350 Hz
- With a FIFO = 2^11 x 64 bits => Frame Rate Max = 370 Hz

The code that builds the packets for the ETH IP core was written by a third party, and I don't have much experience with Ethernet protocols. I will post the VHDL code here. Maybe someone can figure out what is happening...

Code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

---------------------------
entity ETH_TX_CTRL is
---------------------------
  port (
    FIFO_WR_FUL_ERR  : out std_logic;                         -- Fifo write-while-full error (sticky)
    FIFO_RD_FUL_ERR  : out std_logic;                         -- Fifo read-while-empty error (sticky)
    -- Clock and Reset
    CLK         : in  std_logic;                              -- Module Clock
    RST         : in  std_logic;                              -- Module Reset (Asynch Active high)
    RUN         : in  std_logic;                              -- Run
    -- Video Stream
    VIDEO_XSIZE : in  std_logic_vector(11 downto 0);          -- Video X Size
    VIDEO_YSIZE : in  std_logic_vector(11 downto 0);          -- Video Y Size
    VIDEO_SOI   : in  std_logic;                              -- Video Start of Image
    VIDEO_DAV   : in  std_logic;                              -- Video Data Valid
    VIDEO_DATA  : in  std_logic_vector(63 downto 0);          -- Video Data
    VIDEO_EOI   : in  std_logic;                              -- Video End of Image
    -- uC Data Interface, to insert inside TX Video Frames
    UC_TX_DAV   : in  std_logic;                              -- uC New Data to Send
    UC_TX_DATA  : in  std_logic_vector(23 downto 0);          -- uC New Data
    -- Encryption
    ENCRYPT_IN  : in  std_logic_vector(31 downto 0);          -- Encrypt Value to send in the Ethernet Frames
    -- GEDEK Interface
    ETH_TXREADY : in  std_logic;                              -- Ethernet Tx Ready ('0' = not ready !)
    ETH_TXSOP   : out std_logic;                              -- Ethernet Tx Start of Packet
    ETH_TXDAV   : out std_logic;                              -- Ethernet Tx Data Valid
    ETH_TXDATA  : out std_logic_vector(31 downto 0);          -- Ethernet Tx Data
    ETH_TXEOP   : out std_logic                               -- Ethernet Tx End of Packet
  );
---------------------------
end entity ETH_TX_CTRL;
---------------------------


------------------------------------
architecture RTL of ETH_TX_CTRL is
------------------------------------

  -- --------------------------
  --  Altera Single Clock FIFO
  -- --------------------------
  component SCFIFO is
    generic (
      ADD_RAM_OUTPUT_REGISTER : string;
      INTENDED_DEVICE_FAMILY  : string;
      LPM_NUMWORDS            : natural;
      LPM_SHOWAHEAD           : string;
      LPM_TYPE                : string;
      LPM_WIDTH               : natural;
      LPM_WIDTHU              : natural;
      OVERFLOW_CHECKING       : string;
      UNDERFLOW_CHECKING      : string;
      USE_EAB                 : string
    );
    port (
      ACLR  : in  std_logic ;
      CLOCK : in  std_logic ;
      SCLR  : in  std_logic ;
      WRREQ : in  std_logic ;
      DATA  : in  std_logic_vector(LPM_WIDTH -1 downto 0);
      FULL  : out std_logic ;
      USEDW : out std_logic_vector(LPM_WIDTHU-1 downto 0);
      EMPTY : out std_logic ;
      RDREQ : in  std_logic ;
      Q     : out std_logic_vector(LPM_WIDTH -1 downto 0)
    );
  end component SCFIFO;

  constant RESOLUTION_X : std_logic_vector(15 downto 0) := std_logic_vector(to_unsigned(48, 16));
  constant RESOLUTION_Y : std_logic_vector(15 downto 0) := std_logic_vector(to_unsigned(96, 16));

  -- FIFO to store Video Data
  constant FIFO_DEPTH : positive := 10;
  constant FIFO_WSIZE : positive := 64; 
  signal FIFO_CLR   : std_logic;
  signal FIFO_WR    : std_logic;
  signal FIFO_IN    : std_logic_vector(FIFO_WSIZE-1 downto 0);
  signal FIFO_FUL   : std_logic;
  signal FIFO_NB    : std_logic_vector(FIFO_DEPTH-1 downto 0);
  signal FIFO_EMP   : std_logic;
  signal FIFO_RD    : std_logic;
  signal FIFO_OUT   : std_logic_vector(FIFO_WSIZE-1 downto 0);
  signal FIFO_RDSEL : std_logic;

  -- FIFO to store uC Data
  constant UC_FIFO_DEPTH : positive :=  8;
  constant UC_FIFO_WSIZE : positive := 32; 
  signal UC_FIFO_CLR : std_logic;
  signal UC_FIFO_WR  : std_logic;
  signal UC_FIFO_IN  : std_logic_vector(UC_FIFO_WSIZE-1 downto 0);
  signal UC_FIFO_FUL : std_logic;
  signal UC_FIFO_NB  : std_logic_vector(UC_FIFO_DEPTH-1 downto 0);
  signal UC_FIFO_EMP : std_logic;
  signal UC_FIFO_RD  : std_logic;
  signal UC_FIFO_OUT : std_logic_vector(UC_FIFO_WSIZE-1 downto 0);
  
  signal UC_PACK_CNT : unsigned(15 downto 0);
  signal VIDEO_RUN   : std_logic;
  signal VIDEO_SOIm  : std_logic;
  signal VIDEO_EOIm  : std_logic;
  signal FRAME_CNT   : unsigned(31 downto 0);

  constant ETH_TXSIZE : positive := 361; -- 3 UDP Headers + 358 x 32bits words Data : max optimal value
  type ETH_TXFSM_t is ( s_IDLE, s_SEND_INFO_1, s_SEND_INFO_2, s_SEND_INFO_3, s_SEND_DATA, s_SEND_EOP );
  signal ETH_TXFSM : ETH_TXFSM_t;
  signal ETH_TXEND : std_logic;
  signal ETH_TXCNT : integer range 0 to ETH_TXSIZE;

--------
begin
--------

  -- -------------------------------
  --  VIDEO DATA FIFO
  -- -------------------------------
  FIFO_CLR <= VIDEO_SOI or not RUN;
  FIFO_WR  <= VIDEO_DAV and VIDEO_RUN;
  FIFO_IN  <= VIDEO_DATA;
    
  -- FIFO to store incoming data to send to GEDEK
  -- Perform 64bits to 32bits conversion !
  i_VIDEO_FIFO : SCFIFO
    generic map (
      ADD_RAM_OUTPUT_REGISTER => "ON",
      INTENDED_DEVICE_FAMILY  => "Cyclone IV",
      LPM_NUMWORDS            => 2**FIFO_DEPTH,
      LPM_SHOWAHEAD           => "ON",
      LPM_TYPE                => "SCFIFO",
      LPM_WIDTH               => FIFO_WSIZE,
      LPM_WIDTHU              => FIFO_DEPTH,
      OVERFLOW_CHECKING       => "ON",
      UNDERFLOW_CHECKING      => "ON",
      USE_EAB                 => "ON"
    )
    port map (
      ACLR  => RST     ,
      CLOCK => CLK     ,
      SCLR  => FIFO_CLR,
      WRREQ => FIFO_WR ,
      DATA  => FIFO_IN ,
      FULL  => FIFO_FUL,
      USEDW => FIFO_NB ,
      EMPTY => FIFO_EMP,
      RDREQ => FIFO_RD ,
      Q     => FIFO_OUT
    );
        
  FIFO_RD <= ETH_TXREADY and not FIFO_EMP and FIFO_RDSEL when ETH_TXFSM = s_SEND_DATA else '0';
  
  -- FIFO Full Error Detection      
  process(CLK, RST)
  begin
    if RST = '1' then
      FIFO_WR_FUL_ERR <= '0';
    elsif rising_edge(CLK) then
      if FIFO_FUL = '1' and FIFO_WR = '1' then
        FIFO_WR_FUL_ERR <= '1';
      end if;
    end if;
  end process;  
  
  process(CLK, RST)
  begin
    if RST = '1' then
      FIFO_RD_FUL_ERR <= '0';
    elsif rising_edge(CLK) then
      if FIFO_EMP = '1' and FIFO_RD = '1' then
        FIFO_RD_FUL_ERR <= '1';
      end if;
    end if;
  end process;  
  
  -- FIFO assertions
  process
  begin
    wait until rising_edge(CLK);
    assert not (FIFO_EMP = '1' and FIFO_RD = '1')
      report "[ETH_TX_CTRL] Try to read while FIFO is empty !" severity failure;
    assert not (FIFO_FUL  = '1' and FIFO_WR = '1')
      report "[ETH_TX_CTRL] Write in FIFO Full !" severity failure;
  end process;


  -- -------------------------------
  --  uC DATA FIFO
  -- -------------------------------
  UC_FIFO_CLR <= not RUN;
  UC_FIFO_WR  <= UC_TX_DAV and VIDEO_RUN;
  UC_FIFO_IN  <= x"00" & UC_TX_DATA;
  
  -- FIFO to store incoming uC data to insert in the UDP header
  -- FIFO 256x32bits (2**UC_FIFO_DEPTH words), Show Ahead Mode, Output registered
  i_UC_FIFO : SCFIFO
    generic map (
      ADD_RAM_OUTPUT_REGISTER => "ON",
      INTENDED_DEVICE_FAMILY  => "Cyclone IV",
      LPM_NUMWORDS            => 2**UC_FIFO_DEPTH,
      LPM_SHOWAHEAD           => "ON",
      LPM_TYPE                => "SCFIFO",
      LPM_WIDTH               => UC_FIFO_WSIZE,
      LPM_WIDTHU              => UC_FIFO_DEPTH,
      OVERFLOW_CHECKING       => "ON",
      UNDERFLOW_CHECKING      => "ON",
      USE_EAB                 => "ON"
    )
    port map (
      ACLR  => RST        ,
      CLOCK => CLK        ,
      SCLR  => UC_FIFO_CLR,
      WRREQ => UC_FIFO_WR ,
      DATA  => UC_FIFO_IN ,
      FULL  => UC_FIFO_FUL,
      USEDW => UC_FIFO_NB ,
      EMPTY => UC_FIFO_EMP,
      RDREQ => UC_FIFO_RD ,
      Q     => UC_FIFO_OUT
    );
        
  -- uC FIFO assertions
  process
  begin
    wait until rising_edge(CLK);
    assert not (UC_FIFO_EMP = '1' and UC_FIFO_RD = '1')
      report "[ETH_TX_CTRL] Try to read while UC_FIFO is empty !" severity failure;
    assert not (UC_FIFO_FUL  = '1'and UC_FIFO_WR = '1')
      report "[ETH_TX_CTRL] Write in UC_FIFO Full !" severity failure;
  end process;


  -- -------------------
  -- TX FRAME MANAGEMENT
  -- -------------------
  TX_proc : process(RST,CLK)
  begin
    if RST='1' then
      ETH_TXEND   <= '0';
      ETH_TXSOP   <= '0';
      ETH_TXEOP   <= '0';
      ETH_TXDAV   <= '0';
      ETH_TXDATA  <= (others => '0');
      ETH_TXCNT   <=  0 ;
      ETH_TXFSM   <= s_IDLE;
      FRAME_CNT   <= (others => '0');
      VIDEO_RUN   <= '0';
      VIDEO_SOIm  <= '0';
      VIDEO_EOIm  <= '0';
      UC_PACK_CNT <= (others=>'0');
      UC_FIFO_RD  <= '0';
      FIFO_RDSEL  <= '0';
    elsif rising_edge(CLK) then

      UC_FIFO_RD  <= '0';
      
      -- Latch Run Signals at Start of Frame
      if RUN = '0' then
        VIDEO_RUN <= '0';
      elsif VIDEO_SOI = '1' then
        VIDEO_RUN <= RUN;
      end if;

      -- Memorize Start / End of Image Flags
      VIDEO_SOIm <= (VIDEO_SOIm or VIDEO_SOI) and RUN;
      VIDEO_EOIm <= (VIDEO_EOIm or VIDEO_EOI) and VIDEO_RUN;

      case ETH_TXFSM is

        -- Idle State : waiting for :
        -- * Enough data to send in the Fifo ?
        -- * End of Image flag from Video Interface
        when s_IDLE =>
          ETH_TXSOP <= '0';
          ETH_TXEOP <= '0';
          ETH_TXDAV <= '0';
          ETH_TXEND <= '0';
          -- Enough Words in FIFO
          if unsigned(FIFO_NB)*2 >= ETH_TXSIZE-3 then
            ETH_TXCNT <= ETH_TXSIZE-3;  -- Load Number of Data to send
            ETH_TXFSM <= s_SEND_INFO_1; -- Go Send UDP Header
          -- End of Image !
          elsif VIDEO_EOIm = '1' then
            ETH_TXEND <= '1';  -- will be used to clear the EOIm
            ETH_TXCNT <= to_integer(unsigned(FIFO_NB))*2; -- Load Number of Data to send
            ETH_TXFSM <= s_SEND_INFO_1;     -- Go Send UDP Header
          end if;

        -- UDP Header Word 1 :
        -- * if First Frame for the Image, send the Resolution
        -- * else send the Packets Counter on 16 MSBs
        when s_SEND_INFO_1 =>
          -- Send the SOP of this frame
          ETH_TXSOP <= '1';
          ETH_TXDAV <= '1';
          if VIDEO_SOIm = '1' then  -- New Image
            ETH_TXDATA  <= x"0" & VIDEO_YSIZE & x"0" & VIDEO_XSIZE;
            UC_PACK_CNT <= (others => '0');  -- Reset Packets counter
          else
            ETH_TXDATA  <= x"0000" & std_logic_vector(UC_PACK_CNT);
            UC_PACK_CNT <= UC_PACK_CNT + 1;  -- Increment for next time
          end if;
          ETH_TXFSM <= s_SEND_INFO_2;

        -- UDP Header Word 2 : Video Frame Counter
        when s_SEND_INFO_2 =>
          if ETH_TXREADY = '1' then  -- SOP accepted
            ETH_TXSOP  <= '0';  -- release the SOP
            ETH_TXDAV  <= '1';  -- send 2nd data
            ETH_TXDATA <= std_logic_vector(FRAME_CNT);
            ETH_TXFSM  <= s_SEND_INFO_3;
          end if;

        -- UDP Header Word 3 :
        -- * if First Frame for the Image, send the Encryption Word
        -- * else send the uC Word (if anything to send)
        -- * else 0's
        when s_SEND_INFO_3 =>
          if ETH_TXREADY = '1' then  -- 2nd data accepted
            ETH_TXDAV <= '1';
            if VIDEO_SOIm = '1' then
              ETH_TXDATA <= ENCRYPT_IN;  -- Encryption Word
            else
              -- Something to Send on the Dedicated uC Word in UDP Header ?
              if UC_FIFO_EMP = '0' then -- Yes (uC has sent something to the FPGA)
                UC_FIFO_RD <= '1';  -- Read the word now so that it will be available after
                ETH_TXDATA <= UC_FIFO_OUT;  -- uC Word
              else
                ETH_TXDATA <= (others => '0'); -- All Zeros
              end if;
            end if;
            VIDEO_SOIm <= '0';  -- Clear the Start of Image flag now
            FIFO_RDSEL <= '0'; 
            ETH_TXFSM  <= s_SEND_DATA;
          end if;

        -- Sending Data
        when s_SEND_DATA =>
          ETH_TXDAV  <= '1';
          if FIFO_RD = '1' then
            ETH_TXDATA <= FIFO_OUT(63 downto 32);
          elsif FIFO_EMP = '0' and FIFO_RDSEL = '0' then 
            ETH_TXDATA <= FIFO_OUT(31 downto 00);
          end if;
          if ETH_TXREADY = '1' then  -- Data Acknowledge by GEDEK
            FIFO_RDSEL <= not FIFO_RDSEL; 
            if ETH_TXCNT = 1 then
              ETH_TXEOP <= '1';
              ETH_TXFSM <= s_SEND_EOP;
              -- Clear the EOIm only if it was really the last frame to send             
              if VIDEO_EOIm = '1' and ETH_TXEND = '1' then  
                FRAME_CNT  <= FRAME_CNT + 1;
                VIDEO_EOIm <= '0';  -- Clear the End of Image flag now
              end if;
            end if;
            ETH_TXCNT  <= ETH_TXCNT - 1;
          end if;

        -- Sending EOP
        when s_SEND_EOP =>
          if ETH_TXREADY = '1' then
            ETH_TXDAV <= '0';  -- release the DAV
            ETH_TXEOP <= '0';  -- release the EOP
            ETH_TXFSM <= s_IDLE;
          end if;

      end case;

    end if;
  end process TX_proc;


-----------------------
end architecture RTL;
-----------------------
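
The constants in the code above allow a rough estimate of how the video FIFO relates to the packet size. The Python sketch below takes ETH_TXSIZE, the three header words and the 64-bit FIFO width from the VHDL, and assumes the 384 x 219 binned frame with 16-bit pixels mentioned earlier in the thread. It is only arithmetic, not a model of the state machine.

```python
import math

ETH_TXSIZE = 361      # 32-bit words per packet (VHDL constant)
HEADER_WORDS = 3      # the three s_SEND_INFO_x header words
FIFO_WSIZE = 64       # bits per video FIFO word (VHDL constant)

payload_bytes = (ETH_TXSIZE - HEADER_WORDS) * 4      # 358 words of video data
frame_bytes = 384 * 219 * 2                          # assumed binned frame, 16-bit pixels
packets_per_frame = math.ceil(frame_bytes / payload_bytes)

print(f"payload per packet : {payload_bytes} bytes")
print(f"packets per frame  : {packets_per_frame}")

for depth_bits in (10, 11):                          # the two FIFO sizes that were tried
    fifo_bytes = (2 ** depth_bits) * FIFO_WSIZE // 8
    print(f"2^{depth_bits} x 64 FIFO: {fifo_bytes} bytes, "
          f"about {fifo_bytes / payload_bytes:.1f} packet payloads")
```

Under these assumptions the FIFO holds only a handful of packet payloads, so any stall on the Ethernet side longer than a few packet times could fill it.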
 

One thing you should do in simulation or in SignalTap is monitor the UC_FIFO_NB used count and see how it behaves. I suspect you will discover the count fluctuates in such a way that the FIFO is going empty due to bursting when packets are sent. UDP has a pretty low overhead so you should easily be able to send 700 Mbps.

I suggest running for many packets to see the overall trend of the FIFO level as it runs. You may not have run enough packets to see what is happening. Setting the UC_FIFO_NB count to an unsigned analog display in ModelSim will give you a nice view of what is going on.
 

Hi,

I have found the problem! I was losing pixels and frames because of the PEAK bandwidth on the Ethernet side. This is the calculation:

For example, take a RealFrame = 1920 pixels x 219 lines. After a horizontal binning of 5 (adding 5 pixels together) => BinnedFrame = 1920/5 x 219 = 384 x 219.
If I want to run the system at frame_rate = 500 Hz => LineTime = 1/(500 x 219) = 9.13 us. But during one complete line I have to send 384 x 16 bits = 6144 bits. So on average I have to send 6144 bits in 9.13 us => BW_ETH = 672.94 Mbps, very close to the 700 Mbps limit of my ETH interface...
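
The line-time calculation can be reproduced with a few lines of Python. This is only a sketch of the arithmetic with hypothetical variable names. The unrounded result is about 672.77 Mbps, and the 672.94 Mbps figure follows from rounding the line time to 9.13 us first.

```python
FRAME_RATE_HZ = 500
LINES = 219
BITS_PER_LINE = 384 * 16            # 6144 bits per binned line

line_time_s = 1 / (FRAME_RATE_HZ * LINES)     # time available for one line
line_rate_bps = BITS_PER_LINE / line_time_s   # rate needed to send one line per line time

print(f"line time = {line_time_s * 1e6:.2f} us")
print(f"line rate = {line_rate_bps / 1e6:.2f} Mbps")
```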

This is a limitation of the system, so I can't do anything more here.

Thanks anyway for your interest and help. Thanks a lot dude!!!


 

Or maybe I can do some kind of cheating in the FPGA to reduce the peak bandwidth on the ETH side, such as doing the binning with '0' or something like that?

Any idea how I can solve this problem?

 

What you need to do is work on your skills at analyzing where the bottleneck is located. It doesn't seem like you've got a handle on why 673 Mbps is too much for a 700 Mbps channel.

First of all, how are your UDP packets being constructed? Are you adding the headers after you have the entire packet stored? If so, have you looked at how that affects the amount of dead time you are getting between adjacent packet transmissions?

I hope you are seeing where I'm going with this. I don't think you've analyzed the effects of encapsulation on the time it takes to send a packet after the preceding packet was sent. You've probably added enough dead time between packets where you've wasted 30% (=1-(350/500)) of the 700 Mbps you have.

I still think 700 Mbps is an extremely low value for payload on a Gigabit Ethernet connection using UDP packets that are quite large (1.4 kB). Does this mean that most of the packets are very tiny, like Ethernet framing (42 octets) / IP (24 bytes) / UDP header (8 bytes) (total: 74 bytes) plus 106 bytes of payload? I can't envision any other way you're losing 30% of the bandwidth to overhead, unless you are sharing bandwidth with something else that is sending Ethernet frames.
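
To put numbers on the overhead argument, the following Python sketch compares the share of each packet that is payload for a tiny packet and for a full-size one. The 74-byte overhead is the figure quoted above, the 1432-byte payload is an assumption derived from ETH_TXSIZE in the posted VHDL (358 data words x 4 bytes), and the function name is hypothetical.

```python
HEADER_BYTES = 42 + 24 + 8      # framing + IP + UDP, figures quoted in the post


def payload_efficiency(payload_bytes, header_bytes=HEADER_BYTES):
    """Fraction of the bytes on the wire that carry payload."""
    return payload_bytes / (payload_bytes + header_bytes)


small = payload_efficiency(106)     # the tiny-packet case from the post
large = payload_efficiency(1432)    # assumed full-size packet from ETH_TXSIZE
lost = 1 - 350 / 500                # shortfall between achieved and target frame rate

print(f"106-byte payload : {small:.1%} efficient")
print(f"1432-byte payload: {large:.1%} efficient")
print(f"frame rate shortfall: {lost:.0%}")
```

With full-size packets the header overhead alone is only a few percent, which is why dead time between packets is the more likely place to look.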
 
