Serial Shift Register Network --> SPI Network and distributed RAM

mmprestine · Sep 22, 2017


I am researching methods for transferring data between two different networks. As a simple test, I have an MCU configured in SPI DMA 16-bit mode to transmit and receive 64 16-bit words. The following VHDL listing proves the transfer works and the data is good. Now the issue: when the array data is accessed from the other network, I get unexpected results on the SPI side.

What are some techniques used to access a shared RAM variable?

Do I need to use a dual port RAM for such access?

Code VHDL:
library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.STD_LOGIC_ARITH.ALL;
    use IEEE.STD_LOGIC_UNSIGNED.ALL;
    use IEEE.NUMERIC_STD.ALL;
 
entity clock is 
    port (
        SPI2_CLK : in STD_LOGIC;    
        SPI2_MISO : out STD_LOGIC;  
        SPI2_MOSI : in STD_LOGIC;   
        SPI2_CS : in STD_LOGIC
    );      
end clock;
 
architecture arch of clock is
 
    type sio_size is array(integer range 0 to 63) of STD_LOGIC_VECTOR(15 downto 0);
    signal sio_in : sio_size; 
    signal sio_out : sio_size;
 
    signal count_spi2_bit : integer range 0 to 15 := 0;
    signal count_spi2_word : integer range 0 to 63 := 0;
    
begin
 
    spi2_data: process(SPI2_CLK,SPI2_CS)
    begin   
 
        if SPI2_CS = '1' then
        
            count_spi2_bit <= 0;
            count_spi2_word <= 0;
            
        elsif rising_edge(SPI2_CLK) then    
            
            sio_in(count_spi2_word) <= sio_in(count_spi2_word)(14 downto 0) & '0';
            
            sio_out(count_spi2_word) <= sio_out(count_spi2_word)(14 downto 0) & SPI2_MOSI;
            
            if count_spi2_bit = 15 then
                count_spi2_word <= count_spi2_word + 1; 
            end if; 
            
            count_spi2_bit <= count_spi2_bit + 1;   
            
        end if;
    end process;
    
    SPI2_MISO <= sio_in(count_spi2_word)(15);
 
end arch;

barry · Sep 23, 2017

How does that listing "prove the transfer works"? All it proves is that you have a listing. It looks like MISO depends on sio_in, but sio_in never gets assigned a value other than shifting in Zeroes. Either you didn't post all your code, or you've got some work to do.

FvM · Sep 23, 2017

Shifting the data directly inside the memory array looks like a bad idea. I don't know what "data is accessed from the other network" means as it's not provided by the shown code.

Shared variable is a VHDL construct which isn't necessarily implementable in actual hardware. I would prefer a dual port RAM if feasible according to the intended access method (can only access one word per clock cycle on each port).

vGoodtimes · Sep 23, 2017

@FvM, shared variables have exactly one use: inferring dual-clock true-dual-port RAMs.

That said, the code doesn't explain this second interface. The code appears to be some form of loopback: the MCU appears to be the master device and can write to the FPGA and read the written data back. There doesn't seem to be any way for the FPGA to provide data to the SPI interface, nor any way for the FPGA to signal that data is ready. I'm guessing the shared RAM refers to that dual-clock true-dual-port RAM. It can be inferred or instantiated, and inferring the RAM in VHDL means using a shared variable, with code that matches the templates in the Vivado/Quartus synthesis guides.
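A minimal sketch of such a dual-clock true-dual-port RAM inferred through a shared variable, assuming a 64 x 16 array to match the original sio_size type. The entity and port names (tdp_ram_64x16, clk_a, clk_b and so on) are hypothetical, and the exact form a particular tool recognizes should be checked against that tool's synthesis guide. Each port returns the old contents of the address it writes (read-first).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch: dual-clock true-dual-port 64 x 16 RAM inferred through a shared
-- variable. Both ports can read and write; each port is read-first.
-- Entity and port names are hypothetical.
entity tdp_ram_64x16 is
  port (
    clk_a  : in  std_logic;
    we_a   : in  std_logic;
    addr_a : in  std_logic_vector(5 downto 0);
    din_a  : in  std_logic_vector(15 downto 0);
    dout_a : out std_logic_vector(15 downto 0);
    clk_b  : in  std_logic;
    we_b   : in  std_logic;
    addr_b : in  std_logic_vector(5 downto 0);
    din_b  : in  std_logic_vector(15 downto 0);
    dout_b : out std_logic_vector(15 downto 0)
  );
end entity tdp_ram_64x16;

architecture rtl of tdp_ram_64x16 is
  type ram_t is array (0 to 63) of std_logic_vector(15 downto 0);
  -- Non-protected shared variable: it is written from two processes,
  -- which a signal cannot be without creating multiple drivers.
  shared variable ram : ram_t := (others => (others => '0'));
begin
  port_a : process (clk_a)
  begin
    if rising_edge(clk_a) then
      dout_a <= ram(to_integer(unsigned(addr_a)));  -- read the old value first
      if we_a = '1' then
        ram(to_integer(unsigned(addr_a))) := din_a;
      end if;
    end if;
  end process;

  port_b : process (clk_b)
  begin
    if rising_edge(clk_b) then
      dout_b <= ram(to_integer(unsigned(addr_b)));  -- read the old value first
      if we_b = '1' then
        ram(to_integer(unsigned(addr_b))) := din_b;
      end if;
    end if;
  end process;
end architecture rtl;
```

A non-protected shared variable is technically illegal from VHDL-2002 onward, so a simulator may need a VHDL-93 or relaxed mode to accept this description.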

From there, some mechanism to signal that transmissions are completed/ready should be added for practical reasons.

--edit: The requirements for the dual-clock true-dual-port RAM also mean the code snippet will not work, as it does a read-modify-write on each port. The input data should go into an intermediate register and be written to the RAM only when the entire 16-bit word has become valid. Alternatively, a true-dual-port BRAM could be instantiated, with a 1-bit interface on the spi_clk side and a 16-bit interface on the fpga_clk side. Or the SPI controller could be an oversampled design, with all of the FPGA logic running on fpga_clk.
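A minimal sketch of the first option (assemble each word in an intermediate register, then write it in one go), assuming the RAM write port is also clocked by the SPI clock and that SPI2_CS = '1' means idle, as in the original code. The names spi_word_assembler, wr_en, wr_addr and wr_data are hypothetical. The write strobe is asserted combinationally during the 16th bit, so the RAM captures the complete word on the same rising edge that delivers the last bit; a registered strobe would need one more SPI clock, which never arrives after the final word.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch: SPI-clock-domain word assembler. Bits are shifted into a separate
-- register and each 16-bit word is offered to a RAM write port only once it
-- is complete, so the RAM never sees a read-modify-write.
-- Entity and port names are hypothetical.
entity spi_word_assembler is
  port (
    spi_clk  : in  std_logic;
    spi_cs   : in  std_logic;                      -- '1' = idle, as in the original code
    spi_mosi : in  std_logic;
    wr_en    : out std_logic;                      -- high during the 16th bit of each word
    wr_addr  : out std_logic_vector(5 downto 0);   -- word index 0..63
    wr_data  : out std_logic_vector(15 downto 0)   -- complete word while wr_en = '1'
  );
end entity spi_word_assembler;

architecture rtl of spi_word_assembler is
  signal shift_reg : std_logic_vector(14 downto 0) := (others => '0');  -- first 15 bits
  signal bit_cnt   : unsigned(3 downto 0) := (others => '0');
  signal word_cnt  : unsigned(5 downto 0) := (others => '0');
begin
  process (spi_clk, spi_cs)
  begin
    if spi_cs = '1' then
      -- every register in this process is reset here, so chip select does
      -- not turn into an unintended clock enable
      shift_reg <= (others => '0');
      bit_cnt   <= (others => '0');
      word_cnt  <= (others => '0');
    elsif rising_edge(spi_clk) then
      shift_reg <= shift_reg(13 downto 0) & spi_mosi;
      bit_cnt   <= bit_cnt + 1;            -- wraps 15 -> 0
      if bit_cnt = 15 then
        word_cnt <= word_cnt + 1;          -- wraps 63 -> 0
      end if;
    end if;
  end process;

  -- Combinational write strobe: a RAM write port clocked by spi_clk captures
  -- wr_data on the same rising edge that delivers the last bit.
  wr_en   <= '1' when bit_cnt = 15 and spi_cs = '0' else '0';
  wr_addr <= std_logic_vector(word_cnt);
  wr_data <= shift_reg & spi_mosi;
end architecture rtl;
```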

TrickyDicky · Sep 23, 2017

Shared variables are only used to infer dual port ram with write before read behaviour on address collisions. Otherwise a signal is fine.
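For the common case where one side only writes and the other only reads, a plain signal is enough because only one process drives it. A minimal sketch of a 64 x 16 simple dual-port RAM with independent write and read clocks; the names are hypothetical, both ports use the same 16-bit width, and each port handles one word per clock cycle.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch: 64 x 16 simple dual-port RAM described with an ordinary signal.
-- Port A writes on clk_a, port B reads on clk_b (registered read).
-- Entity and port names are hypothetical.
entity dp_ram_64x16 is
  port (
    clk_a  : in  std_logic;
    we_a   : in  std_logic;
    addr_a : in  std_logic_vector(5 downto 0);
    din_a  : in  std_logic_vector(15 downto 0);
    clk_b  : in  std_logic;
    addr_b : in  std_logic_vector(5 downto 0);
    dout_b : out std_logic_vector(15 downto 0)
  );
end entity dp_ram_64x16;

architecture rtl of dp_ram_64x16 is
  type ram_t is array (0 to 63) of std_logic_vector(15 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  -- The only process that drives ram.
  write_port : process (clk_a)
  begin
    if rising_edge(clk_a) then
      if we_a = '1' then
        ram(to_integer(unsigned(addr_a))) <= din_a;
      end if;
    end if;
  end process;

  -- Reads one word per clk_b cycle, one cycle of latency.
  read_port : process (clk_b)
  begin
    if rising_edge(clk_b) then
      dout_b <= ram(to_integer(unsigned(addr_b)));
    end if;
  end process;
end architecture rtl;
```

What the hardware returns when both clocks hit the same address at nearly the same time is not defined by this description; that is left to whichever RAM primitive the tool maps it onto.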

vGoodtimes · Sep 24, 2017

Reading the Vivado synthesis guide, I almost get the impression that shared variables are preferred in all cases. That's not entirely accurate, but they do have examples that use shared variables where they don't seem to be needed.

TrickyDicky · Sep 24, 2017

vGoodtimes said:
Reading the Vivado synthesis guide, I almost get the impression that shared variables are preferred in all cases. That's not entirely accurate, but they do have examples that use shared variables where they don't seem to be needed.

It will work with either, and the only case where it makes a difference is write-before-read behaviour. Many examples will be hanging around for years without edits (look at how many examples use non-standard VHDL libraries in Xilinx examples). Also, technically, using shared variables in the examples is against VHDL rules from 2002 onwards, as shared variables are supposed to be of a protected type (non-synthesisable). But no tools (including simulators) treat this as an error by default, because it would break so much old code. It is one of the few things in VHDL that is not backward compatible between versions (the only other thing I can think of is file declarations in '87 compared to '93).

Basically, stick with signals for consistency unless you absolutely have to. Personally, if I really wanted write-before-read behavior I would generate a core, as this behavior may not be so obvious to all VHDL users when they pick up your code. Using a core would be pretty explicit. Plus, it was the case a few years ago that Quartus would not change behaviour based on whether using signal or shared variable, but it would infer the same ram in either case. Do you really want to trust the tool for the more subtle inferred behaviours?

vGoodtimes · Sep 24, 2017

Interesting. By "will work", do you mean "will synthesize" or just "will simulate"? Inferring RAMs has historically been quirky.

TrickyDicky said:
(look at how many examples use non-standard VHDL libraries in Xilinx examples)

Have you ever used Verilog? So much worse.

Also, inferring dual-clock _anything_ has issues. The only time I did this was because the existing tools were bad. There was no generic coregen. Coregen assumed that you only had fixed, non-generic designs and only wanted fixed, non-generic cores; coregen was bad, and Xilinx should feel bad for this. Seriously, there were maybe 3 parameters, and now I need to regenerate a dozen cores and can't use generics and whatever. I'm over it. The point is that you don't get as good sim coverage of read/write conflicts, which are undefined in the implementation: the primitives report them in simulation, but inferred RAMs do not.

TrickyDicky · Sep 24, 2017

vGoodtimes said:
Interesting. By "will work", do you mean "will synthesize" or just "will simulate"? Inferring RAMs has historically been quirky.

I've had no problems synthesising them. At a previous workplace we had a library of inferred RAMs that worked identically on both Altera and Xilinx. This made them generic, with no need to use coregen/megawizard for either chip, and they were either single or dual clocked.
The inflexibility of Xilinx coregen is a real pain. Altera have the excellently flexible altsyncram megafunction that can be used for everything from a single tiny distributed RAM to a massive RAM that uses many M20Ks, all from the same entity description/parameters. When I asked for a Xilinx equivalent on the Xilinx forum, I got told that coregen was the only method and why would I want to do anything else? (Same for FIFOs too: Altera have a very flexible one, while Xilinx must use coregen, even in Vivado!)

mmprestine · Sep 25, 2017


Thanks for all the dialog, out camping for the weekend so I am just getting back to the post now. It looks like I need to give a bit more detail on the system.

The MCU is a STM32 and when the SPI is configured for DMA mode it transmits/receives 16-bit data at the same time. In my example it is configured to transmit/receive 64 16-bit words. This is why the following VHDL processes both the MISO and MOSI on the same clock. The STM32 is producing data and expecting data on every clock transition.

Code VHDL:
elsif rising_edge(SPI2_CLK) then    
            
            sio_in(count_spi2_word) <= sio_in(count_spi2_word)(14 downto 0) & '0';
            
            sio_out(count_spi2_word) <= sio_out(count_spi2_word)(14 downto 0) & SPI2_MOSI;

I will post further information on the other network, but for now there is a simpler approach that gives similar results. I have modified the listing to add line 35 (sio_out(50) <= sio_out(50) + 1; in the chip-select branch) as an incrementing counter to put some data into the array.

Code VHDL:
library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.STD_LOGIC_ARITH.ALL;
    use IEEE.STD_LOGIC_UNSIGNED.ALL;
    use IEEE.NUMERIC_STD.ALL;
 
entity clock is 
    port (
        SPI2_CLK : in STD_LOGIC;    
        SPI2_MISO : out STD_LOGIC;  
        SPI2_MOSI : in STD_LOGIC;   
        SPI2_CS : in STD_LOGIC
    );      
end clock;
 
architecture arch of clock is
 
    type sio_size is array(integer range 0 to 63) of STD_LOGIC_VECTOR(15 downto 0);
    signal sio_in : sio_size; 
    signal sio_out : sio_size;
 
    signal count_spi2_bit : integer range 0 to 15 := 0;
    signal count_spi2_word : integer range 0 to 63 := 0;
    
begin
 
    spi2_data: process(SPI2_CLK,SPI2_CS)
    begin   
 
        if SPI2_CS = '1' then
        
            count_spi2_bit <= 0;
            count_spi2_word <= 0;
            
            sio_out(50) <= sio_out(50) + 1;
            
        elsif rising_edge(SPI2_CLK) then    
            
            sio_in(count_spi2_word) <= sio_in(count_spi2_word)(14 downto 0) & '0';
            
            sio_out(count_spi2_word) <= sio_out(count_spi2_word)(14 downto 0) & SPI2_MOSI;
            
            if count_spi2_bit = 15 then
                count_spi2_word <= count_spi2_word + 1; 
            end if; 
            
            count_spi2_bit <= count_spi2_bit + 1;   
            
        end if;
    end process;
    
    SPI2_MISO <= sio_in(count_spi2_word)(15);
 
end arch;

- - - Updated - - -

I don't see how to edit a post, but line 35 in the last post should have been sio_in(50) <= sio_in(50) + 1; rather than sio_out(50) <= sio_out(50) + 1;

- - - Updated - - -

After reading the other posts again, it makes me wonder if some of the results I have seen are due to how the toolset configures block RAM. I think I need to review how the synthesis tool handles inferring block RAM.

As a general thought my idea was the following.

configure an array of ram

transmit/receive data from network A and use the array of ram as the buffer

transmit/receive data from network B and use the array of ram as the buffer

Perhaps my thoughts of treating this array of ram as an inferred dual port buffer will not work. I will post a complete test and see what you all think.

ads-ee · Sep 25, 2017


You keep saying SPI but your protocol is not SPI. https://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus (pay specific attention to the Clock Polarity and Phase section)

SPI uses both edges of the clock: the leading edge captures data and the trailing edge sends data (this is for CPHA = 0; with CPHA = 1 the roles are swapped, and whether the leading edge is rising or falling depends on the clock polarity). You have everything running off a single clock on only one edge.
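A minimal sketch of a slave that uses the SPI clock directly in mode 0 (CPOL = 0, CPHA = 0): MOSI is captured on the rising edge and MISO changes on the falling edge. It assumes an active-low chip select (SPI2_CS = '1' is idle, as in the original code), 16-bit words sent back to back within one select, and that the hypothetical tx_word input already holds the next word when the last bit of the current word is clocked. Entity and port names are illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch of an SPI mode 0 slave clocked directly by the SPI clock:
-- capture on the leading (rising) edge, launch on the trailing (falling) edge.
-- Entity and port names are hypothetical.
entity spi_slave_mode0 is
  port (
    spi_clk  : in  std_logic;
    spi_cs   : in  std_logic;                      -- '1' = idle (active-low select)
    spi_mosi : in  std_logic;
    spi_miso : out std_logic;
    tx_word  : in  std_logic_vector(15 downto 0);  -- word to send, stable per word
    rx_word  : out std_logic_vector(15 downto 0)   -- last complete word received
  );
end entity spi_slave_mode0;

architecture rtl of spi_slave_mode0 is
  signal bit_cnt : unsigned(3 downto 0) := (others => '0');
  signal rx_sr   : std_logic_vector(14 downto 0) := (others => '0');
  signal tx_sr   : std_logic_vector(15 downto 0) := (others => '0');
  signal started : std_logic := '0';  -- '1' once the first trailing edge has occurred
begin
  -- Bit counter: held at 0 while deselected, advanced on every leading edge.
  count : process (spi_clk, spi_cs)
  begin
    if spi_cs = '1' then
      bit_cnt <= (others => '0');
    elsif rising_edge(spi_clk) then
      bit_cnt <= bit_cnt + 1;  -- wraps 15 -> 0 between words
    end if;
  end process;

  -- Leading edge: capture MOSI and publish the word on its 16th bit.
  capture : process (spi_clk)
  begin
    if rising_edge(spi_clk) then
      rx_sr <= rx_sr(13 downto 0) & spi_mosi;
      if bit_cnt = 15 then
        rx_word <= rx_sr & spi_mosi;
      end if;
    end if;
  end process;

  -- Trailing edge: shift out the next bit, or load the next word.
  launch : process (spi_clk, spi_cs)
  begin
    if spi_cs = '1' then
      tx_sr   <= (others => '0');
      started <= '0';
    elsif falling_edge(spi_clk) then
      started <= '1';
      if started = '0' then
        tx_sr <= tx_word(14 downto 0) & '0';  -- bit 15 was already on MISO
      elsif bit_cnt = 0 then
        tx_sr <= tx_word;                     -- previous word done: start the next
      else
        tx_sr <= tx_sr(14 downto 0) & '0';
      end if;
    end if;
  end process;

  -- Mode 0 needs bit 15 on MISO before the first clock edge, so it comes
  -- straight from tx_word until the first trailing edge. MISO is released
  -- while deselected so other slaves can share the line.
  spi_miso <= 'Z'         when spi_cs = '1'  else
              tx_word(15) when started = '0' else
              tx_sr(15);
end architecture rtl;
```

Because MISO changes half a clock before the master samples it, the master gets roughly half an SPI clock period of setup and hold margin, rather than relying on output delays as a design that launches and captures on the same edge does.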

I think you've been looking at synchronous oversampled SPI designs and have confused them with designs that use SPI_CLK directly. Most everyone else on this thread seems to have overlooked this, as they are probably used to SPI designs that run synchronously off a clock faster than the SPI transfer. I'm probably one of the few here who has actually done SPI designs that use the SPI clock directly, which is useful if a micro needs to access a bunch of SPI peripherals including the system PLL (the one that drives all the clocks to the FPGA other than the SPI clock).

I'm pretty sure that using both of these together results in overloaded definitions and will generate warnings or errors.
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.NUMERIC_STD.ALL;
Someone with more knowledge of VHDL can probably tell you the exact issues.

- - - Updated - - -

mmprestine said:
After reading the other posts again, it makes me wonder if some of the results I have seen are due to how the toolset configures block RAM. I think I need to review how the synthesis tool handles inferring block RAM.

As a general thought my idea was the following.

configure an array of ram

transmit/receive data from network A and use the array of ram as the buffer

transmit/receive data from network B and use the array of ram as the buffer

Perhaps my thoughts of treating this array of ram as an inferred dual port buffer will not work. I will post a complete test and see what you all think.

I think your problems have more to do with trying to infer a width-converting RAM (serial-to-parallel and parallel-to-serial).

I don't think I've ever managed to synthesize an inferred block ram that has width conversion. I haven't tried recently, but all my inferred block rams use the same data width on both ports. Any serial data doesn't go directly to the ram but is instead shifted externally in FFs and is then written into the ram as parallel data.

TrickyDicky · Sep 25, 2017


ads-ee said:
I'm pretty sure that using both of these together results in overloaded definitions and will generate warnings or errors.
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.NUMERIC_STD.ALL;
Someone with more knowledge of VHDL can probably tell you the exact issues.

As both define the types unsigned and signed, the types become hidden, so you have to access them using their full path (otherwise you get an error saying something to this effect).
Delete std_logic_arith, and only use numeric_std.
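A minimal sketch of the counter statement from the earlier listing (sio_in(50) <= sio_in(50) + 1) rewritten with numeric_std only. numeric_std defines + for unsigned and signed but not for std_logic_vector, so the word is converted, incremented and converted back. The increment sits in a clocked process with a hypothetical synchronous enable (inc) instead of the asynchronous chip-select branch; entity and port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- no std_logic_arith / std_logic_unsigned

-- Sketch: synchronous increment of one array word using numeric_std only.
-- Entity and port names are hypothetical.
entity word_counter is
  port (
    clk    : in  std_logic;
    inc    : in  std_logic;  -- hypothetical synchronous increment request
    word50 : out std_logic_vector(15 downto 0)
  );
end entity word_counter;

architecture rtl of word_counter is
  type sio_size is array (0 to 63) of std_logic_vector(15 downto 0);
  signal sio_in : sio_size := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if inc = '1' then
        -- convert to unsigned, add, convert back to std_logic_vector
        sio_in(50) <= std_logic_vector(unsigned(sio_in(50)) + 1);
      end if;
    end if;
  end process;

  word50 <= sio_in(50);
end architecture rtl;
```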

ads-ee said:
I don't think I've ever managed to synthesize an inferred block ram that has width conversion. I haven't tried recently, but all my inferred block rams use the same data width on both ports. Any serial data doesn't go directly to the ram but is instead shifted externally in FFs and is then written into the ram as parallel data.

Altera coding guidelines do allow you to infer mixed-width DP RAMs (and I have used it). I haven't seen/tried anything in Xilinx.
But the code presented does not show anything that matches any RAM template. You have an asynchronous reset/set in the form of SPI2_CS that also increments a counter (asynchronous counters will never work), and this counter is one of the RAM values. And because you haven't included all signals from the clocked part in the async reset path, you're creating a synchronous enable out of SPI2_CS: when it is high, nothing can be clocked.

Please review ram templates and then retry.

mmprestine · Sep 25, 2017

Is this not a distributed RAM?

type sio_size is array(integer range 0 to 63) of STD_LOGIC_VECTOR(15 downto 0);
signal sio_in : sio_size;
signal sio_out : sio_size;

I will review the ram templates.

ads-ee · Sep 25, 2017

mmprestine said:
Is this not a distributed RAM?

type sio_size is array(integer range 0 to 63) of STD_LOGIC_VECTOR(15 downto 0);
signal sio_in : sio_size;
signal sio_out : sio_size;

I will review the ram templates.

A RAM template isn't just the type and signal definitions. It also involves how you behaviorally describe the RAM operation, which you did not exactly follow. Xilinx describes this in their synthesis document.

If you need to ensure it is always a distributed RAM then use the RAM_STYLE attribute.
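A minimal sketch of a 64 x 16 RAM carrying the RAM_STYLE attribute, assuming Vivado synthesis (other tools use their own attribute names and values). The asynchronous read is only possible with distributed (LUT) RAM, not block RAM. Entity and port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch: 64 x 16 RAM forced into distributed (LUT) RAM.
-- Entity and port names are hypothetical.
entity dist_ram_64x16 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(5 downto 0);
    din  : in  std_logic_vector(15 downto 0);
    dout : out std_logic_vector(15 downto 0)
  );
end entity dist_ram_64x16;

architecture rtl of dist_ram_64x16 is
  type ram_t is array (0 to 63) of std_logic_vector(15 downto 0);
  signal ram : ram_t := (others => (others => '0'));
  -- Vivado synthesis attribute; simulators ignore it
  attribute ram_style : string;
  attribute ram_style of ram : signal is "distributed";
begin
  -- Synchronous write.
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- Asynchronous read: only distributed RAM supports this.
  dout <= ram(to_integer(unsigned(addr)));
end architecture rtl;
```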

vGoodtimes · Sep 26, 2017


ads-ee said:
I'm pretty sure that using both of these together results in overloaded definitions and will generate warnings or errors.
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.NUMERIC_STD.ALL;
Someone with more knowledge of VHDL can probably tell you the exact issues.

There are a few main groups of users. The first is users that don't know the difference between the packages and use numeric_std because there is peer pressure to do so. The second group is users that include all of the math packages and then don't use them -- that is the case for this code where only std_logic_vector and integer are used. The next group is users who use std_logic_unsigned to get Verilog-like behavior. The last group is the rare case where someone knows the differences between the packages and can form an opinion based on technical merits vs politics. IMO, std_logic_unsigned (now numeric_std_unsigned) is a large reason why the old packages stayed relevant even to this day. That and so many file templates that included the math packages and didn't use them.

barry · Sep 26, 2017

A little off-topic, but we don't use numeric_std because of "peer pressure". We use it because it is a supported package, as opposed to std_logic_arith.

ads-ee · Sep 26, 2017

barry said:
We use it because it is a supported package, as opposed to std_logic_arith.

std_logic_arith was a de-facto standard because Synopsys put their package in the IEEE library where it didn't belong (I forget where it's stated, but only ratified packages are allowed in the IEEE library!). In the 90s, Synopsys was a large and arrogant company, so they put their package where they pleased. Ergo, everyone believes it is an IEEE-supported package, since it has been in there for nearly 30 years.

vGoodtimes · Sep 26, 2017

barry said:
A little off-topic, but we don't use numeric_std because of "peer pressure". We use it because it is a supported package, as opposed to std_logic_arith.

What are you using that doesn't support these packages? Can you find a tool that doesn't support them AND is natively x86?

ads-ee said:
std_logic_arith was a de-facto standard, because Synopsys put their package in the IEEE library where it didn't belong (forgot where it's stated, but only ratified packages are allowed in the IEEE library!).

Which isn't really a technical argument as much as a political one.

TrickyDicky · Sep 26, 2017

barry said:
A little off-topic, but we don't use numeric_std because of "peer pressure". We use it because it is a supported package, as opposed to std_logic_arith.

Std_logic_arith/unsigned are supported by all tools. The original argument for not using them was that after Synopsys introduced the packages in their tools, other vendors wrote their own versions of the packages, which had different behaviours/functions, making them incompatible/non-portable. But that was a long time ago. Since I started in FPGAs 12 years ago (and probably before that), all vendors appear to use the Synopsys versions of the libraries.

What these packages don't get is any updates with new language revisions. numeric_std had many updates in the 2008 revision, mostly to do with textio, that you don't get with the Synopsys libraries.
While I originally swapped over due to politics, I now use the libraries because of practicality.

vGoodtimes · Sep 26, 2017

Agree. If you have a tool that supports vhdl-2008 -- something that is much more common almost a decade later -- you should use numeric_std as the core. numeric_std_unsigned might be fully or partially used based on preference. IIRC, the only thing from std_logic_arith that remains is the conversion from std_logic_vector to natural which was left out of std_logic_unsigned.
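A minimal sketch of numeric_std_unsigned in use, assuming a VHDL-2008 capable tool: + and to_integer then work directly on std_logic_vector without the casts numeric_std alone requires. Entity and port names are hypothetical.

```vhdl
-- VHDL-2008 only
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std_unsigned.all;  -- treats std_logic_vector as unsigned

-- Sketch: 6-bit wrap-around counter kept as std_logic_vector.
-- Entity and port names are hypothetical.
entity slv_counter_2008 is
  port (
    clk    : in  std_logic;
    count  : out std_logic_vector(5 downto 0);
    at_end : out std_logic   -- '1' when the count is 63
  );
end entity slv_counter_2008;

architecture rtl of slv_counter_2008 is
  signal cnt : std_logic_vector(5 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      cnt <= cnt + 1;  -- "+" on std_logic_vector, wraps 63 -> 0
    end if;
  end process;

  count  <= cnt;
  at_end <= '1' when to_integer(cnt) = 63 else '0';  -- no unsigned() cast needed
end architecture rtl;
```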

Interestingly though, all of the advice is still to include only numeric_std even though numeric_std_unsigned also exists and is now an official standard.
