
Small Super Computer


bkelly13
The goal is to assemble a set of multi-core processors to create a parallel programming environment.

Someone started the company Adapteva and designed and built a credit-card-sized board called Parallella. On the board was a controlling CPU to manage the I/O and memory, plus a multi-core processor. It booted from a micro SD card. Not much else: really just a hardware support package for a multi-core processor, about the size of a credit card. The entry-level board was $99 with 16 cores. Later versions jumped to 64 and then 1024 cores. They called their processor Epiphany.

From what I read, they did not garner sufficient software support and the project died. That said, I think the time is ripe for a parallel processing project of this kind.

How hard would it be to start with off-the-shelf processors and put one or more of them on a small board with minimal support, meaning maybe nothing but Ethernet?

Any thoughts on this concept?
 

There are plenty of small multi-core cards on the market (multi as in 3-4, not 2^10).

You could make yourself a computing cluster (is Beowulf still a thing?) out of such critters (and enough Ethernet routers).

But the programming environment, that's a whole 'nother matter. If you want to make your own, maybe you don't care whether it's a "dead" project, so long as you can harvest the build docs and/or buy a few of the remaining assembled units?
 

Hi,

A computer needs
* input
* output
* processing unit
* memory
* task

Let's start with the task. What is your target application? What is it supposed to do?

The task then determines
* what input and what output it needs, what interfaces, what speed...
* how the cores need to communicate with each other ... in terms of data flow (serial, parallel), in terms of software, and in terms of control and timing (synchronizing)
* and whether and how they share memory

If I had to do such a design, I'd need a very detailed specification first. Otherwise there is a big risk that one does not get the expected performance.

Klaus
 

Hello KlausST,

The major task is the ability to provide parallel processing. In the Parallella project the processing was done by an Epiphany chip. The first version had 16 cores. Later versions went to 64, then 1024.

I/O: Ethernet connection: 1 Gbit, maybe 10 Gbit in the future. The connections will be short. The Parallella project added a two-core processor for the purpose of handling all the I/O, freeing the workers from it. That strikes me as an excellent idea. My presumption is that the workers do I/O via shared memory and nothing more. As long as the two-core chip can supply the data they need, the workers run at max speed.

Memory: Each core must have its own working memory and some memory shared with the controller. How much? As much non-shared memory per core as can be put on a multi-core chip. This is a home project, so there will be a trade-off between the number of cores and the amount of non-shared memory. However, I want to use existing chips.

Example: The Epiphany chip used by Adapteva.

Task: Any task that can be segmented into parallel processing.

How specific an answer do you want?

Task examples

The Mandelbrot set: Every pixel is completely independent of its neighbors. The algorithm is quite simple. However, when zooming in deep, I ran out of floating-point resolution, so long ago I wrote some code to do 128-bit fixed-point arithmetic. Maybe go to 256-bit resolution. That starts to take some time, but it is perfect for parallel processing.
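For illustration, here is a rough sketch in C of the per-pixel escape-time loop (double precision here; a deep zoom would swap in the wider fixed-point arithmetic, and the function name is just made up for the example). Each pixel iterates with its own constant c and never looks at another pixel, which is why the work splits so cleanly across cores.

/* Escape-time count for one pixel: fully independent of every other pixel.
   Sketch only -- double precision; deep zooms need wider fixed-point math. */
static int mandel_iters(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;   /* z = z*z + c, real part  */
        zi = 2.0 * zr * zi + ci;             /* imaginary part          */
        zr = t;
        n++;
    }
    return n;   /* reaching max_iter means "assumed inside the set" */
}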

Solar System: Maybe call it an orbiting body simulation: Create a star and planets and see how the orbits work out. Maybe multiple stars, maybe one or more black holes. To my knowledge, this can be done in two fundamental ways.

A) Each orbiting body is assigned to a specific processor, which must query all the other processors for the locations of their bodies.

B) Each processor is assigned a volume of space, and bodies move between the designated volumes. Each processor queries its neighbors for the aggregate gravity from everything in that direction and for any bodies close enough to need individual consideration. It does not need to query all the other processors. I suspect this is the better way, emphasis on suspect.

I have not gone far down this path, but either option will require significant sharing of data.
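As a rough sketch of option A (plain C; the type and function names are made up for illustration): each worker owns one body, and every time step it needs the current positions of all the other bodies, which is exactly where the heavy data sharing comes from.

#include <math.h>

#define G 6.674e-11

typedef struct { double x, y, z; } vec3;
typedef struct { vec3 pos, vel; double mass; } body_t;

/* Option A: accumulate the gravitational acceleration on "self" from every
   other body. Each step needs the up-to-date position of all n bodies,
   hence the all-to-all communication this option implies. */
static vec3 accel_from_all(const body_t *self, const body_t *all, int n)
{
    vec3 a = {0.0, 0.0, 0.0};
    for (int i = 0; i < n; i++) {
        if (&all[i] == self)
            continue;
        double dx = all[i].pos.x - self->pos.x;
        double dy = all[i].pos.y - self->pos.y;
        double dz = all[i].pos.z - self->pos.z;
        double r2 = dx * dx + dy * dy + dz * dz + 1e-9;  /* softening term */
        double inv_r = 1.0 / sqrt(r2);
        double s = G * all[i].mass * inv_r * inv_r * inv_r;
        a.x += s * dx;
        a.y += s * dy;
        a.z += s * dz;
    }
    return a;
}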
 

Hi,

Mandelbrot and stars are completely different tasks.
Mandelbrot is completely independent of its neighbours, does not need shared memory, and does not need synchronisation.
Stars need a lot of shared memory and need synchronisation.

Another task could be FIR filtering. That means serial, not parallel, data flow.
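For illustration, a minimal direct-form FIR step in C (a sketch, not taken from any particular library). Each output sample is built from the last N input samples in order, so the natural data flow is a stream rather than a set of independent work items.

/* Direct-form FIR: y[n] = sum over k of coef[k] * x[n-k].
   Samples must be consumed in order, so the data flow is serial. */
static double fir_step(double *delay, const double *coef, int taps, double x)
{
    /* shift the delay line and insert the newest sample */
    for (int i = taps - 1; i > 0; i--)
        delay[i] = delay[i - 1];
    delay[0] = x;

    double y = 0.0;
    for (int i = 0; i < taps; i++)
        y += coef[i] * delay[i];
    return y;
}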

And here, I think, is the major problem. You want flexible hardware for several different tasks, but those tasks have different requirements.
The hardware effort rises steeply with the expected flexibility.
Or, seen the other way round: even with huge effort, the flexibility is still limited.

Having one 1 Gbit interface shared by just 100 cores limits the data rate to 10 Mbit per core, even in an unrealistically ideal case.
But a GBit interface for each core is a huge effort... and maybe most applications don't need it.

I compare it with a vehicle to transport people. You want it to be maximally flexible: for single persons, each with a unique starting point and a unique destination. But you also want to build it for extremely fast transport of people to and from a huge open-air festival. For sure high speed, maybe low energy consumption, no need for expensive streets or railways... so flying vehicles. Avoiding accidents...

Mandelbrot is a kind of "game". Stars is a serious application. How many units do you want to sell? To whom?
To computer enthusiasts? To big companies?
Do you deliver just the hardware, or add libraries and APIs, or complete systems including software?
How much money can you spend? How much manpower do you have? For hardware, for software, for support...

A big task ... one I'm not able to do. I can't even give advice on where or how to focus.
It may become a big business, or it may become a flop. I can't predict.

Klaus
 

Tasks suitable for parallel processing might include:

* chess player (for every possible move, examine the subsequent scenarios)

* traveling salesman problem (what is the shortest path that visits every town)
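Both of those split along the top of the search tree. As a hedged sketch in C (brute force, town 0 as the fixed start, and a distance table you would have to fill in yourself): give each worker the tours that begin with a different first hop, and the workers never need to talk to each other until the final compare.

#include <limits.h>

#define N 10                 /* number of towns (illustrative)            */

static int dist[N][N];       /* distance table -- fill in with real data  */

/* Recursively extend a partial tour and return the best complete length. */
static int best_tour(int visited_mask, int current, int length_so_far)
{
    if (visited_mask == (1 << N) - 1)
        return length_so_far + dist[current][0];     /* close the loop    */

    int best = INT_MAX;
    for (int next = 1; next < N; next++) {
        if (visited_mask & (1 << next))
            continue;
        int len = best_tour(visited_mask | (1 << next), next,
                            length_so_far + dist[current][next]);
        if (len < best)
            best = len;
    }
    return best;
}

/* Parallel split: worker k evaluates only the tours whose first hop goes
   from town 0 to town k+1, so N-1 workers cover the search space with no
   communication until their results are compared. */
static int worker_subtree(int k)
{
    return best_tour((1 << 0) | (1 << (k + 1)), k + 1, dist[0][k + 1]);
}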
 

Wow, and thank you for taking the time to read and reply.
Don't read too much into this. Before I retired (software engineer, real-time code, but not drivers and the like), I discovered the Parallella board and thought it really cool.

As I understand it: The company Adapteva was started to produce the board called Parallella. They designed and produced the multi-core chip called Epiphany, a RISC architecture. The first version had 16 cores, the second 64, then higher counts: 1024 and maybe 4096. Within the chip there is, I suspect, fast communication. Between boards, use an Ethernet switch; with short cables and maybe CAT 6, up to 10 Gig.

I did not have time to investigate it then, but I am now retired and interested. It looks like they are out of business because they could not get sufficient software support. Digikey appears to have 100+ boards in stock, though it depends on how you get to Digikey: sometimes I see the boards available, sometimes not. The last time I saw them available, the price was $129 or something like that.

Buying maybe four boards and writing some example code would be a good exercise. Once the concept is set, it can probably be adapted to a different board relatively easily.

But to my way of thinking, even that board has too much stuff. Several versions have GPIO, HDMI, and other extras. I would like a board with the absolute minimum of hardware. That said, the Parallella concept of having a processor to handle the I/O sounds really good: an industry-standard processor that boots a standard version of Linux, making that part real easy. It handles all the I/O and communicates with the "workers" via shared memory, meaning each worker core can run at its max speed. I don't want to optimize the overall board for any specific type of problem.
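To make that controller/worker handshake concrete, here is a rough sketch in C of one way the shared memory could be laid out (names and sizes are hypothetical, and no particular chip is assumed): the Linux-side controller fills a per-worker mailbox and raises a flag, the worker drains it and clears the flag, and only the controller ever touches the Ethernet. On real hardware, memory barriers and cache management would still have to be added.

#include <stdint.h>

#define MBOX_SIZE 4096       /* bytes per worker mailbox (illustrative)   */

/* One mailbox per worker core, placed in the shared region.  The
   controller (the Linux-side processor) sets "full"; the worker clears it
   when it has copied the work unit out. */
typedef struct {
    volatile uint32_t full;             /* controller sets, worker clears  */
    volatile uint32_t len;              /* valid bytes in data[]           */
    uint8_t           data[MBOX_SIZE];  /* work unit written by controller */
} mailbox_t;

/* Controller side: hand a work unit to worker "w". */
static int give_work(mailbox_t *mbox, int w, const void *buf, uint32_t len)
{
    if (mbox[w].full || len > MBOX_SIZE)
        return -1;                      /* worker still busy, or unit too big */

    const uint8_t *src = buf;
    for (uint32_t i = 0; i < len; i++)
        mbox[w].data[i] = src[i];

    mbox[w].len  = len;
    mbox[w].full = 1;                   /* publish only after the data is in place */
    return 0;
}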

And that is why I am here. What kind of effort is required to produce a board as just described? It needs:
1. Industry-standard 32-bit RISC processor to boot Linux and establish communications with the host computer.
2. Ethernet controller chip and connector
3. EEPROM to boot from and retain configuration changes. Epiphany can boot from here, hopefully. Careful address management required.
4. Shared memory to communicate with the Epiphany (or other relatively high core count processor)
5. Epiphany, or suitable replacement
6. Power management chip.

Might need a larger board with GPIO, address lines, and maybe other connections or LEDs for initial development of the controller software. That controller software then gives the working code and data set to the cores.

No great effort needs to go into making it really small, but it should be small enough to put a good number of them in an enclosure on the order of a shoe box.

This seems like such a good idea, and the need for parallel processing is sufficiently high, that I wonder why it does not exist already, why Adapteva did not succeed, and whether universities are working on this.

Thank you again for your time.
 
