No idea; you haven't given enough detail for us to decide.
The general idea is to look at your problem and find the parts that limit performance. Ideally, those parts can be moved to hardware or to hardware accelerators. In some cases that won't help. For example, if you are limited by external memory bandwidth, faster sequential or parallel processing is of little use, because you spend all your time waiting for data. Likewise, you might gain more from algorithmic improvements that you haven't yet implemented in software.
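As a rough way to judge whether moving a part to hardware is worth it, you can apply Amdahl's law to your profile data. The sketch below is only an illustration: the function name `overall_speedup` and the example numbers are made up, and you would have to substitute fractions measured from your own profiling.

```python
def overall_speedup(accelerated_fraction: float, accelerator_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime spent in the part you
        would move to hardware (0.0 to 1.0), taken from profiling.
    accelerator_speedup: how many times faster that part runs once accelerated.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerator_speedup <= 0:
        raise ValueError("accelerator_speedup must be positive")
    remaining = 1.0 - accelerated_fraction  # runtime share that is not accelerated
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)


if __name__ == "__main__":
    # Hypothetical profiles, not measurements from any real system.
    # A 100x accelerator on a part that is only 30% of the runtime:
    print(f"{overall_speedup(0.30, 100.0):.2f}x")  # about 1.42x
    # A modest 10x accelerator on a part that is 90% of the runtime:
    print(f"{overall_speedup(0.90, 10.0):.2f}x")   # about 5.26x
```

If the time you would accelerate is actually spent waiting on memory, the achievable `accelerator_speedup` for that part will be close to 1, and the overall gain disappears accordingly.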