Useful skew would be pretty much my last resort. Since you mention that the data path is 'almost reasonable' and you still have timing closure issues, that basically implies the design itself is broken. I would go back to the logic designers to see if they can rip out some logic or add a new flop stage.
If your data path looks good and the overall clock skew also looks good, there isn't really much else we can do. I am assuming you have already checked that the transitions are crisp and crosstalk (xtalk) is minimal, ensured good placement, checked whether high-drive and low-Vt cells can be swapped in, and so on.
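To put numbers on why useful skew is a last resort, here is a rough sketch of the setup-slack arithmetic for two back-to-back stages, A->B and B->C. All delays, the 1.0 ns period, and the function names are made up for illustration; hold checks, OCV and clock uncertainty are ignored. Delaying flop B's clock gives slack to A->B but takes exactly the same amount away from B->C, so it only helps if the neighbouring stage has margin to spare.

```python
def setup_slack(period, clk_to_q, logic_delay, setup, launch_lat, capture_lat):
    """Setup slack of one flop-to-flop path, all times in ns (simplified model)."""
    return period + (capture_lat - launch_lat) - clk_to_q - logic_delay - setup


# Hypothetical numbers, chosen only to illustrate the trade-off.
PERIOD, CLK_TO_Q, SETUP = 1.0, 0.10, 0.05
LOGIC_AB, LOGIC_BC = 0.90, 0.70


def two_stage_slacks(skew_b):
    """Slacks of A->B and B->C when flop B's clock arrives skew_b ns late."""
    ab = setup_slack(PERIOD, CLK_TO_Q, LOGIC_AB, SETUP, 0.0, skew_b)
    bc = setup_slack(PERIOD, CLK_TO_Q, LOGIC_BC, SETUP, skew_b, 0.0)
    return ab, bc


if __name__ == "__main__":
    for skew in (0.0, 0.10):
        ab, bc = two_stage_slacks(skew)
        print(f"skew on B = {skew:.2f} ns: A->B slack {ab:+.2f}, B->C slack {bc:+.2f}")
```

With these numbers, 0.10 ns of skew on B moves A->B from failing to passing, but B->C gives up the same 0.10 ns. If B->C were already tight, the violation would just move downstream, which is why fixing the logic or adding a flop stage is the better first move.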