You won't achieve a stable phase shift without a PLL. Logic gate delays can be used, but expect them to vary with PVT (process, voltage, temperature) over a range of about 1:2. Given suitable timing constraints, a synthesis tool will try to meet the specified timing relations between clocks and signals by utilizing routing delays. Alternatively, you can insert logic cells manually and protect them against removal during design optimization.
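To get a feeling for what a 1:2 delay spread means for a gate-delay phase shift, here is a rough back-of-the-envelope sketch in Python. The numbers (2.5 ns fast-corner delay, 100 MHz clock) and the function names are only assumptions for illustration, not values from any particular device.

```python
def phase_shift_deg(delay_ns: float, clock_mhz: float) -> float:
    """Phase shift in degrees produced by a fixed delay at a given clock frequency."""
    period_ns = 1000.0 / clock_mhz
    return 360.0 * delay_ns / period_ns


def phase_range_deg(fast_delay_ns: float, clock_mhz: float, pvt_ratio: float = 2.0):
    """Return (fast-corner, slow-corner) phase shift in degrees.

    pvt_ratio is the assumed slow/fast delay ratio; 2.0 mirrors the
    'about 1:2' spread mentioned above and is an estimate, not a datasheet value.
    """
    fast = phase_shift_deg(fast_delay_ns, clock_mhz)
    slow = phase_shift_deg(fast_delay_ns * pvt_ratio, clock_mhz)
    return fast, slow


if __name__ == "__main__":
    # Hypothetical example: delay chain tuned to 90 degrees at the fast corner.
    fast, slow = phase_range_deg(fast_delay_ns=2.5, clock_mhz=100.0)
    print(f"phase shift: {fast:.0f} to {slow:.0f} degrees")
```

Under these assumptions, a chain that gives a quarter-period shift at the fast corner gives a half-period shift at the slow corner, which is why a PLL is the usual choice when the phase has to be stable.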
I don't quite understand what the problem with process sensitivity lists is. Clearly, a synchronous (edge-sensitive) process can have only one clock. Using the same clock (even a delayed copy) as an additional asynchronous input signal normally won't make sense. In any case, you have to check whether the intended construct can be implemented with the hardware features of the logic family. If the answer is yes, there will be a way to implement it in HDL, possibly by instantiating low-level primitives from a vendor library.
If the logic family doesn't physically allow the design construct, there's no HDL trick to overcome the restriction.