First, the problem could be reduced to a sort, but all that is needed here is find_min. There is no reason to find the 2nd lowest, 3rd lowest, etc.
The entire circuit consists of the comparators and the selection logic. 125MHz is probably a bit high. There are a few options:
1.) 7 compares + 7 2:1 muxes. tvalid can be used either as part of the 2:1 muxes or as a 65th bit.
2.) 7 compares and an 8:1 mux.
3.) 4 + 6 compares, 4 2:1 muxes, and a 4:1 mux.
4.) a few others.
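As a rough behavioral sketch of the "tvalid as a 65th bit" idea in option 1 (Python, not HDL): it assumes 64-bit data, and it assumes the extra bit is the inverse of tvalid placed above the MSB, so an invalid entry always compares as larger than any valid one. The names extend_with_valid and min_of_valid are made up for illustration.

```python
WIDTH = 64  # assumed data width, implied by the "65th bit" remark


def extend_with_valid(data: int, tvalid: bool) -> int:
    """Put NOT(tvalid) at bit 64 so invalid entries compare as larger."""
    assert 0 <= data < (1 << WIDTH)
    return ((0 if tvalid else 1) << WIDTH) | data


def min_of_valid(entries):
    """entries: list of (data, tvalid). Returns (data, tvalid) of the winner.

    If no entry is valid, the winner is still returned but flagged invalid.
    """
    keys = [extend_with_valid(d, v) for d, v in entries]
    best = min(keys)  # stands in for whichever compare structure is used
    return best & ((1 << WIDTH) - 1), (best >> WIDTH) == 0
```

With this encoding the comparators need no separate valid handling; the cost is one extra bit of compare width.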
The long path in the design comes from using the results of one compare to select the arguments for the next compare. However, 6 compares can be done on 4 inputs in parallel, and the 6b result used to generate the select bits.
The compares for the first methods are:
(a,b), (c,d), (e,f), (g,h),
(min(a,b),min(c,d)), (min(e,f), min(g,h)),
(min(a,b,c,d), min(e,f,g,h)).
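The three levels of compares listed above can be modeled as a small tournament tree. This is only a behavioral sketch in Python, not HDL; min2 and tree_min are hypothetical names, and ties pick the right-hand input to match the "left side is greater or equal" convention used for the compare bits.

```python
def min2(x, y):
    """One comparator plus one 2:1 mux: select y when x >= y, else x."""
    return y if x >= y else x


def tree_min(a, b, c, d, e, f, g, h):
    """7 compares arranged in 3 levels; each level waits on the previous one."""
    # Level 1: (a,b), (c,d), (e,f), (g,h)
    ab, cd, ef, gh = min2(a, b), min2(c, d), min2(e, f), min2(g, h)
    # Level 2: (min(a,b), min(c,d)), (min(e,f), min(g,h))
    abcd, efgh = min2(ab, cd), min2(ef, gh)
    # Level 3: final compare of the two 4-input minima
    return min2(abcd, efgh)
```

The dependency between levels is what makes the long path: each compare needs the mux output of the level before it.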
For the 6-compare method, they are (a,b), (a,c), (a,d), (b,c), (b,d), (c,d).
if A is lowest, then 000???
if B is lowest, then 1??00?
if C is lowest, then ?1?1?0
if D is lowest, then ??1?11
where 0 means the left side is lower and 1 means the left side is greater or equal. In any case, it comes down to 2 LUT6's, one for each select bit of the 4:1 mux.
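The pattern table can be checked with a short behavioral model (Python, not HDL). The function names here are made up for illustration, and filling unmatched 6b codes with select 0 is an assumption: those codes cannot come from real comparisons, so they are don't-cares.

```python
from itertools import product

# Compare order from the text: (a,b), (a,c), (a,d), (b,c), (b,d), (c,d)
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# One pattern per input A, B, C, D; '?' is a don't-care bit
PATTERNS = ["000???", "1??00?", "?1?1?0", "??1?11"]


def compare_bits(vals):
    """6 parallel compares: 0 = left side lower, 1 = left side greater or equal."""
    return tuple(int(vals[i] >= vals[j]) for i, j in PAIRS)


def decode_select(bits):
    """Return the index (0..3) of the matching pattern, or None if none match."""
    for idx, pat in enumerate(PATTERNS):
        if all(p == "?" or int(p) == b for b, p in zip(bits, pat)):
            return idx
    return None


def build_luts():
    """Two 64-entry truth tables, one per select bit of the 4:1 mux."""
    hi, lo = [], []
    for bits in product((0, 1), repeat=6):
        idx = decode_select(bits)
        idx = 0 if idx is None else idx  # assumed fill for unreachable codes
        hi.append(idx >> 1)
        lo.append(idx & 1)
    return hi, lo
```

Each select bit depends on only the 6 compare outputs, which is why it fits in a single LUT6. On ties the patterns select the last of the equal inputs.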