Input: A 16-bit data block $m = (m_0, m_1, \ldots, m_{15})$ and a
64-bit subkey $k_i$ such that
$k_i = K^{(i)}_1 \| K^{(i)}_2 \| K^{(i)}_3 \| K^{(i)}_4$
Output: A 16-bit data block $m' = (m'_0, m'_1, \ldots, m'_{15})$
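1: for $j = 1$ to $4$ do
2:   $m \gets m \oplus K^{(i)}_j$ [key mixing step]
3:   $A = m_0 \| m_1 \| m_2 \| m_3$; $B = m_4 \| m_5 \| m_6 \| m_7$;
     $C = m_8 \| m_9 \| m_{10} \| m_{11}$; $D = m_{12} \| m_{13} \| m_{14} \| m_{15}$
4:   $m \gets S_1(A) \| S_2(B) \| S_3(C) \| S_4(D)$ [substitution layer]
5:   $m \gets m \oplus (m \ll 6) \oplus (m \ll 10)$ [permutation layer]
6: end for
7: $m \gets m \oplus K^{(i)}_1 \oplus K^{(i)}_3$
8: $A = m_0 \| m_1 \| m_2 \| m_3$; $B = m_4 \| m_5 \| m_6 \| m_7$;
   $C = m_8 \| m_9 \| m_{10} \| m_{11}$; $D = m_{12} \| m_{13} \| m_{14} \| m_{15}$
9: $m \gets S_1(A) \| S_2(B) \| S_3(C) \| S_4(D)$
10: $m' \gets m \oplus K^{(i)}_2 \oplus K^{(i)}_4$
11: return $m' = (m'_0, m'_1, \ldots, m'_{15})$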
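The parts of the algorithm that don't involve the S-boxes (splitting $k_i$ into its four 16-bit words, the XORs in steps 2, 7 and 10, and the step 5 layer) can be modelled in software roughly as follows. This is a sketch under stated assumptions, not a reference implementation: it assumes $K^{(i)}_1$ is the most significant 16 bits of $k_i$ and that $\ll$ means a cyclic left rotation of the 16-bit block (if your spec means a plain shift, use `(x << r) & 0xFFFF` instead). The function names are hypothetical. Because step 5 is linear over GF(2), applying it four times gives the identity (this holds for both the rotation and the plain-shift reading), so three further applications undo it.

```python
MASK16 = 0xFFFF


def split_subkey(k_i: int) -> tuple:
    """Split a 64-bit subkey into (K1, K2, K3, K4), 16 bits each.

    Assumption: K1 occupies the most significant 16 bits of k_i.
    """
    return tuple((k_i >> shift) & MASK16 for shift in (48, 32, 16, 0))


def rotl16(x: int, r: int) -> int:
    """Rotate a 16-bit word left by r bits (0 < r < 16).

    Assumption: '<<' in step 5 denotes a cyclic rotation of the block.
    """
    return ((x << r) | (x >> (16 - r))) & MASK16


def linear_layer(m: int) -> int:
    """Step 5: m <- m XOR (m << 6) XOR (m << 10)."""
    return m ^ rotl16(m, 6) ^ rotl16(m, 10)


def inverse_linear_layer(m: int) -> int:
    """Undo step 5: the layer applied four times is the identity,
    so applying it three more times recovers the input."""
    for _ in range(3):
        m = linear_layer(m)
    return m


# Key mixing (steps 2, 7 and 10) is a plain 16-bit XOR with subkey words.
K1, K2, K3, K4 = split_subkey(0x0123456789ABCDEF)
m = 0x1234 ^ K1  # e.g. step 2 with j = 1
```

A single bit makes the layer easy to check by hand: `linear_layer(0x0001)` sets bits 0, 6 and 10, giving `0x0441`.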
I have finished coding the remaining algorithms, but I can't find a way to implement steps 3 and 4.
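Steps 3 and 4 amount to slicing the 16-bit word into four 4-bit nibbles and using each one as an index into its own 16-entry table. A minimal Python sketch, assuming $m_0$ is the most significant bit (so $A$ is the top nibble). The real tables $S_1$ to $S_4$ are not given in the post, so the ones below are made-up placeholders (each just some permutation of 0 to 15) that should be replaced with the tables from the specification:

```python
# Hypothetical placeholder S-boxes: each is SOME permutation of 0..15.
# They are NOT the real S1..S4; replace them with the tables from your spec.
S1 = [15 - x for x in range(16)]
S2 = [(x + 1) % 16 for x in range(16)]
S3 = [(3 * x) % 16 for x in range(16)]
S4 = [(5 * x + 7) % 16 for x in range(16)]


def substitution_layer(m: int) -> int:
    """Steps 3 and 4 applied to a 16-bit word m."""
    # Step 3: split m into four 4-bit nibbles (m0 assumed to be the MSB).
    a = (m >> 12) & 0xF  # A = m0 || m1 || m2 || m3
    b = (m >> 8) & 0xF   # B = m4 || m5 || m6 || m7
    c = (m >> 4) & 0xF   # C = m8 || m9 || m10 || m11
    d = m & 0xF          # D = m12 || m13 || m14 || m15
    # Step 4: look each nibble up in its own S-box and concatenate the outputs.
    return (S1[a] << 12) | (S2[b] << 8) | (S3[c] << 4) | S4[d]


print(hex(substitution_layer(0x1234)))  # 0xe39b
```

In an HDL the same structure maps directly onto hardware: each nibble is a 4-bit slice of the bus (with $m_0$ as the MSB, $A$ is bits 15:12), each S-box is a 16-entry lookup such as a `case` statement or small ROM, and the concatenation is just wiring the four 4-bit outputs back together.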
I thought that looked familiar. The exact same thing was asked (by you?) in another thread some time ago. Anyway, as stated in that other post, that operation has an inverse. It shouldn't be too difficult to figure out for the modern googling hardware designer on the go.
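The reply doesn't say which operation it means. If it is the substitution layer (which has to be undone when decrypting), one way to get its inverse is to invert each 4-bit S-box table; this works only because each S-box is a permutation of 0 to 15. A Python sketch, where the helper name and the example table are hypothetical:

```python
def invert_sbox(sbox: list) -> list:
    """Return the inverse table of a 4-bit S-box.

    inv[sbox[x]] == x for every x, so looking a nibble up in sbox and then
    in inv gives back the original nibble.
    """
    if sorted(sbox) != list(range(16)):
        raise ValueError("S-box is not a permutation of 0..15; it has no inverse")
    inv = [0] * 16
    for x, y in enumerate(sbox):
        inv[y] = x
    return inv


# Made-up example table (not a real S-box from any specification).
EXAMPLE_SBOX = [(5 * x + 7) % 16 for x in range(16)]
EXAMPLE_INV = invert_sbox(EXAMPLE_SBOX)
```

The inverse substitution layer then uses the same nibble split as step 3, indexing these inverse tables instead of $S_1$ to $S_4$.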