The CORDIC algorithm is no longer necessary for this.
Four years ago I used a ROM compression algorithm to implement the sine wave lookup table. Now that FPGA RAM blocks hold more and more bits, I simply store 1/4 of the sine wave in the LUT.
For example, on a Spartan-3 series FPGA you can implement the following NCO using only one RAM block:
1) two output channels, sine and cosine
2) each output is 8 bits wide
3) the phase address is 12 bits wide, i.e. the LUT has 4096 entries, each 8 bits wide.
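
To make the quarter-wave idea concrete, here is a rough software model in Python (not HDL, and not the exact design described above). It assumes the top 2 bits of the 12-bit phase select the quadrant and the low 10 bits address a 1024-entry quarter table, so only 1024 x 8 = 8192 bits need to be stored. The half-sample offset in the table, the 127 amplitude scaling, and all names (QUARTER_LUT, sine_from_quarter, nco_outputs) are my own assumptions for illustration. In hardware the two lookups would presumably map onto the two ports of a dual-port block RAM.

```python
import math

PHASE_BITS = 12                    # phase address width from the post
QUARTER_BITS = PHASE_BITS - 2      # low bits that index the quarter table
QUARTER_SIZE = 1 << QUARTER_BITS   # 1024 stored entries instead of 4096
AMPLITUDE = 127                    # assumed scaling for 8-bit signed output

# Quarter-wave table. The half-sample offset (i + 0.5) is an assumption:
# it makes the table mirror exactly, so no extra entry for sin(pi/2) is needed.
QUARTER_LUT = [
    round(AMPLITUDE * math.sin((i + 0.5) * math.pi / (2 * QUARTER_SIZE)))
    for i in range(QUARTER_SIZE)
]


def sine_from_quarter(phase):
    """Rebuild a full-wave sine sample from the quarter-wave table."""
    phase &= (1 << PHASE_BITS) - 1          # wrap like a 12-bit register
    quadrant = phase >> QUARTER_BITS        # top 2 bits: 0..3
    index = phase & (QUARTER_SIZE - 1)      # low 10 bits
    if quadrant & 1:                        # quadrants 1 and 3 read backwards
        index = QUARTER_SIZE - 1 - index
    value = QUARTER_LUT[index]
    return -value if quadrant & 2 else value  # quadrants 2 and 3 are negative


def nco_outputs(phase):
    """Return (sine, cosine); cosine is sine advanced by a quarter period."""
    sine = sine_from_quarter(phase)
    cosine = sine_from_quarter(phase + QUARTER_SIZE)
    return sine, cosine


if __name__ == "__main__":
    # Print a few samples, stepping the phase by 1/8 of a period.
    for phase in range(0, 1 << PHASE_BITS, 512):
        print(phase, nco_outputs(phase))
```

The mirroring step (QUARTER_SIZE - 1 - index) is the same as inverting the 10 address bits, and the sign flip depends only on the top phase bit, so the extra logic around the RAM stays small.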