1024 is used because it is a power of 2. As the range of a binary number doubles each time you add a bit, the progression is 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192. That means you can select all 8192 addresses using 13 bits (A0-A12). If you used decimal 1000 as the size multiple, the sizes would not line up with a whole number of address lines: 8000 addresses would still need 13 lines, and 192 of the possible combinations would go unused, which would be inefficient.
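As a quick check of the arithmetic above, here is a minimal Python sketch. The function name address_lines_needed is just an illustrative choice, not something from your circuit.

```python
def address_lines_needed(locations: int) -> int:
    """Smallest number of address lines that can select `locations` addresses."""
    # n lines reach 2**n addresses, so we need the bit length of the
    # highest address, which is locations - 1.
    return (locations - 1).bit_length()


for size in (1000, 1024, 8000, 8192, 32768):
    lines = address_lines_needed(size)
    unused = 2 ** lines - size
    print(f"{size} locations: {lines} lines, {unused} combinations unused")
```

Sizes that are powers of 2 leave no combinations unused, while 1000 and 8000 need just as many lines as 1024 and 8192 but waste some of them.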
You mentioned using a "Dec 3/8", which I assume to be a 3 input to 8 output decoder (like a 74LS138). What you do is address all 4 memories in parallel with A0-A12 but only select one of the 8K devices at a time. You select only one by using the decoder outputs, let's say Y0, Y1, Y2 and Y3, with those going to the CS (Chip Select) pin on each of the four 8K devices. That means that although your addresses go to all the memories at the same time, only the one activated with the 'Y' signal will respond. Now if you connect an additional TWO address lines to the inputs of the decoder, these effectively become A13 and A14. By using those two extra lines you select the first, second, third or fourth 8K block, giving you 32K in total.
Note I mentioned TWO additional address lines to the decoder. You must keep the third input (most significant bit) at a logic low level. If you make it high, the decoder outputs Y4, Y5, Y6 and Y7 will be active, but they don't connect to any memory CS pin so nothing will be selected. However, you could use them to select another four 8K memories to give 64K in total.
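To make the decoding concrete, here is a rough Python model of the arrangement described above. It is only a sketch under my own assumptions: the names select and third_input are made up for illustration, and it ignores the fact that the outputs of a real 74LS138 are active low and that the chip has enable pins.

```python
BLOCK_BITS = 13               # A0-A12 go to every 8K device in parallel
BLOCK_SIZE = 1 << BLOCK_BITS  # 8192 addresses per device
CHIPS_FITTED = 4              # memories wired to Y0, Y1, Y2 and Y3


def select(address: int, third_input: int = 0):
    """Return (decoder output number, chip selected or None, offset in chip)."""
    offset = address & (BLOCK_SIZE - 1)   # what A0-A12 present to the chips
    a13 = (address >> 13) & 1             # least significant decoder input
    a14 = (address >> 14) & 1             # middle decoder input
    y = (third_input << 2) | (a14 << 1) | a13
    # Only Y0-Y3 are wired to a CS pin, so Y4-Y7 select nothing.
    chip = y if y < CHIPS_FITTED else None
    return y, chip, offset


for addr in (0x0000, 0x1FFF, 0x2000, 0x4000, 0x7FFF):
    y, chip, offset = select(addr)
    print(f"address {addr:#06x}: Y{y} active, chip {chip}, offset {offset}")

# With the third input high, Y4-Y7 are chosen and no memory responds.
print(select(0x0000, third_input=1))
```

Addresses 0x0000 to 0x1FFF land in the first device, 0x2000 to 0x3FFF in the second, and so on up to 0x7FFF at the top of the fourth, which is the 32K total.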
Note: when we refer to memories as being 8kX8 it means 8k (8192) addresses, each address holding 8 bits. You don't decode the 8 bits; they are the storage size of each address. Each address holds a value from 00000000 to 11111111, stored as 8 individual bits.
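If it helps to picture it, one 8kX8 device can be modelled loosely in Python as an array of 8192 bytes. This is only an analogy, and the names below are illustrative.

```python
ADDRESSES = 8 * 1024   # "8k" means 8192 locations
WIDTH = 8              # bits stored at each location

chip = bytearray(ADDRESSES)    # every location starts at 00000000
chip[0x1FFF] = 0b11111111      # highest address, largest 8-bit value

total_bits = ADDRESSES * WIDTH
print(f"{ADDRESSES} addresses x {WIDTH} bits = {total_bits} bits")
print(f"value at 0x1FFF: {chip[0x1FFF]:08b}")
```

The address lines pick which element of the array you are talking to; the 8 data bits are simply what is stored there.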
Brian.