The Quartus manual has specific coding guidelines for this; see section 6-13.
The tools really should be able to determine whether something fits in a few MLABs or needs to go in a dedicated RAM block. You can also set synthesis attributes to request a different implementation.
I think Altera also has "megafunctions" for making large RAMs that use several dedicated block RAMs, but 4kB (?) doesn't sound too large.
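For a rough sense of scale, here's a back-of-the-envelope sketch in Python (my own, not from the Quartus manual). The block capacities are assumptions: 640 bits per MLAB and 9216 bits per M9K, which is what I recall for Stratix III/IV-class parts, so check your device handbook. I'm also assuming "4kB" means 4096 words of 8 bits. `blocks_needed` is just a made-up helper, and the result is only a lower bound, since the real mapping depends on the depth/width modes each block supports.

```python
# Assumed block capacities in bits. These vary by device family,
# so treat them as placeholders and check the device handbook.
MLAB_BITS = 640    # assumption: Stratix III/IV-style MLAB
M9K_BITS = 9216    # assumption: 9 Kbit block, parity bits included


def blocks_needed(depth_words, width_bits, block_bits):
    """Lower bound on block count: total bits / block capacity, rounded up."""
    total_bits = depth_words * width_bits
    return -(-total_bits // block_bits)  # integer ceiling division


# Assumed reading of "4kB": 4096 words x 8 bits = 32768 bits.
m9k_count = blocks_needed(4096, 8, M9K_BITS)
mlab_count = blocks_needed(4096, 8, MLAB_BITS)
print("M9K blocks (lower bound): ", m9k_count)
print("MLAB blocks (lower bound):", mlab_count)
```

Under those assumptions a memory this size needs only a handful of dedicated blocks but dozens of MLABs, which is roughly the trade-off the tools weigh when picking an implementation.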
edit -- also, this is one reason why everyone should read the recommended coding styles guide, and any other coding guidelines, from their FPGA manufacturer. The main goal of writing RTL is to get something that maps well to the actual device. The FPGA has some special, dedicated resources. If you read the coding guides, you'll infer more of these dedicated resources and waste less of the generic fabric. Likewise, you might realize that some RTL optimization attempts are a waste of time, or even counterproductive.