The simplest thing you can do is to filter the image with a high-pass filter (a two-dimensional convolution). Then you adjust the camera lens until you get the maximal amplitude of the filtered signal (calculated, for example, as the RMS value). This approach only works when you can take several shots of the same scene, i.e. when you are setting the focus. If you only want to tell which of two images has better focus (assuming the scene and the lighting conditions are the same in both), you can also use a high-pass filter and compare the RMS values.
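Here is a minimal sketch of the two-image comparison. It assumes grayscale images stored as 2-D NumPy arrays and uses a 3x3 Laplacian as the high-pass filter. The text does not name a specific kernel, so the Laplacian is only one common choice. `convolve2d`, `sharpness` and `better_focused` are hypothetical helper names.

```python
import numpy as np

# 3x3 Laplacian: one common 2-D high-pass kernel (an assumption; the text
# does not prescribe a particular kernel).
LAPLACIAN = np.array([[0.0, 1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0, 1.0, 0.0]])


def convolve2d(image, kernel):
    """'Valid' 2-D convolution (no padding), implemented with NumPy slicing."""
    kh, kw = kernel.shape
    h, w = image.shape
    oh, ow = h - kh + 1, w - kw + 1
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * image[i:i + oh, j:j + ow]
    return out


def sharpness(image):
    """RMS of the high-pass-filtered image: larger means more fine detail."""
    high = convolve2d(np.asarray(image, dtype=float), LAPLACIAN)
    return float(np.sqrt(np.mean(high ** 2)))


def better_focused(img_a, img_b):
    """Return 'a' or 'b' for the image with more high-frequency energy.

    Only meaningful when both images show the same scene under the same lighting.
    """
    return "a" if sharpness(img_a) >= sharpness(img_b) else "b"
```

For autofocus, the same `sharpness` value would be computed at each lens position, and the position giving the maximum would be kept.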
But if you want a method that is independent of lighting conditions, colour, contrast, etc., you can use an algorithm that is usually used in real-life devices. The approach is similar, but now you need three filters (low-pass, band-pass, and high-pass), which give three separate RMS values. Then you calculate the proportions between those values: the image with the greater ratio of high to low frequencies has better focus.
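A possible sketch of the three-band measurement follows. It assumes a simple filter bank built from two box blurs, which is a choice made for illustration and not necessarily the filters real devices use. The low band is the coarse blur, the band-pass part is the fine blur minus the coarse blur, and the high band is the image minus the fine blur. `box_blur`, `band_rms` and `focus_ratio` are hypothetical names, and the blur sizes 3 and 9 are arbitrary.

```python
import numpy as np


def box_blur(image, size):
    """Mean filter with an odd `size`; output has the input's shape (edge padding)."""
    r = size // 2
    padded = np.pad(image, r, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w))
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (size * size)


def rms(a):
    return float(np.sqrt(np.mean(a ** 2)))


def band_rms(image, small=3, large=9):
    """RMS of the low, band and high frequency parts of `image`.

    Bands come from two box blurs (an assumed, simple filter bank):
    low = coarse blur, band = fine blur - coarse blur, high = image - fine blur.
    """
    img = np.asarray(image, dtype=float)
    fine = box_blur(img, small)
    coarse = box_blur(img, large)
    low = coarse - coarse.mean()  # drop the DC term so mean brightness doesn't count
    band = fine - coarse
    high = img - fine
    return rms(low), rms(band), rms(high)


def focus_ratio(image, eps=1e-12):
    """Ratio of high- to low-frequency RMS; higher means better focus."""
    low, _band, high = band_rms(image)
    return high / max(low, eps)
```

All the filters are linear and the mean is removed from the low band. As a result, multiplying the image by a gain or adding a brightness offset leaves `focus_ratio` unchanged. This is the kind of independence from lighting and contrast the paragraph above is after.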
I said that you need to pass the image through three filters, but to compare the ratio of high to low frequencies you need only two (low-pass and high-pass). The band-pass filter, as far as I remember, is used to speed up the process of setting the focus. So you can just use two filters, or you can normalize the low- and high-frequency RMS values to the middle-frequency RMS value. Try both and see which gives the best results.
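To try the variants side by side, the per-band RMS values (for example, from the three filters above) can be turned into candidate scores. This hypothetical `focus_scores` helper returns the two-filter ratio high/low together with both bands normalized to the middle band. As in the text, it leaves open which score works best.

```python
def focus_scores(low_rms, mid_rms, high_rms, eps=1e-12):
    """Candidate focus scores from per-band RMS values (hypothetical helper).

    Returns (high/low, high/mid, low/mid). The first needs only two filters;
    the other two normalize each band to the middle band.
    """
    mid = max(mid_rms, eps)  # avoid division by zero on a featureless image
    return (high_rms / max(low_rms, eps), high_rms / mid, low_rms / mid)
```

For example, `focus_scores(2.0, 4.0, 1.0)` returns `(0.5, 0.25, 0.5)`.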
There is one more problem with determining the focus of an image: filtering the whole image can be misleading. Suppose the scene has, e.g., a sharp object in the background and a blurry object in the foreground (the one you actually meant to photograph). The whole-image measure can still look good, so the focus ends up set on the background object. So another thing you need to do is to choose only a part of the image, preferably from the center. You can also take several parts of the image, e.g. from the center, top, left, right, and bottom, and calculate the focus of each part separately. Then you calculate the cumulative focus as a weighted mean of all five parts, giving the central part the highest weight. Unfortunately, this approach has a drawback: with fixed focus areas you can't always tell whether the areas are suitable for setting the focus, for example when one of them has constant luminosity, like sky or a wall.
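The fixed five-area scheme can be sketched as follows. This sketch makes several assumptions: areas are square-ish crops of 20% of the image size, the center area gets weight 2 and the others weight 1, and a 4-neighbour Laplacian serves as the high-pass filter. The names `laplacian_rms` and `five_area_focus` are hypothetical.

```python
import numpy as np


def laplacian_rms(patch):
    """RMS of a 4-neighbour Laplacian (high-pass) over the patch interior."""
    p = np.asarray(patch, dtype=float)
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return float(np.sqrt(np.mean(lap ** 2)))


def five_area_focus(image, frac=0.2, weights=(2.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted mean focus of five fixed areas: center, top, bottom, left, right.

    `frac` (area size as a fraction of the image, at most 0.5) and `weights`
    (center first, given the highest weight) are illustrative assumptions.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    ph, pw = max(3, int(h * frac)), max(3, int(w * frac))
    centres = [(h // 2, w // 2),       # center
               (h // 4, w // 2),       # top
               (3 * h // 4, w // 2),   # bottom
               (h // 2, w // 4),       # left
               (h // 2, 3 * w // 4)]   # right
    scores = []
    for cy, cx in centres:
        y0, x0 = cy - ph // 2, cx - pw // 2
        scores.append(laplacian_rms(img[y0:y0 + ph, x0:x0 + pw]))
    wts = np.asarray(weights, dtype=float)
    return float(np.dot(wts, scores) / wts.sum()), scores
```

The per-area scores also show the drawback mentioned above. An area that falls on a flat region (sky, a wall) scores 0 regardless of focus, yet it still takes part in the weighted mean.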
So you can use another algorithm to find areas suitable for setting the focus. Such an area should contain sharp edges. To find them, use any edge detection algorithm, like Sobel (again, a two-dimensional convolution). Then select several areas of the edge image that have the greatest values (RMS, mean, or anything else), so that they contain many sharp details. Then filter each area, calculate its RMS value as above, and get the cumulative value, again using a weighted mean. This time the weights depend on the distance (possibly squared) from the center of the image to the center of each focus area, so that areas closer to the center count more.
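A sketch of the edge-guided variant follows. It computes a Sobel gradient-magnitude image, splits the image into square tiles, and keeps the `k` tiles with the highest edge RMS. It then measures focus in each kept tile with a Laplacian and averages the results with center-favouring weights. The tile size, `k`, and the weight `1 / (1 + d) ** power` (with `d` the distance between the image center and the tile center) are all assumptions, as are the helper names.

```python
import numpy as np


def sobel_magnitude(img):
    """Sobel gradient magnitude on interior pixels (output is 2 px smaller per axis)."""
    gx = (img[:-2, 2:] + 2.0 * img[1:-1, 2:] + img[2:, 2:]
          - img[:-2, :-2] - 2.0 * img[1:-1, :-2] - img[2:, :-2])
    gy = (img[2:, :-2] + 2.0 * img[2:, 1:-1] + img[2:, 2:]
          - img[:-2, :-2] - 2.0 * img[:-2, 1:-1] - img[:-2, 2:])
    return np.hypot(gx, gy)


def laplacian_rms(patch):
    """RMS of a 4-neighbour Laplacian (high-pass) over the patch interior."""
    p = np.asarray(patch, dtype=float)
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return float(np.sqrt(np.mean(lap ** 2)))


def edge_guided_focus(image, tile=16, k=3, power=2.0):
    """Pick the k tiles with the most edge energy and return their weighted focus.

    Tiles nearer the image center get larger weights, 1 / (1 + d) ** power
    (the exact weighting is an assumption).
    """
    img = np.asarray(image, dtype=float)
    edges = np.pad(sobel_magnitude(img), 1)  # back to full size; border = 0
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    tiles = []
    for y0 in range(0, h - tile + 1, tile):
        for x0 in range(0, w - tile + 1, tile):
            energy = float(np.sqrt(np.mean(edges[y0:y0 + tile, x0:x0 + tile] ** 2)))
            tiles.append((energy, y0, x0))
    chosen = sorted(tiles, reverse=True)[:k]  # strongest edges first
    scores, weights = [], []
    for _energy, y0, x0 in chosen:
        scores.append(laplacian_rms(img[y0:y0 + tile, x0:x0 + tile]))
        d = np.hypot(y0 + (tile - 1) / 2.0 - cy, x0 + (tile - 1) / 2.0 - cx)
        weights.append(1.0 / (1.0 + d) ** power)
    wts = np.asarray(weights)
    return float(np.dot(wts, scores) / wts.sum()), chosen
```

Choosing areas by edge content avoids measuring flat regions such as sky or walls. The distance weighting keeps the preference for the center of the frame that the five-area scheme had.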