Here is some code for 4-bit operation. The initialization for 8 bits will be almost identical, except for a few bytes.
Try this first. I know it works. Then, we can do it for 8 bits.
Note that there is a lot of stuff left over from my project. I removed most of it, but quite a bit remains. We'll worry about that later.