Reading EPROM BIOS - programmer format

Githin · Apr 15, 2012

What is the format of the data that I read from the EPROM using a programmer? How do I convert this to assembly language?

As far as I can see, the output from the programmer is Intel Hex format, and has all the fields indicated by the format.

But, I cannot find the same hex codes that I see in the output of the debug -d command of the same BIOS.

The debug -d command dumps RAM to the screen, so I can examine the EPROM contents that are loaded into RAM by viewing the hex codes. For example, the ASCII conversion of this data shows that the code is IBM compatible, etc.

What I want to do is to find the assembly instructions by reading the EPROM with the programmer. And, the BIOS was on a Pentium Computer (80586).

srizbf · Apr 15, 2012

After you read the EPROM contents, use a disassembler for the Pentium to get the mnemonics.

Githin · Apr 15, 2012

I was wondering if there was some conversion to do before feeding the Intel Hex output from the programmer to the disassembler.

Can you recommend a good disassembler?

betwixt · Apr 15, 2012

Quite possibly there is. On modern systems the EPROM (or EEPROM) is copied to RAM at boot-up, because running it from RAM is much faster. Because it is copied, the original may be compressed; the boot loader then decompresses it into the RAM image before jumping into it to complete the boot process. Normally LHA compression is used, but there is no reason why other types (zip etc.) couldn't be used instead.

Brian.

Githin · Apr 15, 2012

Is there a BIOS/EPROM programmer/software that can do the de-compression and give me the HEX files which I can disassemble?

How do I find out what format is being used to compress the code in my BIOS?

betwixt · Apr 15, 2012

Unfortunately, there isn't a standard way of compressing BIOS data because the decompression code always has to be in the same EPROM as well. The BIOS writer can use whatever they like but there may be clues to look out for:

1. The BIOS has to start using 'normal' uncompressed instructions. When the power is applied it will first run the POST tests, then run the decompressor to expand the rest of the EPROM into runnable code in RAM. Finally, it will jump to an address in RAM to continue the booting process. You can trace the instructions from the reset vector and find the start of the decompress routine.

2. If the BIOS is compressed, there is usually a tell-tale signature at the beginning of the block. Look at page boundaries for text containing 'lh', 'lha', 'lzw', 'rar' or 'zip'; these are the most common compression types.
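
The signature search in point 2 can be sketched in a few lines of Python. This is only a rough sketch: the function name and the exact byte patterns are assumptions (the usual LHA method ID such as "-lh5-", the ZIP local header "PK\x03\x04" and the RAR marker "Rar!"), and a particular BIOS may use something else entirely.

```python
# Hypothetical helper: scan a ROM image for common archive signatures.
# The patterns below are assumptions, not a complete or authoritative list.
SIGNATURES = {
    "lha": b"-lh",          # LHA method IDs look like "-lh0-", "-lh5-", ...
    "zip": b"PK\x03\x04",   # ZIP local file header
    "rar": b"Rar!",         # RAR archive marker
}


def find_signatures(image: bytes) -> list[tuple[int, str]]:
    """Return (offset, name) for every signature found, sorted by offset."""
    hits = []
    for name, pattern in SIGNATURES.items():
        start = image.find(pattern)
        while start != -1:
            hits.append((start, name))
            start = image.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Made-up demo data: an LHA-style marker placed at offset 18.
    demo = b"\x00" * 16 + b"\x21\x00-lh5-" + b"\xff" * 8
    for offset, name in find_signatures(demo):
        print(f"{offset:#06x}: {name}")
```

An empty result does not prove the image is uncompressed; it only means none of these particular markers are present.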

You also have the option to dump the uncompressed code to a file using the DEBUG program or something similar. You have to find the start address and end address then use the 'N' and 'W' commands to write the dump to a file. From there you can disassemble the instructions as normal. Basically, you let it do the decompressing for you then copy the result.

Brian.

Githin · Apr 16, 2012

Thanks Brian. That makes perfect sense. I forgot to mention that I am looking at the VGA BIOS on a graphics card made in the 90s. I did a hex dump using the debug command from C0000 to C7FFF from the RAM (took several hours to complete). The file is 650 MB. I will try to analyze it. The VGA Bios code is copyrighted to Phoenix Technologies (the debug command tells me that).

Thanks once again.

betwixt · Apr 16, 2012

I was thinking of system BIOS rather than one on a VGA card. The situation may be a little different because cards of that era rarely if ever had their own processors. There was often an option to "shadow video memory" for better performance which did indeed copy the VGA BIOS into system RAM but as it was an option, the EPROM code would not have been compressed.

You probably did it the best way. At the start of the BIOS you should see the 'signature' which the main BIOS uses to find and initialize it. It's been a long time since I did it but if my memory serves me well, it is 0x55, 0xAA followed by the EPROM size (counted in 512-byte blocks) then the entry point of the initialization code. The idea is that when the main system BIOS is starting up, it looks at each likely address (every 2K boundary in the C000 range I think) for the signature so it knows an additional BIOS has been installed in the system. If it finds one, it checks that the length looks reasonable, then calls the entry point. The initialization code writes new values into the system interrupt vectors so that video calls are sent to the VGA card's BIOS rather than somewhere else.

Many years ago there was a disassembler made by a company who I think were called "V Communications" which did an excellent job of converting a hex dump back to readable code. It was a multi-pass disassembler that calculated the destination addresses of every jump/branch it found and gave them labels. On the next pass it changed the source instruction to reference the label rather than just the address, which made it far easier to read. It even recognized and added the names of all the system calls automatically. I'm not sure if it is still around or an equivalent exists but it might be worth searching for.

Brian.

Githin · Apr 16, 2012

Thanks Brian.

I could see the 55 and AA in the code when I did debug -d C000:0000. However, they are not visible when I use an EPROM reader to read the data from the chip. The card I am talking about has its own video controller; it's called ACUMOS AVGA1.

So, you think the VGA BIOS is not compressed? If that is the case, how come I could not read the info regarding the BIOS copyright and did not see the 55 AA in the Intel Hex format from the EPROM reader?

srizbf · Apr 16, 2012

Is the file size really 650 MB? A 90s VGA BIOS occupies a 32K ROM space.
You can use any 8086 disassembler to read the instructions.

Githin · Apr 16, 2012

The ROM is 32 kilo bytes.

I have not been able to open the 650 MB file on the Windows system. It's got no network cards and I cannot transfer the file (I can fix that somehow).

I don't understand why it's 650 MB. I am assuming that the text format takes up much more space than the simple hex codes.

---------- Post added at 16:15 ---------- Previous post was at 16:10 ----------

Update: I opened the file. I read C000:0000 to C7FF:0000. There are many repetitions within this space.

So, I should read only between C000:0000 and C000:7FFF to get the 32 KB of data. Everything else is possibly garbage.
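
A quick way to check a dump range like this is to convert the segment:offset pairs to linear addresses. The sketch below assumes plain real-mode addressing (segment times 16 plus offset); the function names are made up for illustration. A 32 KB ROM mapped at C000:0000 covers 0x8000 bytes, so its last byte sits at C000:7FFF.

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset."""
    return (segment << 4) + offset


def span_bytes(start: tuple[int, int], end: tuple[int, int]) -> int:
    """Number of bytes from start to end, both ends inclusive."""
    return linear(*end) - linear(*start) + 1


if __name__ == "__main__":
    # Full 32 KB window of a ROM mapped at C000:0000.
    print(span_bytes((0xC000, 0x0000), (0xC000, 0x7FFF)))
    # The range C000:0000 to C7FF:0000 mentioned in the post.
    print(span_bytes((0xC000, 0x0000), (0xC7FF, 0x0000)))
```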

betwixt · Apr 16, 2012

Intel HEX format is a way of representing binary data as text. It also adds address information to identify where the data belongs in memory, plus a checksum.
There are actually several Intel HEX formats but the one you probably have is Hex-8.

If you look at the hex file in a text editor you should see lots of lines, each starting with a colon ':'. This is the start-of-record marker.
The next two hex digits are the record length, the number of data bytes in the record. Often this will be '10', meaning there are 16 data bytes in that line.
The next four hex digits are the address of the data: where it was read from and normally where it would be written back to.
The record type comes next (two hex digits). It can take several values, to indicate which segment is in use (if applicable) or to mark the final record in the file.
The data itself is next.
Finally there is a checksum, which is the two's complement of the sum of all the bytes in the line between the ':' and the checksum itself.

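Here is an example record, first as it appears in the file and then split into its fields to make them clearer:

:1000100003138C1E1128831203131A08C5004916F6

: 10 0010 00 03138C1E1128831203131A08C5004916 F6
(start marker, byte count, address, record type, 16 data bytes, checksum)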
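
The field layout can also be checked mechanically. The following is a minimal sketch, not a full Intel HEX reader: `HexRecord` and `parse_record` are names invented here, and only the per-record fields and checksum are handled (no extended address records). The checksum test relies on the fact that all bytes of a valid record, checksum included, sum to zero modulo 256.

```python
from dataclasses import dataclass


@dataclass
class HexRecord:
    count: int      # number of data bytes
    address: int    # 16-bit load address
    rectype: int    # record type (00 = data, 01 = end of file)
    data: bytes
    checksum: int
    valid: bool     # True when the checksum matches


def parse_record(line: str) -> HexRecord:
    """Split one Intel HEX line into its fields and verify the checksum."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    count = raw[0]
    if len(raw) != count + 5:
        raise ValueError("record length does not match byte count")
    address = int.from_bytes(raw[1:3], "big")
    rectype = raw[3]
    data = raw[4:4 + count]
    checksum = raw[4 + count]
    valid = (sum(raw) & 0xFF) == 0
    return HexRecord(count, address, rectype, data, checksum, valid)


if __name__ == "__main__":
    rec = parse_record(":1000100003138C1E1128831203131A08C5004916F6")
    print(rec.count, hex(rec.address), rec.rectype, rec.data.hex(), rec.valid)
```

To get a plain binary for a disassembler, the data bytes of each type-00 record would be written at their address offset in an output buffer.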

Note that all the hex data is represented as text; this is one of the reasons the file you have is much larger than the original binary. For example the single byte 0x55 would be saved as '55' in text: two characters, each 0x35, the ASCII code for '5'.

Brian.

Githin · Apr 16, 2012

Thanks.

These are the first 5 lines from the programmer:

:100000005530FA4FA8000000FE7100302F353949F5
:100010004D434D41494C0A6865695645286D204756
:100020002D6F70746265424F2065736F20200D4301
:1000300070726774282931382D3930506F6E7854BA
:10004000636E6C676520742E0A6C20696873527346

If I look at C000:0000 with the debug -d command, I get what is attached.

I don't understand how to convert between these two codes. This is the real meat of this problem. Compression may be the answer, but I am not sure if the VGA BIOS is being compressed or not.

betwixt · Apr 16, 2012

They are almost the same codes.
I'm guessing there are two EPROMs on the VGA card though. The dump from the programmer you show above only holds alternate bytes. Look closely at the first line:

55 AA 30 E9 FA 67 4F 6C A8 00 00 00 00 00 00

If I'm right, one EPROM holds the first byte, the other EPROM holds the second one, and so on. The address lines would be wired so the chips are selected alternately, and what you have is the data from just one of them. When the debug program is run, it reads the addresses sequentially while the EPROMs are on the board, so you are unaware that first one, then the other is being read.

It isn't compressed, that's for certain but the disassembly would have to be done from the debug output rather than what the programmer read unless it has an option to merge the bytes from two read operations.

There is a funny story attached to the text in the debug dump: IBM wanted to ensure only their own VGA adapters were used, so they checked for the name IBM in the EPROM before initializing the card. The real IBM VGA card says something like "(c)IBM P/N 1504588". It didn't take long for other manufacturers to realize they could write any code they liked as long as the three letters IBM were there. So there are all sorts of variations, including one I remember that said "Not (c)IBM" !!

Guess who had the job of training IBM engineers in the 80's.....

Brian.

Githin · Apr 17, 2012

That is an excellent catch. I saw that the 55 was common, but did not catch the even and odd byte business. However, I am 100% sure that there is only 1 BIOS chip. The board has only 3 other chips: 2 RAMs and 1 VGA controller.

There is another BIOS on the motherboard, but, that should not interfere with the VGA bios's data.

I searched for the other half of the code and this is what I found:
:10400000AAE9676C0000000084010737322F3142B3

Along with the first location, :100000005530FA4FA8000000FE7100302F353949F5,

we get all the data. The odd and even sets of bytes are in different locations as per the output from the programmer.

This is where all the even components of the data are located.

Could this be an error with the reader I am using?

Another possibility: the BIOS chip I am using is not marked with a part number. I picked the chip type from the datasheet of the VGA chip manufacturer and selected that in the programmer. But the board was made by someone else, and they may have used a different chip from the one the manufacturer suggested. Maybe the address/data lines are arranged in a different fashion, which could scramble the data.

Another question: where do I point a disassembler in the hex codes obtained by debug -d? What address should I start disassembling from? The initial few bytes are a sort of signature, like you pointed out earlier! Which software do you recommend to do the job?

Did you train the IBM engineers? You definitely know this thing very well! Thanks a lot Brian!

betwixt · Apr 17, 2012

With only one EPROM I would guess they re-used code intended for two ICs and 'stacked' one above the other. From there, all they have to do is connect the -CE pin to the top address line instead and it treats the new single code as independent top and bottom halves. The clue is in the address 4000 where you found the other bytes, it is 16K above the other ones so they probably combined two 16K devices into a single 32K one (27C256 ?). Around that era there was a rapid development of larger sized EPROMs and it was probably cheaper to use one IC that way than to rewrite the BIOS for a single device.
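
Merging the two stacked halves back into the byte order the CPU sees is a simple interleave. The sketch below assumes the layout suggested by the two records quoted in the thread: the lower half of the chip (from 0000) holds the bytes for even addresses and the upper half (from 4000) holds the bytes for odd addresses. The function name is made up, and the demo uses only those two 16-byte records rather than a full 32K image.

```python
def interleave_halves(image: bytes) -> bytes:
    """Rebuild the CPU-visible byte order from a chip image whose lower
    half holds even-address bytes and whose upper half holds odd-address
    bytes (an assumption about this particular card)."""
    if len(image) % 2:
        raise ValueError("image length must be even")
    half = len(image) // 2
    merged = bytearray(len(image))
    merged[0::2] = image[:half]   # bytes for even addresses
    merged[1::2] = image[half:]   # bytes for odd addresses
    return bytes(merged)


if __name__ == "__main__":
    # Data fields of the records at 0000 and 4000 quoted in the thread.
    low = bytes.fromhex("5530FA4FA8000000FE7100302F353949")
    high = bytes.fromhex("AAE9676C0000000084010737322F3142")
    merged = interleave_halves(low + high)
    print(merged[:8].hex(" "))   # should start with the 55 AA signature
    print(merged[22:30])         # part of the ASCII text near the header
```

With a real dump, `image` would be the full 32768-byte binary obtained from the programmer's hex file.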

There is no point in disassembling the 55 AA as those are just markers. The next byte (30) is the ROM image size in 512-byte blocks, so that can also be ignored. The next three bytes are E9 FA 67, which is an intra-segment (near) jump with displacement 67FA. The displacement is relative to the instruction that follows the jump, at offset 0006, so the 'real' code starts at offset 6800. So I would suggest disassembling from C000:0003 and you should see that jump as the first instruction.
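
The header bytes can be decoded with a few lines of Python. This is a sketch under stated assumptions: the size byte counts 512-byte blocks, the entry point is at offset 3, and E9 / EB are the 16-bit near and short jump opcodes whose displacement is added to the address of the following instruction. The function name and the returned dictionary keys are invented for illustration.

```python
def parse_option_rom_header(image: bytes) -> dict:
    """Decode the 55 AA header and the jump at the entry point, if any."""
    if image[0:2] != b"\x55\xAA":
        raise ValueError("missing 55 AA signature")
    entry = 3
    info = {
        "size_bytes": image[2] * 512,   # size byte counts 512-byte blocks
        "entry_offset": entry,
        "jump_target": None,
    }
    opcode = image[entry]
    if opcode == 0xE9:                  # JMP rel16 (3-byte instruction)
        disp = int.from_bytes(image[entry + 1:entry + 3], "little", signed=True)
        info["jump_target"] = (entry + 3 + disp) & 0xFFFF
    elif opcode == 0xEB:                # JMP rel8 (2-byte instruction)
        disp = int.from_bytes(image[entry + 1:entry + 2], "little", signed=True)
        info["jump_target"] = (entry + 2 + disp) & 0xFFFF
    return info


if __name__ == "__main__":
    header = parse_option_rom_header(bytes.fromhex("55AA30E9FA67"))
    print(header["size_bytes"], hex(header["jump_target"]))
```

For the six header bytes seen in this dump, the jump works out to offset 0x6800 within the C000 segment.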

Don't expect it all to be instructions, much of the EPROM will contain the values used to initialize the controller IC registers and palette and it also holds the font for each of the different video modes. I don't have my BIOS reference to hand but there is a system call which returns the start address of the font data. Google for it then write a simple program (use Debug) to make the call and the address is returned in one of the registers.

Search for "sourcer 8", it looks like the program I used to use although it was a much earlier version I had back then. If I'm right, it will also automatically comment the system calls for you and if it finds incomprehensible instructions (like the font table), it reports it as a possible data block.

Yep, I worked in the test department of an IBM sub-contractor back in the 80's before spending time with Motorola in Illinois.

Brian.

Githin · Apr 17, 2012

Thanks!

I am looking for the values the VGA BIOS writes to the control register, 03C2. I will be disassembling the code today. I found only 4 locations in the complete ROM which refer to this register address, as the byte sequence "BA C2 03". I believe that is MOV DX,03C2h, which sets up the port for an OUT instruction.

Thanks for all your help. Your comments have been very helpful.

betwixt · Apr 17, 2012

Don't forget the address stored in the DX register can be used to select an output port as well as the normal OUT instruction.
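
A rough way to locate those writes is to search the merged ROM image for the bytes BA C2 03 (MOV DX,03C2h) and look a few bytes further for EE or EF (OUT DX,AL / OUT DX,AX). Port 3C2 is above FF, so it cannot be reached by the immediate form of OUT and has to go through DX. The sketch below is a heuristic only: the function name and the 8-byte look-ahead window are arbitrary assumptions, a raw byte match is not guaranteed to be instruction-aligned, and code that computes DX another way (for example by adjusting a previously loaded port number) will be missed.

```python
MOV_DX_3C2 = bytes.fromhex("BAC203")              # MOV DX, 03C2h
OUT_OPCODES = {0xEE: "OUT DX,AL", 0xEF: "OUT DX,AX"}


def find_port_loads(image: bytes, window: int = 8) -> list:
    """Return (offset, out_mnemonic_or_None) for each BA C2 03 match.

    The second element names the first OUT opcode byte seen within
    `window` bytes after the match, or None if there is none.
    """
    results = []
    pos = image.find(MOV_DX_3C2)
    while pos != -1:
        after = image[pos + 3:pos + 3 + window]
        out = next((OUT_OPCODES[b] for b in after if b in OUT_OPCODES), None)
        results.append((pos, out))
        pos = image.find(MOV_DX_3C2, pos + 1)
    return results


if __name__ == "__main__":
    # Made-up fragment: MOV DX,03C2h / MOV AL,67h / OUT DX,AL
    demo = b"\x90" * 4 + bytes.fromhex("BAC203B067EE") + b"\x90" * 4
    for offset, out in find_port_loads(demo):
        print(f"{offset:#06x}: {out}")
```

Each hit still needs to be confirmed in a proper disassembly, since the value written comes from whatever was loaded into AL beforehand.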

Brian.

Githin · Apr 18, 2012

I am guessing that the system BIOS also writes values into the registers inside the VGA chip when the video mode is changed (Int 10h). Right? I am only trying to see what values are being written into the 3C2 register by the BIOS (VGA or system).

Thanks.

betwixt · Apr 18, 2012

That is correct. During the POST, the system BIOS attempts to set up a display in order to be able to show error messages ("Press F1 to resume" etc.). If it can't find one, it resorts to a pattern of beeps to alert you to a problem. Prior to scanning for EPROMs on plug-in cards, it tries first to initialize a monochrome display then a CGA display; both of these also have port 3C2 on them. Only when all the tests are passed does it start looking for additional BIOS EPROMs as explained earlier. All video activity is routed through the system interrupt table at 0000:0000 through 0000:03FF, and one of the first things the VGA BIOS does is hijack the video interrupt vectors so they point to code inside its own EPROM. From then on, it has sole charge of handling the display.
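
To see where the video vector points after the VGA BIOS has hooked it, the interrupt table can be dumped (for example with DEBUG from 0000:0000) and decoded. The sketch below assumes the standard real-mode layout: vector n occupies four bytes at 0000:(n*4), offset word first, then segment word, both little-endian. The function name and the demo values are hypothetical.

```python
def read_vector(ivt: bytes, number: int) -> tuple[int, int]:
    """Return (segment, offset) of interrupt `number` from an IVT dump."""
    base = number * 4
    offset = int.from_bytes(ivt[base:base + 2], "little")
    segment = int.from_bytes(ivt[base + 2:base + 4], "little")
    return segment, offset


if __name__ == "__main__":
    # Made-up table: INT 10h pointing at C000:0120 (illustrative values only).
    ivt = bytearray(1024)
    ivt[0x40:0x44] = bytes.fromhex("2001 00C0")
    seg, off = read_vector(bytes(ivt), 0x10)
    print(f"INT 10h -> {seg:04X}:{off:04X}")
```

A segment of C000 for INT 10h would indicate that the handler lives in the VGA BIOS rather than in the system BIOS.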

Brian.
