In CMOS, a P+/NWELL/P+ structure is used. The layout is a P+ diffusion, the emitter, surrounded by either a poly gate or LOCOS isolation, and then a P+ ring, the collector. The base is the NWELL. The substrate is an additional terminal, acting as a parasitic vertical collector, so it is a 4-terminal structure. Most designs connect it only as a diode, but there are other bandgap circuits that use the lateral PNP more intelligently, resulting in better circuit performance. One issue in the circuit above is the bandgap voltage spread caused by the offset voltage of the NMOS. The other is the PSRR, again because of the NMOS.
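
To get a feel for why the offset matters so much, here is a rough back-of-the-envelope sketch. It assumes a generic topology in which the offset adds directly to the delta-VBE of two junctions and is amplified by the same gain as the PTAT term; the circuit above may differ in detail. The emitter ratio, the -2 mV/K VBE slope, and the function names are illustrative assumptions, not values taken from the circuit.

```python
import math

K_OVER_Q = 8.617333e-5  # Boltzmann constant / electron charge, in V/K


def ptat_gain(emitter_ratio, vbe_slope=-2.0e-3):
    """Gain needed so that the amplified delta-VBE slope cancels the VBE slope.

    emitter_ratio: assumed current-density ratio of the two junctions.
    vbe_slope: assumed VBE temperature coefficient in V/K (about -2 mV/K).
    """
    ptat_slope = K_OVER_Q * math.log(emitter_ratio)  # d(delta-VBE)/dT in V/K
    return -vbe_slope / ptat_slope


def output_error(offset, emitter_ratio, vbe_slope=-2.0e-3):
    """Bandgap output error, assuming the offset sees the same gain as delta-VBE."""
    return ptat_gain(emitter_ratio, vbe_slope) * offset


if __name__ == "__main__":
    gain = ptat_gain(8)
    print(f"PTAT gain for ratio 8: {gain:.1f}")
    print(f"Output error for 5 mV offset: {output_error(5e-3, 8) * 1e3:.0f} mV")
```

Under these assumptions the gain comes out at roughly 11 for an emitter ratio of 8, so a 5 mV offset turns into roughly 56 mV of spread at the bandgap output. A larger ratio lowers the gain only logarithmically, which is why reducing the offset itself is usually more effective.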