Different frame sizes accommodate different traffic needs. For instance, VoIP traffic is best served by many small datagrams, and a server backup is better served by fewer but larger datagrams.
With a single fixed frame size, you would waste bandwidth either way: a large fixed size forces small VoIP payloads to be padded, while a small fixed size buries server backups in protocol overhead.
One size does not fit all.
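To put rough numbers on that trade-off, here is a minimal sketch. It assumes 38 bytes of per-frame cost on the wire for classic Ethernet (8 preamble/SFD + 14 header + 4 FCS + 12 interframe gap). The 200-byte VoIP packet and the 1,000,000-byte backup are made-up illustrative sizes, and the function names are my own.

```python
import math

# Assumed per-frame cost on the wire for classic Ethernet:
# 8 preamble/SFD + 14 header + 4 FCS + 12 interframe gap = 38 bytes
OVERHEAD = 38


def wire_bytes(data_len, payload_size):
    """Bytes on the wire to carry data_len bytes when every frame
    has a fixed payload of payload_size bytes (the last frame is padded)."""
    frames = math.ceil(data_len / payload_size)
    return frames * (payload_size + OVERHEAD)


def efficiency(data_len, payload_size):
    """Fraction of the bytes on the wire that are useful data."""
    return data_len / wire_bytes(data_len, payload_size)


if __name__ == "__main__":
    voip_packet = 200        # hypothetical small voice packet
    backup = 1_000_000       # hypothetical bulk transfer

    for name, size in (("VoIP", voip_packet), ("backup", backup)):
        for payload in (46, 200, 1500):
            print(f"{name:6s} payload={payload:4d} "
                  f"efficiency={efficiency(size, payload):.3f}")
```

Under these assumptions the small packet is mostly padding when forced into 1500-byte frames, and the bulk transfer loses a large share of the link to per-frame overhead when forced into 46-byte frames.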
Edit to answer your changed question:
The sizes go back to the original Ethernet medium. The 64-byte minimum made sure that a frame was long enough to fill the shared medium from end to end, so that the other hosts could detect that the medium was in use and the sender was still transmitting if a collision came back from the far end. The minimum is too small for some applications, so a maximum also needed to be determined.
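The arithmetic behind the minimum can be sketched as follows, assuming the original 10 Mbit/s data rate. The function names are hypothetical, and the round-trip delay is whatever you measure or estimate for your segment, not a figure from any standard.

```python
MIN_FRAME_BYTES = 64
BIT_RATE = 10_000_000  # assumed: original 10 Mbit/s Ethernet


def frame_time_us(frame_bytes=MIN_FRAME_BYTES, bit_rate=BIT_RATE):
    """Time in microseconds needed to put a frame of this size on the wire."""
    return frame_bytes * 8 * 1_000_000 / bit_rate


def sender_sees_collision(round_trip_us, frame_bytes=MIN_FRAME_BYTES,
                          bit_rate=BIT_RATE):
    """True if the sender is still transmitting when a collision from the
    far end of the medium gets back to it."""
    return round_trip_us <= frame_time_us(frame_bytes, bit_rate)


if __name__ == "__main__":
    print(f"64-byte frame lasts {frame_time_us():.1f} microseconds")
    # A hypothetical 40 microsecond round trip: fine for 64 bytes,
    # but a 32-byte frame would already be finished before the collision returns.
    print(sender_sees_collision(40.0))
    print(sender_sees_collision(40.0, frame_bytes=32))
```

A 64-byte frame is 512 bits, which takes 51.2 microseconds at 10 Mbit/s, so the shortest frame still outlasts any round trip up to that long.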
Your question about why that particular maximum was chosen is answered in another question: How was the MTU size for ethernet frames calculated as 1500 bytes?