I've got very limited background in this area. From what I've read, the maximum fidelity a digital encoding scheme can carry depends on the number of bits used to encode each second of signal. If you add the requirement that transmission happen in real time, that translates into bandwidth. And the real-time requirement is definitely there when it comes to telephone conversations.
You've got pieces of the whole puzzle in that statement.
Yes, fidelity is partly governed by the number of bits used to encode each sample of the audio waveform. But mostly, fidelity is governed by the sampling rate and by the kind of data compression applied later in the processing. The sampling rate needs to be high enough to reproduce the original waveform faithfully at the other end. And lossy compression can significantly degrade the signal.
Nyquist criterion: the sampling rate has to be more than twice the highest frequency present in the signal. A voice waveform containing frequencies up to 5 kHz needs to be digitized at a sample rate above 10 kHz. (Formally, this is the Nyquist-Shannon sampling theorem.)
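To make the criterion concrete, here is a minimal sketch of what goes wrong when it is violated: a tone above half the sample rate "aliases", producing exactly the same samples as a lower-frequency tone. The helper names (`min_sample_rate`, `alias_frequency`, `sample_tone`) are hypothetical, and the sketch assumes ideal pure tones with no anti-aliasing filter.

```python
import math


def min_sample_rate(max_freq_hz):
    """Nyquist rate: the boundary the sample rate must exceed (2x the highest frequency)."""
    return 2 * max_freq_hz


def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling.

    Tones above sample_rate_hz / 2 fold back into the range 0 .. sample_rate_hz / 2.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def sample_tone(tone_hz, sample_rate_hz, n_samples):
    """Sample cos(2*pi*f*t) at the instants t = n / sample_rate_hz."""
    return [math.cos(2 * math.pi * tone_hz * n / sample_rate_hz) for n in range(n_samples)]


if __name__ == "__main__":
    print(min_sample_rate(5_000))          # 10000
    print(alias_frequency(7_000, 10_000))  # 3000: a 7 kHz tone masquerades as 3 kHz
    a = sample_tone(7_000, 10_000, 10)
    b = sample_tone(3_000, 10_000, 10)
    # Differences are only floating-point noise: the two sample sets are indistinguishable.
    print(max(abs(x - y) for x, y in zip(a, b)))
```

Once the samples are taken, nothing downstream can tell the 7 kHz tone from the 3 kHz one, which is why the frequencies have to be limited before sampling rather than fixed afterwards.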
Sound is digitized by passing the analog audio waveform (the kind of signal that feeds, say, a speaker) through an "A to D", or analog-to-digital, converter. At regular intervals the converter measures the voltage and outputs a bit pattern (typically 16 bits or more) that is a scaled equivalent of that voltage. Each such reading is a sample. On an audio CD the sampling rate is 44.1 kHz, with 16 bits per sample for each of the two stereo channels.
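The sampling and quantizing step can be sketched as follows, assuming an ideal converter fed a pure sine wave and producing signed 16-bit codes like CD audio. The function name `sample_and_quantize` and its parameters are hypothetical; a real A-to-D converter also sits behind an anti-aliasing filter, which this ignores.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits=16, amplitude=1.0):
    """Sample a sine wave at regular intervals and round each reading to a signed integer code.

    amplitude is a fraction of the converter's full-scale range (1.0 = full scale).
    """
    max_code = 2 ** (bits - 1) - 1  # 32767 for 16-bit signed PCM
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # time of the n-th sample, in seconds
        value = amplitude * math.sin(2 * math.pi * freq_hz * t)
        codes.append(round(value * max_code))  # quantize to the nearest integer step
    return codes


if __name__ == "__main__":
    # A 1 kHz tone sampled at the CD rate of 44,100 samples per second, 16 bits per sample.
    print(sample_and_quantize(1_000, 44_100, n_samples=8))
```

Each integer in the list is one "data word"; a WAV file of CD audio is essentially a long run of these, two per sample period for stereo.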
Then the data is compressed in a "lossy" way, exploiting the fact that human hearing cannot perceive certain parts of the signal (for example, quiet sounds masked by louder sounds at nearby frequencies). MP3 does this with a variant of the discrete cosine transform (DCT) called the modified DCT, which turns blocks of samples into frequency components that can then be stored with reduced precision or dropped altogether. The details are involved, but basically, lossy compression is what makes MP3 files *much* smaller than WAV files. A WAV file typically contains a literal series of uncompressed data words representing the audio waveform, which is why it is huge. What an MP3 file contains looks like grey goo: compressed frequency data that a decoder program converts back into waveform data.
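A toy version of transform-based lossy compression may help. This sketch uses a plain orthonormal DCT-II on one block of samples and quantizes every coefficient with a single uniform step size; that is a deliberate simplification, since MP3 actually uses a modified DCT plus a psychoacoustic model that chooses a different precision per frequency band. All function names here are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II: express a block of samples as weights on cosine waves."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def lossy_compress(x, step):
    """Quantize DCT coefficients to integer multiples of `step`.

    Small coefficients round to zero and cost almost nothing to store; the rounding is the loss.
    """
    return [round(c / step) for c in dct(x)]


def decompress(q, step):
    """Scale the quantized coefficients back up and undo the transform."""
    return idct([v * step for v in q])


if __name__ == "__main__":
    block = [10, 12, 14, 16, 18, 20, 22, 24]  # a smooth stretch of waveform
    q = lossy_compress(block, step=4.0)
    print(q)                                   # mostly zeros
    print([round(v, 2) for v in decompress(q, step=4.0)])  # close to, but not exactly, block
```

Smooth signals concentrate their energy in a few low-frequency coefficients, so most quantized values come out zero; the decoder's output is close to the original but not identical, which is what "lossy" means.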
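Finally, the decoder undoes the transform (an inverse DCT) to recover the data words, and they are played back to a human by passing them into a digital-to-analog (D-to-A) converter. A data word of hex 6F00 might become, for instance, 1.75 volts; the exact voltage depends on the converter's full-scale range.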
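The word-to-voltage step can be sketched as an ideal linear D-to-A converter. This assumes signed 16-bit (two's-complement) words and a hypothetical full-scale output of 2.0 V; with that choice, hex 6F00 maps to about 1.73 V, in the same ballpark as the example above. Real converters differ in full-scale range and also smooth the output with a reconstruction filter.

```python
def word_to_voltage(word, full_scale_volts, bits=16):
    """Convert a raw two's-complement PCM word to an output voltage (ideal linear DAC)."""
    if word >= 2 ** (bits - 1):  # top bit set: reinterpret the bit pattern as negative
        word -= 2 ** bits
    return word / 2 ** (bits - 1) * full_scale_volts


if __name__ == "__main__":
    print(word_to_voltage(0x6F00, full_scale_volts=2.0))  # 1.734375
    print(word_to_voltage(0x8000, full_scale_volts=2.0))  # -2.0 (most negative code)
```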
Heavy lossy compression is a major cause of lost fidelity. The trade-off, of course, is that it makes the data much smaller.
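The size side of that trade-off is simple arithmetic, and it ties back to the bandwidth point in the question: uncompressed PCM needs sample rate × bits per sample × channels bits every second. This sketch compares CD audio against two common MP3 bit rates (128 and 320 kbit/s); the function names are hypothetical.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second needed to carry uncompressed PCM audio in real time."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(uncompressed_bps, compressed_bps):
    """How many times smaller the compressed stream is."""
    return uncompressed_bps / compressed_bps


if __name__ == "__main__":
    cd = pcm_bitrate(44_100, 16, 2)   # 1411200 bit/s
    phone = pcm_bitrate(8_000, 8, 1)  # 64000 bit/s, the classic digital telephone channel
    print(cd, phone)
    for mp3_kbps in (128, 320):
        print(mp3_kbps, "kbit/s MP3 ratio:", compression_ratio(cd, mp3_kbps * 1000))
```

A 128 kbit/s MP3 is roughly 11 times smaller than the CD stream it came from; the more aggressive the ratio, the more the encoder has to throw away.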
The same kind of lossy "truncation", applied to video, is what results in the relatively poor picture quality of satellite and cable TV, both of which use MPEG compression.
It works much like JPEG. Surely you have seen a blocky JPEG that was compressed heavily to make it small. Same idea.
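The blockiness has a simple cause: JPEG (and the intra-coded frames in MPEG video) split the picture into 8×8 pixel blocks and transform and quantize each block independently. At extreme compression little more than each block's average brightness survives. This sketch models only that extreme case: keeping just the DC (zero-frequency) coefficient of a full block's orthonormal 2-D DCT reconstructs a flat block at the block's mean, so the code simply averages each tile. The function name `dc_only_blocks` is hypothetical, and real encoders keep far more detail than this.

```python
def dc_only_blocks(image, block=8):
    """Replace every block x block tile of a grayscale image (list of rows) with its mean.

    For full tiles this equals keeping only the DC term of each tile's 2-D DCT.
    Edge tiles smaller than block x block are simply averaged too.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, h, block):
        for left in range(0, w, block):
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            mean = sum(image[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


if __name__ == "__main__":
    # A smooth left-to-right gradient, 16 x 16 pixels.
    gradient = [[float(c) for c in range(16)] for _ in range(16)]
    blocky = dc_only_blocks(gradient)
    print(blocky[0])  # two flat runs with a sudden jump at column 8
```

The smooth ramp turns into flat tiles with an abrupt step at each block boundary, which is exactly the "blocky JPEG" look.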