Digital techniques have made rapid progress in audio and video.
Digital information is more robust and error resilient than analog
information. This means that generation losses during recording and
losses in transmission can be eliminated. The compact disc (CD) was
the first consumer product to demonstrate this. Digital recording
and transmission techniques also allow content manipulation that is
not possible in analog. Once audio or video is digitized, the
contents are in the form of data, and such data can be handled in
the same way as any other kind of data. However, production-standard
digital video generates over 200 megabits per second of data, and
this bit rate requires extensive capacity for storage and wide
bandwidth for transmission. These storage and bandwidth requirements
can be reduced by compression. Compression is a way of expressing
digital audio and video by using less data.
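As a rough check on the 200 megabit figure, the sketch below computes
the bit rate of uncompressed video for the active picture area only.
The frame size (720x576), frame rate (25 frames per second), chroma
sampling, and sample depth are illustrative assumptions, not values
taken from the text above.

```python
def raw_bit_rate(width, height, fps, bits_per_sample, chroma_samples_per_pixel):
    """Bits per second of uncompressed video (active picture only)."""
    # One luminance sample per pixel plus the average number of
    # chrominance samples (Cb and Cr together) per pixel.
    samples_per_pixel = 1 + chroma_samples_per_pixel
    return width * height * fps * samples_per_pixel * bits_per_sample

# Assumed example: 4:2:2 sampling (one chroma sample per pixel on
# average) with 10 bits per sample.
rate_422 = raw_bit_rate(720, 576, 25, 10, 1.0)

# Same picture with 4:2:0 sampling (half a chroma sample per pixel on
# average) and 8 bits per sample.
rate_420 = raw_bit_rate(720, 576, 25, 8, 0.5)

print(f"4:2:2, 10-bit: {rate_422 / 1e6:.1f} Mbit/s")
print(f"4:2:0, 8-bit:  {rate_420 / 1e6:.1f} Mbit/s")
```

Under these assumptions the 4:2:2 case comes out slightly above 200
megabits per second, which is consistent with the figure quoted above.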
MPEG is one of the most popular audio/video compression techniques
because it is not just a single standard. Instead, it is a range of
standards suitable for different applications but based on similar
principles. MPEG is an acronym for the Moving Picture Experts Group,
established by ISO (International Organization for Standardization)
and IEC (International Electrotechnical Commission). A video is a
sequence of pictures, and each picture is an array of pixels. This
video data is organized in a hierarchical fashion in an MPEG video
stream. An MPEG video sequence consists of the following layers:
video sequence, group of pictures (GOP), picture, slice, macroblock,
and block.
Video Sequence
Begins with a sequence header, includes one or more groups of
pictures, and ends with an end-of-sequence code.
Group of Pictures (GOP)
A header and a series of one or more pictures intended to allow
random access into the sequence.
Picture
This is the primary coding unit of a video sequence. A picture
consists of three rectangular matrices: one of luminance (Y) values
and two of chrominance (Cb and Cr) values. The Y matrix has an even
number of rows and columns. The Cb and Cr matrices are one half the
size of the Y matrix in both the horizontal and vertical directions.
Slice
A slice contains one or more contiguous macroblocks. The order of
the macroblocks within a slice is from left to right and top to
bottom. Slices are important in the handling of errors. If the
bitstream contains an error, the decoder can skip to the start of
the next slice.
Macroblock
This is the basic coding unit in the MPEG algorithm. It is a 16x16
pixel segment in a frame. If each chrominance component has one-half
the vertical and horizontal resolution of the luminance component, a
macroblock consists of four Y blocks, one Cr block, and one Cb block.
Block
This is the smallest coding unit in the MPEG algorithm. It consists
of 8x8 pixels and can be one of three types: luminance (Y), red
chrominance (Cr), or blue chrominance (Cb).
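The relationship between pictures, macroblocks, and blocks can be
sketched in a few lines of Python. This is an illustration only: the
function name split_macroblock, the block labels Y0 to Y3, and the
32x32 test picture are hypothetical, and the sketch assumes the
chrominance planes have half the luminance resolution in both
directions.

```python
def split_macroblock(y, cb, cr, mb_row, mb_col):
    """Return the six 8x8 blocks of one macroblock.

    y is a list of rows of luminance samples; cb and cr are the
    half-resolution chrominance planes. mb_row and mb_col index the
    macroblock in units of 16 luminance pixels.
    """
    def block(plane, top, left):
        # Cut an 8x8 block out of a plane.
        return [row[left:left + 8] for row in plane[top:top + 8]]

    top, left = 16 * mb_row, 16 * mb_col
    return {
        # Four luminance blocks tile the 16x16 macroblock area.
        "Y0": block(y, top, left),
        "Y1": block(y, top, left + 8),
        "Y2": block(y, top + 8, left),
        "Y3": block(y, top + 8, left + 8),
        # One 8x8 block per chrominance plane covers the same area.
        "Cb": block(cb, 8 * mb_row, 8 * mb_col),
        "Cr": block(cr, 8 * mb_row, 8 * mb_col),
    }

# Hypothetical 32x32 picture: Y is 32x32, Cb and Cr are 16x16.
y_plane = [[r * 32 + c for c in range(32)] for r in range(32)]
cb_plane = [[100 + r for _ in range(16)] for r in range(16)]
cr_plane = [[200 + r for _ in range(16)] for r in range(16)]

mb = split_macroblock(y_plane, cb_plane, cr_plane, 1, 1)
print(sorted(mb))
```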
Picture Types
The MPEG standard specifically defines three types of pictures:
Intra Pictures (I-Pictures)
Predicted Pictures (P-Pictures)
Bidirectional Pictures (B-Pictures)
These three types of pictures are combined to form a group of
pictures (GOP). Typical GOP structures are as follows:
IBBPBBPBBPBBPI……
IPPIPPIPPIPPIP……
IIIIIIIIIIIIII……
Intra Pictures
Intra pictures, or I-pictures, are coded using only information
present in the picture itself, and they provide potential random
access points into the compressed video data. They use only
transform coding and provide moderate compression.
Predicted Pictures
Predicted pictures, or P-pictures, are coded with respect to the
nearest previous I- or P-picture. This technique is called forward
prediction. P-pictures use motion compensation to provide more
compression than is possible with I-pictures.
Bidirectional Pictures
Bidirectional pictures, or B-pictures, are pictures that use both a
past and a future picture as references. This technique is called
bidirectional prediction. B-pictures provide the most compression
because they can draw on both the past and the future picture, but
they also require the most computation time.
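The prediction rules for the three picture types can be made concrete
with a small sketch that lists, for each picture in a GOP pattern,
which pictures it is predicted from. The function name
reference_pictures is hypothetical, and the pattern is assumed to be
given in display order.

```python
def reference_pictures(gop):
    """Map each picture index to the indices it is predicted from.

    gop is a string of picture types in display order, e.g. "IBBPBBP".
    """
    # I- and P-pictures are the only pictures used as references.
    anchors = [i for i, kind in enumerate(gop) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(gop):
        past = [a for a in anchors if a < i]
        future = [a for a in anchors if a > i]
        if kind == "I":
            refs[i] = []                        # coded on its own
        elif kind == "P":
            refs[i] = past[-1:]                 # nearest previous I or P
        else:
            # B-picture: nearest past and nearest future I or P.
            # A trailing B with no later anchor gets only the past one.
            refs[i] = past[-1:] + future[:1]
    return refs

pattern = "IBBPBBP"
refs = reference_pictures(pattern)
for index, kind in enumerate(pattern):
    print(index, kind, refs[index])
```

In this pattern the B-pictures at positions 1 and 2 depend on pictures
0 and 3, so picture 3 has to be available to the decoder before they
can be reconstructed.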
Encoding Intra Picture
The MPEG transform coding algorithm for intra pictures includes the
following steps:
• Discrete cosine transform (DCT)
• Quantization
• Run-length encoding
An 8x8 block in a picture generally contains high spatial
redundancy. To reduce this redundancy, the MPEG algorithm transforms
8x8 blocks of pixels from the spatial domain to the frequency domain
with the discrete cosine transform (DCT). The combination of DCT and
quantization results in many of the high-frequency coefficients
being zero. To take maximum advantage of this, the coefficients are
read out in a zigzag order to produce long runs of zeros. This
zigzag sequence is run-length encoded as pairs of a zero-run length
and the nonzero coefficient that follows it. The pairs are then
coded with a variable-length code (Huffman encoding), which uses
shorter codes for commonly occurring pairs and longer codes for less
common pairs.
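The following is a minimal sketch of these steps for a single 8x8
block, written in plain Python for readability rather than speed. It
makes two simplifying assumptions that are not in the text above: a
single quantization step size is applied to every coefficient, and
the end of the block is marked with a hypothetical "EOB" symbol. The
final Huffman stage is omitted.

```python
import math

N = 8

def dct_2d(block):
    """Forward 8x8 DCT-II of a block given as a list of rows."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Uniform quantization; one step size for all coefficients (assumed)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]

def zigzag(block):
    """Read the block along anti-diagonals, alternating direction."""
    order = sorted(
        ((r, c) for r in range(N) for c in range(N)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [block[r][c] for r, c in order]

def run_length(sequence):
    """Return the DC value and (zero_run, value) pairs for the AC values."""
    pairs, run = [], 0
    for value in sequence[1:]:      # sequence[0] is the DC coefficient
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")             # trailing zeros collapse into one marker
    return sequence[0], pairs

# Hypothetical smooth block: a gentle gradient in both directions.
pixels = [[100 + 2 * c + r for c in range(N)] for r in range(N)]
dc, pairs = run_length(zigzag(quantize(dct_2d(pixels), 16)))
print("DC:", dc)
print("AC pairs:", pairs)
```

For a smooth block like this one, most quantized coefficients are
zero, so the run-length output is far shorter than the 64 input
samples.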
Encoding of Predicted Picture
A P-picture is coded with reference to a previous picture (the
reference picture), which is an I- or P-picture. Motion-compensated
prediction is used to exploit temporal redundancy. Since successive
frames are closely related, the data of one frame can be predicted
accurately from the data of a reference picture, provided the
translation between them is estimated. This translation is known as
the motion vector of a macroblock. In P-pictures, each 16x16
macroblock is predicted from a macroblock of the previously encoded
I- or P-picture. A search is conducted in the reference picture to
find the macroblock that most closely matches the macroblock under
consideration in the P-picture. The difference between the two
macroblocks is the prediction error. This error is transformed with
the DCT and quantized. Finally, run-length encoding and Huffman
encoding are used to encode the data.
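The macroblock search can be sketched as an exhaustive block-matching
loop. This is one possible approach, not a method prescribed by the
text: the matching criterion (sum of absolute differences), the search
range of 4 pixels, and the function names are all assumptions made for
illustration. The motion vector is expressed as the displacement from
the macroblock position in the current picture to the matching block
in the reference picture.

```python
def sad(cur, cur_top, cur_left, ref, ref_top, ref_left, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cur_top + r][cur_left + c] - ref[ref_top + r][ref_left + c])
        for r in range(size) for c in range(size))

def find_motion_vector(cur, ref, top, left, size=16, search=4):
    """Exhaustive search within +/- search pixels; returns (dy, dx)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue    # candidate lies outside the reference picture
            cost = sad(cur, top, left, ref, ry, rx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]

def prediction_error(cur, ref, top, left, mv, size=16):
    """Difference between the current macroblock and its best match."""
    dy, dx = mv
    return [[cur[top + r][left + c] - ref[top + dy + r][left + dx + c]
             for c in range(size)] for r in range(size)]

# Hypothetical 48x48 reference picture with a simple synthetic pattern.
SIZE = 48
reference = [[(7 * r + 13 * c) % 256 for c in range(SIZE)] for r in range(SIZE)]
# Current picture: the same content displaced by 2 rows and -3 columns.
current = [[reference[(r + 2) % SIZE][(c - 3) % SIZE] for c in range(SIZE)]
           for r in range(SIZE)]

mv = find_motion_vector(current, reference, 16, 16)
error = prediction_error(current, reference, 16, 16, mv)
print("motion vector:", mv)
print("largest error magnitude:", max(abs(e) for row in error for e in row))
```

When the match is good, the error block is close to zero everywhere,
which is what makes it cheap to code after the DCT and quantization.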
Encoding of Bi-directional Pictures
A B-picture is a bidirectionally predicted picture. Two pictures are
used to predict the current B-picture: the previous reference picture
and the next reference picture. Hence B-pictures are coded like
P-pictures, except that the motion vectors can reference the previous
reference picture, the next reference picture, or both. Consider a
B-picture B that is predicted from two reference pictures R1 and R2,
where R1 is the previous I- or P-picture and R2 is the next I- or
P-picture. For each macroblock MB of B, find the closest match MB1 in
R1 and MB2 in R2. The predicted macroblock PM is calculated as given
below.
PM = NINT (a1 MB1 + a2 MB2)
where NINT is the nearest-integer operator and a1 and a2 are chosen
as follows:
a1 = 0.5 and a2 = 0.5 if both matches are satisfactory.
a1 = 1 and a2 = 0 if only the first match is satisfactory.
a1 = 0 and a2 = 1 if only the second match is satisfactory.
a1 = 0 and a2 = 0 if neither match is satisfactory.
Finally, the error block E is computed by taking the difference of MB
and PM. This error block E is then coded with the intra coding steps
described above.
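The weighting rule and the error computation can be sketched as
follows. Two things here are assumptions: how a match is judged
"satisfactory" is not specified above, so the sketch simply takes two
boolean flags as input, and NINT is implemented by rounding halves
upward. The function names and the tiny 2x2 blocks are hypothetical
and chosen only to keep the example short.

```python
import math

def nint(value):
    """Nearest integer; ties are rounded upward (an assumption)."""
    return math.floor(value + 0.5)

def predict_b_macroblock(mb1, mb2, ok1, ok2):
    """PM = NINT(a1 * MB1 + a2 * MB2), with weights set by match quality."""
    if ok1 and ok2:
        a1, a2 = 0.5, 0.5
    elif ok1:
        a1, a2 = 1.0, 0.0
    elif ok2:
        a1, a2 = 0.0, 1.0
    else:
        a1, a2 = 0.0, 0.0
    return [[nint(a1 * p + a2 * q) for p, q in zip(row1, row2)]
            for row1, row2 in zip(mb1, mb2)]

def error_block(mb, pm):
    """E = MB - PM, element by element."""
    return [[m - p for m, p in zip(row_m, row_p)]
            for row_m, row_p in zip(mb, pm)]

# Tiny 2x2 stand-ins for 16x16 macroblocks.
mb = [[12, 20], [30, 41]]      # macroblock in the B-picture
mb1 = [[10, 20], [30, 40]]     # best match in R1
mb2 = [[13, 22], [30, 43]]     # best match in R2

pm = predict_b_macroblock(mb1, mb2, True, True)
e = error_block(mb, pm)
print("PM:", pm)
print("E: ", e)
```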