11.1. Making a high quality MPEG-4 ("DivX") rip of a DVD movie

One frequently asked question is "How do I make the highest quality rip for a given size?". Another question is "How do I make the highest quality DVD rip possible? I do not care about file size, I just want the best quality."

The latter question is perhaps at least somewhat wrongly posed. After all, if you do not care about file size, why not simply copy the entire MPEG-2 video stream from the the DVD? Sure, your AVI will end up being 5GB, give or take, but if you want the best quality and do not care about size, this is certainly your best option.

In fact, the reason you want to transcode a DVD into MPEG-4 is specifically because you do care about file size.

It is difficult to offer a cookbook recipe on how to create a very high quality DVD rip. There are several factors to consider, and you should understand these details or else you are likely to end up disappointed with your results. Below we will investigate some of these issues, and then have a look at an example. We assume you are using libavcodec to encode the video, although the theory applies to other codecs as well.

If this seems to be too much for you, you should probably use one of the many fine frontends that are listed in the MEncoder section of our related projects page. That way, you should be able to achieve high quality rips without too much thinking, because most of those tools are designed to take clever decisions for you.

11.1.1. Preparing to encode: Identifying source material and framerate

Before you even think about encoding a movie, you need to take several preliminary steps.

The first and most important step before you encode should be determining what type of content you are dealing with. If your source material comes from DVD or broadcast/cable/satellite TV, it will be stored in one of two formats: NTSC for North America and Japan, PAL for Europe, etc. It is important to realize, however, that this is just the formatting for presentation on a television, and often does not correspond to the original format of the movie. Experience shows that NTSC material is a lot more difficult to encode, because there more elements to identify in the source. In order to produce a suitable encode, you need to know the original format. Failure to take this into account will result in various flaws in your encode, including ugly combing (interlacing) artifacts and duplicated or even lost frames. Besides being ugly, the artifacts also harm coding efficiency: You will get worse quality per unit bitrate.

11.1.1.1. Identifying source framerate

Here is a list of common types of source material, where you are likely to find them, and their properties:

  • Standard Film: Produced for theatrical display at 24fps.

  • PAL video: Recorded with a PAL video camera at 50 fields per second. A field consists of just the odd- or even-numbered lines of a frame. Television was designed to refresh these in alternation as a cheap form of analog compression. The human eye supposedly compensates for this, but once you understand interlacing you will learn to see it on TV too and never enjoy TV again. Two fields do not make a complete frame, because they are captured 1/50 of a second apart in time, and thus they do not line up unless there is no motion.

  • NTSC Video: Recorded with an NTSC video camera at 60000/1001 fields per second, or 60 fields per second in the pre-color era. Otherwise similar to PAL.

  • Animation: Usually drawn at 24fps, but also comes in mixed-framerate varieties.

  • Computer Graphics (CG): Can be any framerate, but some are more common than others; 24 and 30 frames per second are typical for NTSC, and 25fps is typical for PAL.

  • Old Film: Various lower framerates.

11.1.1.2. Identifying source material

Movies consisting of frames are referred to as progressive, while those consisting of independent fields are called either interlaced or video - though this latter term is ambiguous.

To further complicate matters, some movies will be a mix of several of the above.

The most important distinction to make between all of these formats is that some are frame-based, while others are field-based. Whenever a movie is prepared for display on television (including DVD), it is converted to a field-based format. The various methods by which this can be done are collectively referred to as "telecine", of which the infamous NTSC "3:2 pulldown" is one variety. Unless the original material was also field-based (and the same fieldrate), you are getting the movie in a format other than the original.

There are several common types of pulldown:

  • PAL 2:2 pulldown: The nicest of them all. Each frame is shown for the duration of two fields, by extracting the even and odd lines and showing them in alternation. If the original material is 24fps, this process speeds up the movie by 4%.

  • PAL 2:2:2:2:2:2:2:2:2:2:2:3 pulldown: Every 12th frame is shown for the duration of three fields, instead of just two. This avoids the 4% speedup issue, but makes the process much more difficult to reverse. It is usually seen in musical productions where adjusting the speed by 4% would seriously damage the musical score.

  • NTSC 3:2 telecine: Frames are shown alternately for the duration of 3 fields or 2 fields. This gives a fieldrate 2.5 times the original framerate. The result is also slowed down very slightly from 60 fields per second to 60000/1001 fields per second to maintain NTSC fieldrate.

  • NTSC 2:2 pulldown: Used for showing 30fps material on NTSC. Nice, just like 2:2 PAL pulldown.

There are also methods for converting between NTSC and PAL video, but such topics are beyond the scope of this guide. If you encounter such a movie and want to encode it, your best bet is to find a copy in the original format. Conversion between these two formats is highly destructive and cannot be reversed cleanly, so your encode will greatly suffer if it is made from a converted source.

When video is stored on DVD, consecutive pairs of fields are grouped as a frame, even though they are not intended to be shown at the same moment in time. The MPEG-2 standard used on DVD and digital TV provides a way both to encode the original progressive frames and to store the number of fields for which a frame should be shown in the header of that frame. If this method has been used, the movie will often be described as "soft-telecined", since the process only directs the DVD player to apply pulldown to the movie rather than altering the movie itself. This case is highly preferable since it can easily be reversed (actually ignored) by the encoder, and since it preserves maximal quality. However, many DVD and broadcast production studios do not use proper encoding techniques but instead produce movies with "hard telecine", where fields are actually duplicated in the encoded MPEG-2.

The procedures for dealing with these cases will be covered later in this guide. For now, we leave you with some guides to identifying which type of material you are dealing with:

NTSC regions:

  • If MPlayer prints that the framerate has changed to 24000/1001 when watching your movie, and never changes back, it is almost certainly progressive content that has been "soft telecined".

  • If MPlayer shows the framerate switching back and forth between 24000/1001 and 30000/1001, and you see "combing" at times, then there are several possibilities. The 24000/1001 fps segments are almost certainly progressive content, "soft telecined", but the 30000/1001 fps parts could be either hard-telecined 24000/1001 fps content or 60000/1001 fields per second NTSC video. Use the same guidelines as the following two cases to determine which.

  • If MPlayer never shows the framerate changing, and every single frame with motion appears combed, your movie is NTSC video at 60000/1001 fields per second.

  • If MPlayer never shows the framerate changing, and two frames out of every five appear combed, your movie is "hard telecined" 24000/1001fps content.

PAL regions:

  • If you never see any combing, your movie is 2:2 pulldown.

  • If you see combing alternating in and out every half second, then your movie is 2:2:2:2:2:2:2:2:2:2:2:3 pulldown.

  • If you always see combing during motion, then your movie is PAL video at 50 fields per second.

Hint:

MPlayer can slow down movie playback with the -speed option or play it frame-by-frame. Try using -speed 0.2 to watch the movie very slowly or press the "." key repeatedly to play one frame at a time and identify the pattern, if you cannot see it at full speed.

11.1.2. Constant quantizer vs. multipass

It is possible to encode your movie at a wide range of qualities. With modern video encoders and a bit of pre-codec compression (downscaling and denoising), it is possible to achieve very good quality at 700 MB, for a 90-110 minute widescreen movie. Furthermore, all but the longest movies can be encoded with near-perfect quality at 1400 MB.

There are three approaches to encoding the video: constant bitrate (CBR), constant quantizer, and multipass (ABR, or average bitrate).

The complexity of the frames of a movie, and thus the number of bits required to compress them, can vary greatly from one scene to another. Modern video encoders can adjust to these needs as they go and vary the bitrate. In simple modes such as CBR, however, the encoders do not know the bitrate needs of future scenes and so cannot exceed the requested average bitrate for long stretches of time. More advanced modes, such as multipass encode, can take into account the statistics from previous passes; this fixes the problem mentioned above.

Note:

Most codecs which support ABR encode only support two pass encode while some others such as x264, Xvid and libavcodec support multipass, which slightly improves quality at each pass, yet this improvement is no longer measurable nor noticeable after the 4th or so pass. Therefore, in this section, two pass and multipass will be used interchangeably.

In each of these modes, the video codec (such as libavcodec) breaks the video frame into 16x16 pixel macroblocks and then applies a quantizer to each macroblock. The lower the quantizer, the better the quality and higher the bitrate. The method the movie encoder uses to determine which quantizer to use for a given macroblock varies and is highly tunable. (This is an extreme over-simplification of the actual process, but the basic concept is useful to understand.)

When you specify a constant bitrate, the video codec will encode the video, discarding detail as much as necessary and as little as possible in order to remain lower than the given bitrate. If you truly do not care about file size, you could as well use CBR and specify a bitrate of infinity. (In practice, this means a value high enough so that it poses no limit, like 10000Kbit.) With no real restriction on bitrate, the result is that the codec will use the lowest possible quantizer for each macroblock (as specified by vqmin for libavcodec, which is 2 by default). As soon as you specify a low enough bitrate that the codec is forced to use a higher quantizer, then you are almost certainly ruining the quality of your video. In order to avoid that, you should probably downscale your video, according to the method described later on in this guide. In general, you should avoid CBR altogether if you care about quality.

With constant quantizer, the codec uses the same quantizer, as specified by the vqscale option (for libavcodec), on every macroblock. If you want the highest quality rip possible, again ignoring bitrate, you can use vqscale=2. This will yield the same bitrate and PSNR (peak signal-to-noise ratio) as CBR with vbitrate=infinity and the default vqmin of 2.

The problem with constant quantizing is that it uses the given quantizer whether the macroblock needs it or not. That is, it might be possible to use a higher quantizer on a macroblock without sacrificing visual quality. Why waste the bits on an unnecessarily low quantizer? Your CPU has as many cycles as there is time, but there is only so many bits on your hard disk.

With a two pass encode, the first pass will rip the movie as though it were CBR, but it will keep a log of properties for each frame. This data is then used during the second pass in order to make intelligent decisions about which quantizers to use. During fast action or high detail scenes, higher quantizers will likely be used, and during slow moving or low detail scenes, lower quantizers will be used. Normally, the amount of motion is much more important than the amount of detail.

If you use vqscale=2, then you are wasting bits. If you use vqscale=3, then you are not getting the highest quality rip. Suppose you rip a DVD at vqscale=3, and the result is 1800Kbit. If you do a two pass encode with vbitrate=1800, the resulting video will have higher quality for the same bitrate.

Since you are now convinced that two pass is the way to go, the real question now is what bitrate to use? The answer is that there is no single answer. Ideally you want to choose a bitrate that yields the best balance between quality and file size. This is going to vary depending on the source video.

If size does not matter, a good starting point for a very high quality rip is about 2000Kbit plus or minus 200Kbit. For fast action or high detail source video, or if you just have a very critical eye, you might decide on 2400 or 2600. For some DVDs, you might not notice a difference at 1400Kbit. It is a good idea to experiment with scenes at different bitrates to get a feel.

If you aim at a certain size, you will have to somehow calculate the bitrate. But before that, you need to know how much space you should reserve for the audio track(s), so you should rip those first. You can compute the bitrate with the following equation: bitrate = (target_size_in_Mbytes - sound_size_in_Mbytes) * 1024 * 1024 / length_in_secs * 8 / 1000 For instance, to squeeze a two-hour movie onto a 702MB CD, with 60MB of audio track, the video bitrate will have to be: (702 - 60) * 1024 * 1024 / (120*60) * 8 / 1000 = 740kbps

11.1.3. Constraints for efficient encoding

Due to the nature of MPEG-type compression, there are various constraints you should follow for maximal quality. MPEG splits the video up into 16x16 squares called macroblocks, each composed of 4 8x8 blocks of luma (intensity) information and two half-resolution 8x8 chroma (color) blocks (one for red-cyan axis and the other for the blue-yellow axis). Even if your movie width and height are not multip4rong> correspond to the original format of the movie. Experience shows that NTSC material is a lot more difficult to encode, because there more elements to identify in the source. In order to produce a suitable encode, you need to know the original format. Failure to take this into account will result in various flaws in your encode, including ugly combing (interlacing) artifacts and duplicated or even lost frames. Besides being ugly, the artifacts also harm coding efficiency: You will get worse quality per unit bitrate.

11.1.1.1. Identifying source framerate

Here is a list of common types of source material, where you are likely to find them, and their properties:

  • Standard Film: Produced for theatrical display at 24fps.

  • PAL video: Recorded with a PAL video camera at 50 fields per second. A field consists of just the odd- or even-numbered lines of a frame. Television was designed to refresh these in alternation as a cheap form of analog compression. The human eye supposedly compensates for this, but once you understand interlacing you will learn to see it on TV too and never enjoy TV again. Two fields do not make a complete frame, because they are captured 1/50 of a second apart in time, and thus they do not line up unless there is no motion.

  • NTSC Video: Recorded with an NTSC video camera at 60000/1001 fields per second, or 60 fields per second in the pre-color era. Otherwise similar to PAL.

  • Animation: Usually drawn at 24fps, but also comes in mixed-framerate varieties.

  • Computer Graphics (CG): Can be any framerate, but some are more common than others; 24 and 30 frames per second are typical for NTSC, and 25fps is typical for PAL.

  • Old Film: Various lower framerates.

11.1.1.2. Identifying source material

Movies consisting of frames are referred to as progressive, while those consisting of independent fields are called either interlaced or video - though this latter term is ambiguous.

To further complicate matters, some movies will be a mix of several of the above.

The most important distinction to make between all of these formats is that some are frame-based, while others are field-based. Whenever a movie is prepared for display on television (including DVD), it is converted to a field-based format. The various methods by which this can be done are collectively referred to as "telecine", of which the infamous NTSC "3:2 pulldown" is one variety. Unless the original material was also field-based (and the same fieldrate), you are getting the movie in a format other than the original.

There are several common types of pulldown:

  • PAL 2:2 pulldown: The nicest of them all. Each frame is shown for the duration of two fields, by extracting the even and odd lines and showing them in alternation. If the original material is 24fps, this process speeds up the movie by 4%.

  • PAL 2:2:2:2:2:2:2:2:2:2:2:3 pulldown: Every 12th frame is shown for the duration of three fields, instead of just two. This avoids the 4% speedup issue, but makes the process much more difficult to reverse. It is usually seen in musical productions where adjusting the speed by 4% would seriously damage the musical score.

  • NTSC 3:2 telecine: Frames are shown alternately for the duration of 3 fields or 2 fields. This gives a fieldrate 2.5 times the original framerate. The result is also slowed down very slightly from 60 fields per second to 60000/1001 fields per second to maintain NTSC fieldrate.

  • NTSC 2:2 pulldown: Used for showing 30fps material on NTSC. Nice, just like 2:2 PAL pulldown.

There are also methods for converting between NTSC and PAL video, but such topics are beyond the scope of this guide. If you encounter such a movie and want to encode it, your best bet is to find a copy in the original format. Conversion between these two formats is highly destructive and cannot be reversed cleanly, so your encode will greatly suffer if it is made from a converted source.

When video is stored on DVD, consecutive pairs of fields are grouped as a frame, even though they are not intended to be shown at the same moment in time. The MPEG-2 standard used on DVD and digital TV provides a way both to encode the original progressive frames and to store the number of fields for which a frame should be shown in the header of that frame. If this method has been used, the movie will often be described as "soft-telecined", since the process only directs the DVD player to apply pulldown to the movie rather than altering the movie itself. This case is highly preferable since it can easily be reversed (actually ignored) by the encoder, and since it preserves maximal quality. However, many DVD and broadcast production studios do not use proper encoding techniques but instead produce movies with "hard telecine", where fields are actually duplicated in the encoded MPEG-2.

The procedures for dealing with these cases will be covered later in this guide. For now, we leave you with some guides to identifying which type of material you are dealing with:

NTSC regions:

  • If MPlayer prints that the framerate has changed to 24000/1001 when watching your movie, and never changes back, it is almost certainly progressive content that has been "soft telecined".

  • If MPlayer shows the framerate switching back and forth between 24000/1001 and 30000/1001, and you see "combing" at times, then there are several possibilities. The 24000/1001 fps segments are almost certainly progressive content, "soft telecined", but the 30000/1001 fps parts could be either hard-telecined 24000/1001 fps content or 60000/1001 fields per second NTSC video. Use the same guidelines as the following two cases to determine which.

  • If MPlayer never shows the framerate changing, and every single frame with motion appears combed, your movie is NTSC video at 60000/1001 fields per second.

  • If MPlayer never shows the framerate changing, and two frames out of every five appear combed, your movie is "hard telecined" 24000/1001fps content.

PAL regions:

  • If you never see any combing, your movie is 2:2 pulldown.

  • If you see combing alternating in and out every half second, then your movie is 2:2:2:2:2:2:2:2:2:2:2:3 pulldown.

  • If you always see combing during motion, then your movie is PAL video at 50 fields per second.

Hint:

MPlayer can slow down movie playback with the -speed option or play it frame-by-frame. Try using -speed 0.2 to watch the movie very slowly or press the "." key repeatedly to play one frame at a time and identify the pattern, if you cannot see it at full speed.

11.1.2. Constant quantizer vs. multipass

It is possible to encode your movie at a wide range of qualities. With modern video encoders and a bit of pre-codec compression (downscaling and denoising), it is possible to achieve very good quality at 700 MB, for a 90-110 minute widescreen movie. Furthermore, all but the longest movies can be encoded with near-perfect quality at 1400 MB.

There are three approaches to encoding the video: constant bitrate (CBR), constant quantizer, and multipass (ABR, or average bitrate).

The complexity of the frames of a movie, and thus the number of bits required to compress them, can vary greatly from one scene to another. Modern video encoders can adjust to these needs as they go and vary the bitrate. In simple modes such as CBR, however, the encoders do not know the bitrate needs of future scenes and so cannot exceed the requested average bitrate for long stretches of time. More advanced modes, such as multipass encode, can take into account the statistics from previous passes; this fixes the problem mentioned above.

Note:

Most codecs which support ABR encode only support two pass encode while some others such as x264, Xvid and libavcodec support multipass, which slightly improves quality at each pass, yet this improvement is no longer measurable nor noticeable after the 4th or so pass. Therefore, in this section, two pass and multipass will be used interchangeably.

In each of these modes, the video codec (such as libavcodec) breaks the video frame into 16x16 pixel macroblocks and then applies a quantizer to each macroblock. The lower the quantizer, the better the quality and higher the bitrate. The method the movie encoder uses to determine which quantizer to use for a given macroblock varies and is highly tunable. (This is an extreme over-simplification of the actual process, but the basic concept is useful to understand.)

When you specify a constant bitrate, the video codec will encode the video, discarding detail as much as necessary and as little as possible in order to remain lower than the given bitrate. If you truly do not care about file size, you could as well use CBR and specify a bitrate of infinity. (In practice, this means a value high enough so that it poses no limit, like 10000Kbit.) With no real restriction on bitrate, the result is that the codec will use the lowest possible quantizer for each macroblock (as specified by vqmin for libavcodec, which is 2 by default). As soon as you specify a low enough bitrate that the codec is forced to use a higher quantizer, then you are almost certainly ruining the quality of your video. In order to avoid that, you should probably downscale your video, according to the method described later on in this guide. In general, you should avoid CBR altogether if you care about quality.

With constant quantizer, the codec uses the same quantizer, as specified by the vqscale option (for libavcodec), on every macroblock. If you want the highest quality rip possible, again ignoring bitrate, you can use vqscale=2. This will yield the same bitrate and PSNR (peak signal-to-noise ratio) as CBR with vbitrate=infinity and the default vqmin of 2.

The problem with constant quantizing is that it uses the given quantizer whether the macroblock needs it or not. That is, it might be possible to use a higher quantizer on a macroblock without sacrificing visual quality. Why waste the bits on an unnecessarily low quantizer? Your CPU has as many cycles as there is time, but there is only so many bits on your hard disk.

With a two pass encode, the first pass will rip the movie as though it were CBR, but it will keep a log of properties for each frame. This data is then used during the second pass in order to make intelligent decisions about which quantizers to use. During fast action or high detail scenes, higher quantizers will likely be used, and during slow moving or low detail scenes, lower quantizers will be used. Normally, the amount of motion is much more important than the amount of detail.

If you use vqscale=2, then you are wasting bits. If you use vqscale=3, then you are not getting the highest quality rip. Suppose you rip a DVD at vqscale=3, and the result is 1800Kbit. If you do a two pass encode with vbitrate=1800, the resulting video will have higher quality for the same bitrate.

Since you are now convinced that two pass is the way to go, the real question now is what bitrate to use? The answer is that there is no single answer. Ideally you want to choose a bitrate that yields the best balance between quality and file size. This is going to vary depending on the source video.

If size does not matter, a good starting point for a very high quality rip is about 2000Kbit plus or minus 200Kbit. For fast action or high detail source video, or if you just have a very critical eye, you might decide on 2400 or 2600. For some DVDs, you might not notice a difference at 1400Kbit. It is a good idea to experiment with scenes at different bitrates to get a feel.

If you aim at a certain size, you will have to somehow calculate the bitrate. But before that, you need to know how much space you should reserve for the audio track(s), so you should rip those first. You can compute the bitrate with the following equation: bitrate = (target_size_in_Mbytes - sound_size_in_Mbytes) * 1024 * 1024 / length_in_secs * 8 / 1000 For instance, to squeeze a two-hour movie onto a 702MB CD, with 60MB of audio track, the video bitrate will have to be: (702 - 60) * 1024 * 1024 / (120*60) * 8 / 1000 = 740kbps

11.1.3. Constraints for efficient encoding

Due to the nature of MPEG-type compression, there are various constraints you should follow for maximal quality. MPEG splits the video up into 16x16 squares called macroblocks, each composed of 4 8x8 blocks of luma (intensity) information and two half-resolution 8x8 chroma (color) blocks (one for red-cyan axis and the other for the blue-yellow axis). Even if your movie width and height are not multip4rong> correspond to the original format of the movie. Experience shows that NTSC material is a lot more difficult to encode, because there more elements to identify in the source. In order to produce a suitable encode, you need to know the original format. Failure to take this into account will result in various flaws in your encode, including ugly combing (interlacing) artifacts and duplicated or even lost frames. Besides being ugly, the artifacts also harm coding efficiency: You will get worse quality per unit bitrate.

11.1.1.1. Identifying source framerate

Here is a list of common types of source material, where you are likely to find them, and their properties:

  • Standard Film: Produced for theatrical display at 24fps.

  • PAL video: Recorded with a PAL video camera at 50 fields per second. A field consists of just the odd- or even-numbered lines of a frame. Television was designed to refresh these in alternation as a cheap form of analog compression. The human eye supposedly compensates for this, but once you understand interlacing you will learn to see it on TV too and never enjoy TV again. Two fields do not make a complete frame, because they are captured 1/50 of a second apart in time, and thus they do not line up unless there is no motion.

  • NTSC Video: Recorded with an NTSC video camera at 60000/1001 fields per second, or 60 fields per second in the pre-color era. Otherwise similar to PAL.

  • Animation: Usually drawn at 24fps, but also comes in mixed-framerate varieties.

  • Computer Graphics (CG): Can be any framerate, but some are more common than others; 24 and 30 frames per second are typical for NTSC, and 25fps is typical for PAL.

  • Old Film: Various lower framerates.

11.1.1.2. Identifying source material

Movies consisting of frames are referred to as progressive, while those consisting of independent fields are called either interlaced or video - though this latter term is ambiguous.

To further complicate matters, some movies will be a mix of several of the above.

The most important distinction to make between all of these formats is that some are frame-based, while others are field-based. Whenever a movie is prepared for display on television (including DVD), it is converted to a field-based format. The various methods by which this can be done are collectively referred to as "telecine", of which the infamous NTSC "3:2 pulldown" is one variety. Unless the original material was also field-based (and the same fieldrate), you are getting the movie in a format other than the original.

There are several common types of pulldown:

  • PAL 2:2 pulldown: The nicest of them all. Each frame is shown for the duration of two fields, by extracting the even and odd lines and showing them in alternation. If the original material is 24fps, this process speeds up the movie by 4%.

  • PAL 2:2:2:2:2:2:2:2:2:2:2:3 pulldown: Every 12th frame is shown for the duration of three fields, instead of just two. This avoids the 4% speedup issue, but makes the process much more difficult to reverse. It is usually seen in musical productions where adjusting the speed by 4% would seriously damage the musical score.

  • NTSC 3:2 telecine: Frames are shown alternately for the duration of 3 fields or 2 fields. This gives a fieldrate 2.5 times the original framerate. The result is also slowed down very slightly from 60 fields per second to 60000/1001 fields per second to maintain NTSC fieldrate.

  • NTSC 2:2 pulldown: Used for showing 30fps material on NTSC. Nice, just like 2:2 PAL pulldown.

There are also methods for converting between NTSC and PAL video, but such topics are beyond the scope of this guide. If you encounter such a movie and want to encode it, your best bet is to find a copy in the original format. Conversion between these two formats is highly destructive and cannot be reversed cleanly, so your encode will greatly suffer if it is made from a converted source.

When video is stored on DVD, consecutive pairs of fields are grouped as a frame, even though they are not intended to be shown at the same moment in time. The MPEG-2 standard used on DVD and digital TV provides a way both to encode the original progressive frames and to store the number of fields for which a frame should be shown in the header of that frame. If this method has been used, the movie will often be described as "soft-telecined", since the process only directs the DVD player to apply pulldown to the movie rather than altering the movie itself. This case is highly preferable since it can easily be reversed (actually ignored) by the encoder, and since it preserves maximal quality. However, many DVD and broadcast production studios do not use proper encoding techniques but instead produce movies with "hard telecine", where fields are actually duplicated in the encoded MPEG-2.

The procedures for dealing with these cases will be covered later in this guide. For now, we leave you with some guides to identifying which type of material you are dealing with:

NTSC regions:

  • If MPlayer prints that the framerate has changed to 24000/1001 when watching your movie, and never changes back, it is almost certainly progressive content that has been "soft telecined".

  • If MPlayer shows the framerate switching back and forth between 24000/1001 and 30000/1001, and you see "combing" at times, then there are several possibilities. The 24000/1001 fps segments are almost certainly progressive content, "soft telecined", but the 30000/1001 fps parts could be either hard-telecined 24000/1001 fps content or 60000/1001 fields per second NTSC video. Use the same guidelines as the following two cases to determine which.

  • If MPlayer never shows the framerate changing, and every single frame with motion appears combed, your movie is NTSC video at 60000/1001 fields per second.

  • If MPlayer never shows the framerate changing, and two frames out of every five appear combed, your movie is "hard telecined" 24000/1001fps content.

PAL regions:

  • If you never see any combing, your movie is 2:2 pulldown.

  • If you see combing alternating in and out every half second, then your movie is 2:2:2:2:2:2:2:2:2:2:2:3 pulldown.

  • If you always see combing during motion, then your movie is PAL video at 50 fields per second.

Hint:

MPlayer can slow down movie playback with the -speed option or play it frame-by-frame. Try using -speed 0.2 to watch the movie very slowly or press the "." key repeatedly to play one frame at a time and identify the pattern, if you cannot see it at full speed.

11.1.2. Constant quantizer vs. multipass

It is possible to encode your movie at a wide range of qualities. With modern video encoders and a bit of pre-codec compression (downscaling and denoising), it is possible to achieve very good quality at 700 MB, for a 90-110 minute widescreen movie. Furthermore, all but the longest movies can be encoded with near-perfect quality at 1400 MB.

There are three approaches to encoding the video: constant bitrate (CBR), constant quantizer, and multipass (ABR, or average bitrate).

The complexity of the frames of a movie, and thus the number of bits required to compress them, can vary greatly from one scene to another. Modern video encoders can adjust to these needs as they go and vary the bitrate. In simple modes such as CBR, however, the encoders do not know the bitrate needs of future scenes and so cannot exceed the requested average bitrate for long stretches of time. More advanced modes, such as multipass encode, can take into account the statistics from previous passes; this fixes the problem mentioned above.

Note:

Most codecs which support ABR encode only support two pass encode while some others such as x264, Xvid and libavcodec support multipass, which slightly improves quality at each pass, yet this improvement is no longer measurable nor noticeable after the 4th or so pass. Therefore, in this section, two pass and multipass will be used interchangeably.

In each of these modes, the video codec (such as libavcodec) breaks the video frame into 16x16 pixel macroblocks and then applies a quantizer to each macroblock. The lower the quantizer, the better the quality and higher the bitrate. The method the movie encoder uses to determine which quantizer to use for a given macroblock varies and is highly tunable. (This is an extreme over-simplification of the actual process, but the basic concept is useful to understand.)

When you specify a constant bitrate, the video codec will encode the video, discarding detail as much as necessary and as little as possible in order to remain lower than the given bitrate. If you truly do not care about file size, you could as well use CBR and specify a bitrate of infinity. (In practice, this means a value high enough so that it poses no limit, like 10000Kbit.) With no real restriction on bitrate, the result is that the codec will use the lowest possible quantizer for each macroblock (as specified by vqmin for libavcodec, which is 2 by default). As soon as you specify a low enough bitrate that the codec is forced to use a higher quantizer, then you are almost certainly ruining the quality of your video. In order to avoid that, you should probably downscale your video, according to the method described later on in this guide. In general, you should avoid CBR altogether if you care about quality.

With constant quantizer, the codec uses the same quantizer, as specified by the vqscale option (for libavcodec), on every macroblock. If you want the highest quality rip possible, again ignoring bitrate, you can use vqscale=2. This will yield the same bitrate and PSNR (peak signal-to-noise ratio) as CBR with vbitrate=infinity and the default vqmin of 2.

The problem with constant quantizing is that it uses the given quantizer whether the macroblock needs it or not. That is, it might be possible to use a higher quantizer on a macroblock without sacrificing visual quality. Why waste the bits on an unnecessarily low quantizer? Your CPU has as many cycles as there is time, but there is only so many bits on your hard disk.

With a two pass encode, the first pass will rip the movie as though it were CBR, but it will keep a log of properties for each frame. This data is then used during the second pass in order to make intelligent decisions about which quantizers to use. During fast action or high detail scenes, higher quantizers will likely be used, and during slow moving or low detail scenes, lower quantizers will be used. Normally, the amount of motion is much more important than the amount of detail.

If you use vqscale=2, then you are wasting bits. If you use vqscale=3, then you are not getting the highest quality rip. Suppose you rip a DVD at vqscale=3, and the result is 1800Kbit. If you do a two pass encode with vbitrate=1800, the resulting video will have higher quality for the same bitrate.

Since you are now convinced that two pass is the way to go, the real question now is what bitrate to use? The answer is that there is no single answer. Ideally you want to choose a bitrate that yields the best balance between quality and file size. This is going to vary depending on the source video.

If size does not matter, a good starting point for a very high quality rip is about 2000Kbit plus or minus 200Kbit. For fast action or high detail source video, or if you just have a very critical eye, you might decide on 2400 or 2600. For some DVDs, you might not notice a difference at 1400Kbit. It is a good idea to experiment with scenes at different bitrates to get a feel.

If you aim at a certain size, you will have to somehow calculate the bitrate. But before that, you need to know how much space you should reserve for the audio track(s), so you should rip those first. You can compute the bitrate with the following equation: bitrate = (target_size_in_Mbytes - sound_size_in_Mbytes) * 1024 * 1024 / length_in_secs * 8 / 1000 For instance, to squeeze a two-hour movie onto a 702MB CD, with 60MB of audio track, the video bitrate will have to be: (702 - 60) * 1024 * 1024 / (120*60) * 8 / 1000 = 740kbps

11.1.3. Constraints for efficient encoding

Due to the nature of MPEG-type compression, there are various constraints you should follow for maximal quality. MPEG splits the video up into 16x16 squares called macroblocks, each composed of 4 8x8 blocks of luma (intensity) information and two half-resolution 8x8 chroma (color) blocks (one for red-cyan axis and the other for the blue-yellow axis). Even if your movie width and height are not multip4rong> correspond to the original format of the movie. Experience shows that NTSC material is a lot more difficult to encode, because there more elements to identify in the source. In order to produce a suitable encode, you need to know the original format. Failure to take this into account will result in various flaws in your encode, including ugly combing (interlacing) artifacts and duplicated or even lost frames. Besides being ugly, the artifacts also harm coding efficiency: You will get worse quality per unit bitrate.

11.1.1.1. Identifying source framerate

Here is a list of common types of source material, where you are likely to find them, and their properties:

  • Standard Film: Produced for theatrical display at 24fps.

  • PAL video: Recorded with a PAL video camera at 50 fields per second. A field consists of just the odd- or even-numbered lines of a frame. Television was designed to refresh these in alternation as a cheap form of analog compression. The human eye supposedly compensates for this, but once you understand interlacing you will learn to see it on TV too and never enjoy TV again. Two fields do not make a complete frame, because they are captured 1/50 of a second apart in time, and thus they do not line up unless there is no motion.

  • NTSC Video: Recorded with an NTSC video camera at 60000/1001 fields per second, or 60 fields per second in the pre-color era. Otherwise similar to PAL.

  • Animation: Usually drawn at 24fps, but also comes in mixed-framerate varieties.

  • Computer Graphics (CG): Can be any framerate, but some are more common than others; 24 and 30 frames per second are typical for NTSC, and 25fps is typical for PAL.

  • Old Film: Various lower framerates.

11.1.1.2. Identifying source material

Movies consisting of frames are referred to as progressive, while those consisting of independent fields are called either interlaced or video - though this latter term is ambiguous.

To further complicate matters, some movies will be a mix of several of the above.

The most important distinction to make between all of these formats is that some are frame-based, while others are field-based. Whenever a movie is prepared for display on television (including DVD), it is converted to a field-based format. The various methods by which this can be done are collectively referred to as "telecine", of which the infamous NTSC "3:2 pulldown" is one variety. Unless the original material was also field-based (and the same fieldrate), you are getting the movie in a format other than the original.

There are several common types of pulldown:

  • PAL 2:2 pulldown: The nicest of them all. Each frame is shown for the duration of two fields, by extracting the even and odd lines and showing them in alternation. If the original material is 24fps, this process speeds up the movie by 4%.

  • PAL 2:2:2:2:2:2:2:2:2:2:2:3 pulldown: Every 12th frame is shown for the duration of three fields, instead of just two. This avoids the 4% speedup issue, but makes the process much more difficult to reverse. It is usually seen in musical productions where adjusting the speed by 4% would seriously damage the musical score.

  • NTSC 3:2 telecine: Frames are shown alternately for the duration of 3 fields or 2 fields. This gives a fieldrate 2.5 times the original framerate. The result is also slowed down very slightly from 60 fields per second to 60000/1001 fields per second to maintain NTSC fieldrate.

  • NTSC 2:2 pulldown: Used for showing 30fps material on NTSC. Nice, just like 2:2 PAL pulldown.

There are also methods for converting between NTSC and PAL video, but such topics are beyond the scope of this guide. If you encounter such a movie and want to encode it, your best bet is to find a copy in the original format. Conversion between these two formats is highly destructive and cannot be reversed cleanly, so your encode will greatly suffer if it is made from a converted source.

When video is stored on DVD, consecutive pairs of fields are grouped as a frame, even though they are not intended to be shown at the same moment in time. The MPEG-2 standard used on DVD and digital TV provides a way both to encode the original progressive frames and to store the number of fields for which a frame should be shown in the header of that frame. If this method has been used, the movie will often be described as "soft-telecined", since the process only directs the DVD player to apply pulldown to the movie rather than altering the movie itself. This case is highly preferable since it can easily be reversed (actually ignored) by the encoder, and since it preserves maximal quality. However, many DVD and broadcast production studios do not use proper encoding techniques but instead produce movies with "hard telecine", where fields are actually duplicated in the encoded MPEG-2.

The procedures for dealing with these cases will be covered later in this guide. For now, we leave you with some guides to identifying which type of material you are dealing with:

NTSC regions:

  • If MPlayer prints that the framerate has changed to 24000/1001 when watching your movie, and never changes back, it is almost certainly progressive content that has been "soft telecined".

  • If MPlayer shows the framerate switching back and forth between 24000/1001 and 30000/1001, and you see "combing" at times, then there are several possibilities. The 24000/1001 fps segments are almost certainly progressive content, "soft telecined", but the 30000/1001 fps parts could be either hard-telecined 24000/1001 fps content or 60000/1001 fields per second NTSC video. Use the same guidelines as the following two cases to determine which.

  • If MPlayer never shows the framerate changing, and every single frame with motion appears combed, your movie is NTSC video at 60000/1001 fields per second.

  • If MPlayer never shows the framerate changing, and two frames out of every five appear combed, your movie is "hard telecined" 24000/1001fps content.

PAL regions:

  • If you never see any combing, your movie is 2:2 pulldown.

  • If you see combing alternating in and out every half second, then your movie is 2:2:2:2:2:2:2:2:2:2:2:3 pulldown.

  • If you always see combing during motion, then your movie is PAL video at 50 fields per second.

Hint:

MPlayer can slow down movie playback with the -speed option or play it frame-by-frame. Try using -speed 0.2 to watch the movie very slowly or press the "." key repeatedly to play one frame at a time and identify the pattern, if you cannot see it at full speed.

11.1.2. Constant quantizer vs. multipass

It is possible to encode your movie at a wide range of qualities. With modern video encoders and a bit of pre-codec compression (downscaling and denoising), it is possible to achieve very good quality at 700 MB, for a 90-110 minute widescreen movie. Furthermore, all but the longest movies can be encoded with near-perfect quality at 1400 MB.

There are three approaches to encoding the video: constant bitrate (CBR), constant quantizer, and multipass (ABR, or average bitrate).

The complexity of the frames of a movie, and thus the number of bits required to compress them, can vary greatly from one scene to another. Modern video encoders can adjust to these needs as they go and vary the bitrate. In simple modes such as CBR, however, the encoders do not know the bitrate needs of future scenes and so cannot exceed the requested average bitrate for long stretches of time. More advanced modes, such as multipass encode, can take into account the statistics from previous passes; this fixes the problem mentioned above.

Note:

Most codecs which support ABR encode only support two pass encode while some others such as x264, Xvid and libavcodec support multipass, which slightly improves quality at each pass, yet this improvement is no longer measurable nor noticeable after the 4th or so pass. Therefore, in this section, two pass and multipass will be used interchangeably.

In each of these modes, the video codec (such as libavcodec) breaks the video frame into 16x16 pixel macroblocks and then applies a quantizer to each macroblock. The lower the quantizer, the better the quality and higher the bitrate. The method the movie encoder uses to determine which quantizer to use for a given macroblock varies and is highly tunable. (This is an extreme over-simplification of the actual process, but the basic concept is useful to understand.)

When you specify a constant bitrate, the video codec will encode the video, discarding detail as much as necessary and as little as possible in order to remain lower than the given bitrate. If you truly do not care about file size, you could as well use CBR and specify a bitrate of infinity. (In practice, this means a value high enough so that it poses no limit, like 10000Kbit.) With no real restriction on bitrate, the result is that the codec will use the lowest possible quantizer for each macroblock (as specified by vqmin for libavcodec, which is 2 by default). As soon as you specify a low enough bitrate that the codec is forced to use a higher quantizer, then you are almost certainly ruining the quality of your video. In order to avoid that, you should probably downscale your video, according to the method described later on in this guide. In general, you should avoid CBR altogether if you care about quality.

With constant quantizer, the codec uses the same quantizer, as specified by the vqscale option (for libavcodec), on every macroblock. If you want the highest quality rip possible, again ignoring bitrate, you can use vqscale=2. This will yield the same bitrate and PSNR (peak signal-to-noise ratio) as CBR with vbitrate=infinity and the default vqmin of 2.

The problem with constant quantizing is that it uses the given quantizer whether the macroblock needs it or not. That is, it might be possible to use a higher quantizer on a macroblock without sacrificing visual quality. Why waste the bits on an unnecessarily low quantizer? Your CPU has as many cycles as there is time, but there is only so many bits on your hard disk.

With a two pass encode, the first pass will rip the movie as though it were CBR, but it will keep a log of properties for each frame. This data is then used during the second pass in order to make intelligent decisions about which quantizers to use. During fast action or high detail scenes, higher quantizers will likely be used, and during slow moving or low detail scenes, lower quantizers will be used. Normally, the amount of motion is much more important than the amount of detail.

If you use vqscale=2, then you are wasting bits. If you use vqscale=3, then you are not getting the highest quality rip. Suppose you rip a DVD at vqscale=3, and the result is 1800Kbit. If you do a two pass encode with vbitrate=1800, the resulting video will have higher quality for the same bitrate.

Since you are now convinced that two pass is the way to go, the real question now is what bitrate to use? The answer is that there is no single answer. Ideally you want to choose a bitrate that yields the best balance between quality and file size. This is going to vary depending on the source video.

If size does not matter, a good starting point for a very high quality rip is about 2000Kbit plus or minus 200Kbit. For fast action or high detail source video, or if you just have a very critical eye, you might decide on 2400 or 2600. For some DVDs, you might not notice a difference at 1400Kbit. It is a good idea to experiment with scenes at different bitrates to get a feel.

If you aim at a certain size, you will have to somehow calculate the bitrate. But before that, you need to know how much space you should reserve for the audio track(s), so you should rip those first. You can compute the bitrate with the following equation: bitrate = (target_size_in_Mbytes - sound_size_in_Mbytes) * 1024 * 1024 / length_in_secs * 8 / 1000 For instance, to squeeze a two-hour movie onto a 702MB CD, with 60MB of audio track, the video bitrate will have to be: (702 - 60) * 1024 * 1024 / (120*60) * 8 / 1000 = 740kbps

11.1.3. Constraints for efficient encoding

Due to the nature of MPEG-type compression, there are various constraints you should follow for maximal quality. MPEG splits the video up into 16x16 squares called macroblocks, each composed of 4 8x8 blocks of luma (intensity) information and two half-resolution 8x8 chroma (color) blocks (one for red-cyan axis and the other for the blue-yellow axis). Even if your movie width and height are not multip4rong> correspond to the original format of the movie. Experience shows that NTSC material is a lot more difficult to encode, because there more elements to identify in the source. In order to produce a suitable encode, you need to know the original format. Failure to take this into account will result in various flaws in your encode, including ugly combing (interlacing) artifacts and duplicated or even lost frames. Besides being ugly, the artifacts also harm coding efficiency: You will get worse quality per unit bitrate.

11.1.1.1. Identifying source framerate

Here is a list of common types of source material, where you are likely to find them, and their properties:

  • Standard Film: Produced for theatrical display at 24fps.

  • PAL video: Recorded with a PAL video camera at 50 fields per second. A field consists of just the odd- or even-numbered lines of a frame. Television was designed to refresh these in alternation as a cheap form of analog compression. The human eye supposedly compensates for this, but once you understand interlacing you will learn to see it on TV too and never enjoy TV again. Two fields do not make a complete frame, because they are captured 1/50 of a second apart in time, and thus they do not line up unless there is no motion.

  • NTSC Video: Recorded with an NTSC video camera at 60000/1001 fields per second, or 60 fields per second in the pre-color era. Otherwise similar to PAL.

  • Animation: Usually drawn at 24fps, but also comes in mixed-framerate varieties.

  • Computer Graphics (CG): Can be any framerate, but some are more common than others; 24 and 30 frames per second are typical for NTSC, and 25fps is typical for PAL.

  • Old Film: Various lower framerates.

11.1.1.2. Identifying source material

Movies consisting of frames are referred to as progressive, while those consisting of independent fields are called either interlaced or video - though this latter term is ambiguous.

To further complicate matters, some movies will be a mix of several of the above.

The most important distinction to make between all of these formats is that some are frame-based, while others are field-based. Whenever a movie is prepared for display on television (including DVD), it is converted to a field-based format. The various methods by which this can be done are collectively referred to as "telecine", of which the infamous NTSC "3:2 pulldown" is one variety. Unless the original material was also field-based (and the same fieldrate), you are getting the movie in a format other than the original.

There are several common types of pulldown:

  • PAL 2:2 pulldown: The nicest of them all. Each frame is shown for the duration of two fields, by extracting the even and odd lines and showing them in alternation. If the original materi