Video Processing and Colorspaces

(What is really Lossless?)

Video monitors and televisions mix red, green, and blue light to create all of the colors that they display. You may say, “Wait a minute, I thought red, yellow, and blue were the primary colors?” That is outside the scope of this document, so see Wikipedia on Primary Colors first.

RGB Colorspace – The Easy Stuff
Not understanding video colorspaces has cost me days of effort, because all of my video processing was for naught. We’ll start from the screen format of red, green, and blue (RGB) and work toward the others. Each color is called a channel, so RGB has three color channels. In the simplest representation, each pixel stores red, green, and blue as one byte each, and each byte is a numerical representation of the “brightness” of that color. If all the bytes are zero, you have black. If they are all at their maximum, you have white. If red and blue are non-zero but green is zero, you have purple, and so on. Since each byte is 8 bits, each color has 8 bits in one pixel, for 24 bits per pixel in total. This format is called RGB24.

A related format called RGB32 uses the same 8 bits per channel, but adds a fourth byte that is discarded as extra data. It is easier for computers to deal with 4 bytes at a time than 3, so the padding byte is useful. RGBA32 is a slightly different format where, instead of discarding the last byte, the last byte is used for a transparency (or “alpha”) channel that specifies how transparent the pixel is (remember, a channel is just a way to talk about one component of a pixel individually). This is useful when laying one image on top of another. You could have a purple pixel with an alpha value that makes it halfway transparent, so when this pixel is “overlaid” on another image, the purple only partially mixes with the pixel underneath, and you still see the pixel underneath. This technique is used for overlaying images, such as television channel logos displayed over a television show.
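
To make the byte layouts concrete, here is a small Python sketch of one pixel in each of the three formats, plus the overlay step. The function names are made up for illustration, and the byte order (red first) is an assumption; real formats often store the bytes in a different order, such as blue first.

```python
def pack_rgb24(r, g, b):
    """Three bytes per pixel: red, green, blue."""
    return bytes([r, g, b])


def pack_rgb32(r, g, b):
    """Four bytes per pixel; the fourth byte is padding that readers ignore."""
    return bytes([r, g, b, 0])


def pack_rgba32(r, g, b, a):
    """Four bytes per pixel; the fourth byte is alpha (0 = transparent, 255 = opaque)."""
    return bytes([r, g, b, a])


def overlay(top_rgba, bottom_rgb):
    """Blend one RGBA32 pixel over an opaque RGB pixel using its alpha value."""
    r, g, b, a = top_rgba
    alpha = a / 255
    return tuple(
        round(alpha * top + (1 - alpha) * under)
        for top, under in zip((r, g, b), bottom_rgb)
    )


logo_pixel = pack_rgba32(128, 0, 128, 128)   # purple, roughly half transparent
show_pixel = (255, 255, 255)                 # white pixel underneath
print(overlay(logo_pixel, show_pixel))       # a lighter purple
```

With an alpha of 255 the logo pixel replaces the pixel underneath completely, and with an alpha of 0 it disappears.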

YUV Colorspace – What is That?
Ok, now to the complicated part. The problem is that I always understood RGB24 and its relatives, and I thought that all video codecs and outputs used this style of format. This caused me a lot of problems with video processing, as I found that many video formats and codecs use different colorspaces that I had never heard of. Every time I processed a video, I ended up converting it to a different colorspace and losing image data or introducing inconsistencies or extra data. This was very frustrating, as I didn’t understand what was happening or what was causing it.

Another common video format is called YUV. This is COMPLETELY DIFFERENT from RGB. Somebody much smarter than I am worked out a better way to compress and display video information. The YUV format uses Y as a brightness channel, and U and V as color channels. The U channel is a color channel that runs bluish-purplish-greenish, and the V channel runs greenish-reddish. You can see examples of these at Wikipedia.

This is how analog televisions worked. Black and white televisions only had a Y (brightness) channel. When color television was developed, U and V channels were added as additional signals so that black and white televisions and color televisions could use the same broadcast signals. Black and white televisions discarded the U and V channels and only displayed the Y channel. This is why, on color TVs with a poor signal, the channel would sometimes come in but the color would go in and out. YUV is now a more popular format for storing video and images than RGB is.
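
Here is a rough Python sketch of what an RGB to YUV conversion looks like for a single pixel. I am assuming the full-range BT.601 coefficients (the ones used by JPEG); other standards use slightly different numbers, and the function name is only for illustration.

```python
def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YUV (full-range BT.601, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                # brightness
    u = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b      # blue difference
    v = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b      # red difference

    def to_byte(value):
        # Round to the nearest integer and keep the result inside one byte.
        return max(0, min(255, round(value)))

    return to_byte(y), to_byte(u), to_byte(v)


print(rgb_to_yuv(0, 0, 0))        # black: no brightness, neutral color
print(rgb_to_yuv(255, 255, 255))  # white: full brightness, neutral color
print(rgb_to_yuv(255, 0, 0))      # red: the V channel is pushed to its maximum
```

Notice that any gray pixel ends up with U and V at the neutral value of 128, which is why a black and white television can ignore those two channels. The rounding step is also one reason a trip from RGB to YUV and back does not always return the exact bytes you started with.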

YUV and Compression and Visual Tricks
Developers of video formats and colorspaces have learned that the human eye is more sensitive to changes in brightness than it is to changes in color. So when they were trying to figure out new ways to compress video, they found that they could throw away 75% of the color information (U and V channels) in a picture or video while keeping the original brightness (Y channel), with little perceived quality loss. This was great: they saved a lot of space and still had good-looking media. But when you are processing and re-processing video, these conversions can be a problem. Let’s take the example of a recorded screencast of something you have done on your desktop. Most likely, this is recorded in RGB24 or RGB32 format, since that is what your display is using. If you start encoding and re-processing willy-nilly with any old video codec, you could inadvertently convert this to the YV12 colorspace (a particular YUV colorspace, see www.fourcc.org) and lose 75% of your color information! When you are making your final output video, this may be what you want. But during the re-processing stage, it is not a good idea.
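
To put numbers on that, here is a quick Python calculation of the raw size of one frame in RGB24 and in YV12. It assumes 8 bits per sample and the usual YV12 layout of a full-size Y plane followed by two 2×2 subsampled color planes; the function names and the 1920×1080 frame size are just for illustration.

```python
def rgb24_frame_bytes(width, height):
    """Three full-resolution bytes for every pixel."""
    return width * height * 3


def yv12_frame_bytes(width, height):
    """One full-resolution Y plane plus two quarter-size color planes."""
    luma = width * height
    # Each color sample covers a 2x2 block; odd sizes round up.
    chroma = ((width + 1) // 2) * ((height + 1) // 2)
    return luma + 2 * chroma


w, h = 1920, 1080
print(rgb24_frame_bytes(w, h))                  # 6220800 bytes
print(yv12_frame_bytes(w, h))                   # 3110400 bytes
print(((w + 1) // 2) * ((h + 1) // 2) / (w * h))  # 0.25 of the color samples kept
```

The frame is half the size, and every bit of that saving comes out of the U and V planes. The Y plane is untouched.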

Put Colorspaces to Use
Use the Internet; it is your friend. When you have a video that you want to process, learn what colorspace the original is using. You can’t just look to see if it is RGB or YUV, because the individual formats within each colorspace matter. Have a look at www.fourcc.org for details on some of these formats. It will tell you how much information is kept and what may be discarded. When it says a “2×2 subsampled plane”, it means that the color channel compresses 4 pixels’ worth of information (a square 2×2 block) into a single value. For your purposes in processing video, you will probably treat “color channels” and “color planes” the same, so don’t worry about the terminology. Some formats throw away only half of the color information, and some throw away 3/4 of it. If you truly want to keep your video lossless, make sure that the target colorspace in your target codec is the same as the colorspace used by the codec of your source video. Some lossless formats are only available in one colorspace or the other, and this causes losses as the conversion is performed back and forth.
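
Here is a small Python sketch of why a 2×2 subsampled plane cannot give you your original back. It assumes the plane is a list of rows with even dimensions and that the four values in each block are simply averaged; real encoders may filter differently, and the function names are made up.

```python
def subsample_2x2(plane):
    """Replace every 2x2 block with the rounded average of its four values."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def upsample_2x2(small):
    """Stretch every value back out to cover a 2x2 block."""
    full = []
    for row in small:
        wide = [value for value in row for _ in range(2)]
        full.append(wide)
        full.append(list(wide))
    return full


u_plane = [[100, 200],
           [100, 200]]
stored = subsample_2x2(u_plane)
restored = upsample_2x2(stored)
print(stored)                # one value now stands in for four pixels
print(restored == u_plane)   # False: the original values are gone
```

Once the four values have been merged into one, no later conversion can tell them apart again, even if the codec you convert to is itself lossless.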

Video Display and Outputting
Watch out that your video software does not deceive you. Sometimes, even though your display uses RGB to show pictures and video, your video playback software does not. Some video playback software (especially on a Linux-type system) uses YUV as an intermediate format, which is converted back to RGB before it reaches the display. So be careful: what you see is not always what you get.

Mencoder for Processing
I use Mencoder for a lot of my video processing, but many of you won’t, so this may be of no use to you. Mencoder is command-line only, although there are some GUIs listed in the “Related Projects” section of its website. For lossless, I use -ovc lavc -lavcopts vcodec=ffv1. If I need RGB lossless, I use -ovc lavc -lavcopts vcodec=ffv1:format=BGR32. If you are previewing with MPlayer, use -vo gl2 for viewing RGB formats and -vo xv for YUV formats. I do know that the Camstudio Lossless format is a good lossless screencasting format, with good compression, for RGB data. When viewing it with MPlayer, use -vo gl2, because -vo sdl will make this format look funny on the screen. Apparently the SDL output is confused about whether the data is RGB24 or RGBA32, so it counts the bytes wrong and scrambles the rows.

Programming and Color Planes
If your goal is to use this information in programming, spend a LOT of time at www.fourcc.org and Wikipedia. In programming, you must deal with color planes, not just channels. With RGB32, you may have a 4-byte block of data, with the first three bytes being red, green, and blue, and the fourth discarded. OR, you may have separate color planes. Sometimes it is easier to handle video or picture data in different planes. This means that instead of having a byte of red, a byte of green, then a byte of blue, you may have all the red bytes together, then all the green bytes together, then all of the blue bytes together. This is not normally done for RGB, but it is common for YUV formats.

In your programming, you may get a pointer to each plane and have to do the address lookups manually for each pixel. Sometimes the planes have special alignment requirements, so it is important to find out the length of each line of pixels, as there may be extra padding at the end of each line. Your programming documentation should tell you how to get the length of each line. You then multiply this by the row number of the pixel you want to access, and add the column number.

Remember that in U and V planes, one piece of color information is sometimes used for 2 or 4 pixels. If your U and V planes are 2×2 subsampled, you must divide the row number by two (discarding any remainder) AND the column number as well. It also means that if you want to change the color of a single pixel, you have to “mix” it with the 2×2 subsampled data. You must find the correct place in the buffer and mix your “single pixel” color information in at a 25% ratio, with the “multi-pixel” color information already there as the other 75%. Have fun with the programming for that!
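
As a starting point, here is a hedged Python sketch of those lookups. It assumes 8-bit planes held in bytearrays, a line length (often called the stride or pitch) that you got from your library, and 2×2 subsampled color planes. All of the names are made up for illustration.

```python
def luma_offset(x, y, stride):
    """Index of pixel (x, y) in a full-resolution plane with padded lines."""
    return y * stride + x


def chroma_offset(x, y, chroma_stride):
    """Index of the color sample shared by the 2x2 block containing (x, y)."""
    return (y // 2) * chroma_stride + (x // 2)


def mix_chroma(plane, x, y, chroma_stride, new_value):
    """Blend one pixel's color into its shared sample: 25% new, 75% existing."""
    i = chroma_offset(x, y, chroma_stride)
    plane[i] = (3 * plane[i] + new_value + 2) // 4   # +2 rounds to nearest


# A 4x4 picture: the U plane holds 2x2 samples, each line padded to 4 bytes.
u_stride = 4
u_plane = bytearray([100] * (u_stride * 2))

mix_chroma(u_plane, x=3, y=2, chroma_stride=u_stride, new_value=200)
print(chroma_offset(3, 2, u_stride))   # 5: second line, second sample
print(u_plane[5])                      # 125: a quarter of the way from 100 to 200
```

Pixels (2, 2), (3, 2), (2, 3), and (3, 3) all map to the same index, so changing the color of one of them nudges the color of all four.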
