Can someone explain some different bit depth stuff to me?

So I understand what the values mean, as in the usual color depth values.

8bit = 8bit per channel = 256 values per channel = 256^3 = about 16.77 million colors.
16bit (actually only 15bit in Photoshop)
24bit
32bit. (HDR BABEH!)

But what is a 16 bit float, and how does it compare to plain 16 bit? Likewise for an 8, 24, or 32 bit float. What does "float" mean, and can all of the color depths be floats?


TLDR: Float indicates that the number is stored as a decimal number instead of as an integer (whole number). This allows for the color (or any number that the float represents) to have a much greater total range, by sacrificing absolute precision.

Here’s my best attempt at explaining:

Normally, colors are stored as a combination of integer values. So when we talk about 8 bit, we are talking about 256 different possible integer values per channel, that’s 0 - 255. Integer values are stored directly and exactly by the computer.
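To make the integer case concrete, here is a minimal sketch of the arithmetic. The function names `levels_per_channel` and `total_colors` are just made up for illustration, and it assumes three channels (RGB) with no alpha.

```python
def levels_per_channel(bits):
    """Number of distinct integer values an unsigned channel of this width can hold."""
    return 2 ** bits


def total_colors(bits, channels=3):
    """Number of distinct colors, assuming every channel has the same bit depth."""
    return levels_per_channel(bits) ** channels


for bits in (8, 16):
    levels = levels_per_channel(bits)
    # The largest storable value is levels - 1, because counting starts at 0.
    print(f"{bits} bit: values 0..{levels - 1}, {total_colors(bits)} colors")
```

For 8 bit this gives 0..255 per channel and 16777216 colors, which is where the "about 16.77 million" figure comes from.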

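Floating point numbers, on the other hand, are not always stored exactly. They are stored in an exponential notation, like scientific notation, with a fixed number of significant digits (roughly 7 decimal digits for a 32 bit float). Floats are a way to store very very large or very very small numbers (like 0.00000000000000001), but a value that needs more significant digits than the format has gets rounded to the nearest one it can represent.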
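If you want to see that rounding for yourself, here is a small sketch using Python's standard `struct` module. The helper name `to_float32` is my own; it just squeezes a normal Python float (64 bit) through 32 bit storage and reads it back.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


stored = to_float32(0.1)
print(stored)         # very close to 0.1, but not exactly 0.1
print(stored == 0.1)  # False: 0.1 has no exact binary representation

# Values that are sums of a few powers of two survive unchanged.
print(to_float32(0.5) == 0.5)      # True
print(to_float32(-2.75) == -2.75)  # True
```

So the loss of precision is not random noise, it is rounding to the nearest value the format can hold.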

Here’s a little bit of an explanation of how a 32 bit floating point number is stored on a computer:

They are decomposed into:

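  • sign s (denoting whether it’s positive or negative) - 1 bit
  • mantissa m (essentially the digits of your number) - 23 bits stored (24 effective, because the leading 1 is implied and not stored)
  • exponent e - 8 bits

Then, you can write any number x as roughly s * m * 2^e where ^ denotes exponentiation. More exactly, for a normal 32 bit float, x = (-1)^s * 1.m * 2^(e - 127), where 1.m means the stored mantissa bits placed after an implied leading "1." in binary.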

which would look something like this in binary (this particular pattern is approximately 5.2):

0 10000001 01001100110011001100110
S E_______ M______________________

the above info was pulled from http://stackoverflow.com/questions/6910115/how-to-represent-float-number-in-memory-in-c
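As a sanity check on the layout above, here is a rough sketch that pulls the three fields out of that exact bit string and rebuilds the number. It assumes the standard IEEE 754 single precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 mantissa bits) and only handles normal numbers, not zero, denormals, infinity or NaN. The function names are mine, not from the linked answer.

```python
import struct


def decode_float32(bit_string):
    """Split a 32-character bit string into sign, exponent, mantissa and compute the value."""
    bits = bit_string.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = int(bits[0], 2)        # 1 bit
    exponent = int(bits[1:9], 2)  # 8 bits, stored with a bias of 127
    fraction = int(bits[9:], 2)   # 23 bits, the part after the implied leading 1
    value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2 ** (exponent - 127)
    return sign, exponent, fraction, value


def decode_with_struct(bit_string):
    """Cross-check: let the struct module interpret the same 4 bytes as a float."""
    raw = int(bit_string.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


pattern = "0 10000001 01001100110011001100110"
sign, exponent, fraction, value = decode_float32(pattern)
print(sign, exponent, fraction)  # the exponent field is 129, so the scale is 2^(129 - 127) = 4
print(value)                     # roughly 5.2
print(value == decode_with_struct(pattern))
```

Both routes give the same number, a hair under 5.2, because 5.2 itself cannot be written exactly with 23 mantissa bits.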

If that goes over your head, you probably need to read a bit about how binary numbers work. This Khan Academy video does a great job explaining the basics of binary numbers

here’s another good article explaining the limitations of floating point numbers

Hope this helps, let me know if you have any more questions :)

Basically (simplifying a lot):
X bit -> From 0 to 2^X - 1, only integer values. In images, negative values are not allowed.
X bit float -> Different range of numbers, allows negatives, allows decimals. Loses precision when representing big numbers, because the gap between neighbouring representable values grows as the numbers get bigger.
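Here is a quick sketch of that trade-off for the 16 bit case. It assumes "16 bit float" means IEEE 754 half precision, which is what Python's `struct` format code `"e"` packs; the helper name `to_float16` is just for illustration.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest 16-bit (half precision) float and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# 16 bit integer: every whole number from 0 to 65535, nothing else.
uint16_max = 2 ** 16 - 1
print(uint16_max)

# 16 bit float: negatives and fractions are fine...
print(to_float16(-1.25), to_float16(0.5))

# ...but big whole numbers start to get skipped.
print(to_float16(2048.0) == 2048.0)  # True
print(to_float16(2049.0) == 2049.0)  # False, it gets rounded to a neighbour
```

Above 2048 a half float can only land on every second whole number, and the steps keep getting coarser as the values grow.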

For the kinky details: https://en.wikipedia.org/wiki/Floating_point