Friday, January 8, 2010

Representing Data in Programming Lang...

Representing Data in Programming Languages

How many bits do you need to represent the number 3 in a computer?

If you didn't get 0 then try this one. How many bits does it take to represent a number that might be 17 or 18? Now it is more obvious that the answer is 1 bit: 0 for 17 and 1 for 18 (or vice versa). Of course it might result in a shorter and/or faster program if the number is represented by 17 for 17 and 18 for 18, but that might not be the case.

This is closely related to the question of when you turn a program into a more specialized version. Specialization can be for a specific machine or class of machines, or it can be for a specific execution (or range of executions) of a program. At any rate the fact that you know that a particular expression has value 3 might be because it is a constant in the program, but it might also be because the program has started executing and on this occasion the value is definitely going to be 3.

It is worth noticing that the distinction between data and program is getting blurry. If the user specifies some complex data, like a spreadsheet with formulas, then the boundary between data and program is hard to pick. This is a natural complement to the more general trend for compilation/specialization to occur at various and multiple times.

The specific thing that aroused this thought was the duck-typing style of interfaces in the new language Go. If a procedure parameter is an interface value then it is assumed to have a complex form, with a pointer to the value and pointers to relevant methods. This will often be inefficient and unnecessary. It will often be better to just replicate the procedure for each possible value type, and pass in the value rather than a pointer to the complicated standard Go interface value. To link that back to the initial point: Values can often be represented tersely at the expense of having a more complicated compiled program.