Fundamentals of Spreadsheet Languages
To many people, it may sound like an exaggeration to say that Microsoft Excel is a programming language. One reason is that Excel has so many functionalities that one can just use it like a software to accomplish tasks. Another reason, we believe, is that not many people don't know functional programming languages, to which spreadsheets can relate.
But when you say "Excel is a functional programming language" to a computer scientist, they will tell you that it lacks important features like higher-order functions, which are supported by normal functional programming languages. Then the interesting question is how spreadsheet programs became so popular and even powerful in many situations, regardless of these drawbacks?
In this article, we examine spreadsheets by various concepts of programming languages, and explain how spreadsheet languages can be seen as a functional programming language. By comparing with other languages, we break down what language features spreadsheets have, what they lack, and how they catch up on this. Additionally, we use examples to illustrate particular programming patterns that spreadsheets are good at.
1 Spreadsheet languages are a functional programming language
1.1 Multiple paradigms
We take Microsoft Excel & VBA (Visual Basic for Applications) as an example of spreadsheets. It can be seen from several different perspectives:
- People may just store data in a spreadsheet, then it is just a small database.
- People may just store texts in a spreadsheet, then it is like a text document.
- We can use menus and buttons to manipulate data in spreadsheets; we can manually enter, modify, copy-paste data. In this case spreadsheets are a software.
- VBA, which is behind Excel to extent its functionalities, is mainly an imperative and object-oriented programming language.
All these features contribute to the popularity of spreadsheets. But, in this article we want to accentuate the essential of spreadsheets, which makes spreadsheets special, and distinguishes it from a database/document/software/traditional imperative languages.
1.2 Functions and formulas as elementary components
1.2.1 Elementary components of spreadsheets
Spreadsheet functions and formulas are particularly fascinating to many people. They find functions and formulas powerful, pure and elegant. They enjoy building formulas with functions and building spreadsheets with formulas, which is like playing a Lego game with primitive blocks (take myself as an example, when I started to use Excel, I was obsessed by constructing long array formulas and wondered how powerful their combination could be).
Today, we can step back and look at functions and formulas from a programming perspective:
- Functions and formulas are elementary components that allow users to automate repetitive tasks and calculations (in comparison with manual triggering of actions by menus and buttons)
- Functions constitute a formula, chains of formulas constitute a spreadsheet program.
- An execution of a spreadsheet program consists in evaluating the formulas in the chains one by one.
1.2.2 Facilities of programming in formulas and functions
Spreadsheet software have many facilities to help building formulas with functions (thus make it easy to write spreadsheet programs). Because of the interactive and visual feature of spreadsheets, these mechanisms may not exist in other programming languages and environments. We can list some:
Copy-pasting a formula to a range of cells is very handy in spreadsheets. Parameter references will automatically change and adjust according to the cell position. This copy-pasting practice is certainly not common or encouraged in other programming languages.
When we determine parameters of a formula, we can see values of spreadsheet cells (thus to say if they are useful or not), manually navigate through available cells, and select them as parameters.
Instant evaluation: in most of programming languages, programming environment (e.g., inside an editor) is separate from the execution environment (e.g., command lines of a terminal). By contrast, in spreadsheets, the programming environment and the evaluation environment is the same, i.e., the spreadsheet. Additionally, the evaluation of a spreadsheet can be automatic and is often fast. All of this gives an instant experience of the evaluation of a spreadsheet.
The interface of a spreadsheet allows users to see many cells at a glance. A formula is hidden behind the value of each cell; regardless of the complexity of the formula, it occupies only the space of one cell (though users can select a cell and see its formula). As a result, spreadsheets can hold and arrange many many formulas without disturbing users. If we consider each cell of a spreadsheet as a variable of a program, spreadsheets easily display many variables in a screen, which is not achievable by many other languages.
1.3 Concepts of functional programming languages reflected in spreadsheets
Then, we can see how classic concepts and advantages of functional programming are reflected and respected in spreadsheets:
Clear, consistent and auto-adjusted dependency: in a cell formula, the input parameters (i.e., cells and values) are very clear; the output is the cell itself. Moreover, spreadsheets try to protect well dependencies and keep them consistent. For instance, when users change the spreadsheet structure by inserting/deleting rows/columns or copy-pasting formulas, parameter references in formulas will be automatically adjusted such that cells are still correctly referenced. By contrast, the dependency of spreadsheet cells and VBA procedures is well not "protected": cell references in VBA procedures are not smartly auto-adjusted after a structure change of spreadsheets.
No side effects: the formula evaluation of a cell will only have impact to the cell itself and the cells depending on it. By contrast, we can code anything in a procedure in VBA, which may result in side effects such as modifying unrelated cells.
Immutability: once all the cell formulas are set, the evaluation of the spreadsheet will not be able to change any cell formula.
1.4 Other characteristics
One characteristic of spreadsheets is that the data structure they work on is especially two-dimensional arrays. The layout is organized in two dimensions such that data is well displayed; Many built-in functions are available to take several arrays as parameter. All of this permits of a handy operation and manipulation on this particular data structure.
Another characteristic of spreadsheets is that input data, intermediate data, output data are all mixed together in sheets. This makes it easy to re-use data, but it may result in messy spreadsheets if the data is not very organized.
2 Spreadsheet mechanisms to catch up on function defining
One big criticism of spreadsheet languages is the lack of the ability to define re-usable functions. Whereas, in most of programming languages, we can define a function and repeatedly call the function by its name. Here is a very basic example in Python:
def area(a, b): return a * b
print(area(1, 2)) # output: 2
print(area(3, 4)) # output: 12
print(area(5, 6)) # output: 30
Then, what are the mechanisms in spreadsheets to achieve the similar goal?
2.1 Making and holding many similar calculation bodies (i.e., spreadsheet formulas) is easy in spreadsheets, e.g., by copy-paste
Although spreadsheets do not permit of defining a re-usable function, we can enumerate the application of the same function to different input values. To do so, we can "brutally" repeat a similar calculation body (i.e., a spreadsheet formula) with slightly different arguments in many cells. Surprisingly, it is not inconvenient, for the following two reasons:
Making similar spreadsheet formulas with slightly different arguments is very easy by copy-paste. We can either use keyboard shortcuts like
Ctrl+C
andCtrl+V
, or by mouse drag and drop; spreadsheets will smartly adjust argument references while keeping the same formula structure.As stated above, spreadsheets can hold and arrange many many formulas without disturbing users; Many formulas having a same structure stored in a range do not bother users.
As an example, we can quickly make the following spreadsheet to calculate the area of many rectangles,
though no function like the previous area
in Python is defined.
2.2 Re-using existing cell values is very convenient in spreadsheets
Although users cannot define a function to re-use in spreadsheets, users can well re-use the result of a piece of code (i.e., the value of the ending cell of a formula chain). Because spreadsheets display well many cells at a glance and permit of a smooth navigation among cells, users can easily identify useful cell values and to write formulas referring to them. It is such handy in spreadsheets that users might over-use it and create complex relationship among spreadsheet cells.
2.3 Instant evaluation of a spreadsheet is helpful
In some cases, the instant evaluation may remedy the lack of the ability of defining functions. Sometimes the purpose of re-using a function is to compare outputs of different inputs: we give a set of inputs to a function and get a set of outputs. Spreadsheet users may have another way to undertake this experimentation: rather than putting side by side outputs from different inputs, they try different input values and can quickly see the output thanks to the seamless experience of the evaluation.
{todo: add references of Simon Python Jones and Sestoft, with a special format}
3 Lack of higher-order functions
When you tell a computer scientist that "spreadsheet is a functional programming language", their first reaction would be "does spreadsheet support higher-order functions?". Well, the answer is "No".
A higher-order function is a function that accepts other functions as parameters and/or use a function as the return value. A part of programming languages such as Pascal and C don't support higher-order functions; a part of programming languages (especially functional programming languages) such as OCaml have higher-order functions. Python permits of higher-order functions, here is an example:
def square(x): return x * x
def cube(x): return x * x * x
def strange(f, x): return f(x) + f(x * 2)
print(strange(square, 1)) # output: 5 (which is 1*1+2*2)
print(strange(cube, 1)) # output: 9 (which is 1*1*1+2*2*2)
By contrast, in spreadsheets, we cannot define such a powerful strange
;
we have to respectively define one column to enumerate strange(square, ...)
and another column to enumerate strange(cube, ...)
.
4 Favorite patterns of spreadsheets
Regardless of lack of mechanisms such as function defining and higher-order functions, spreadsheets are good at some particular programming patterns, especially over arrays. That's why it is very handy to achieve many tasks (which need to be undertaken by higher-order functions in other languages). We list some of these patterns in this section.
4.1 Mapping over arrays
One strength of spreadsheet programming is to iterate over arrays, especially to simultaneously map over several arrays.
It is equivalent to map
, map2
, etc. in other languages. For example, formulas in Column D simultaneously map over 3 arrays in the following spreadsheets:
Mapping over several arrays may not be primitive in some programming languages,
we can implement it by ourselves. For example, we implement map3
as follows in OCaml:
# let map3 f a0 a1 a2 =
let l = Array.length a0 in
let e = f (Array.get a0 0) (Array.get a1 0) (Array.get a2 0) in
let result = Array.make l e in
for i = 1 to l - 1 do
let e = f (Array.get a0 i) (Array.get a1 i) (Array.get a2 i) in
Array.set result i e;
done;
result;;
val map3 :
('a -> 'b -> 'c -> 'd) -> 'a array -> 'b array -> 'c array -> 'd array =
<fun>
# let a0 = [|0; 0; 0; 0; 0|]
and a1 = [|1; 2; 3; 4; 5|]
and a2 = [|2; 3; 4; 5; 6|]
and f e0 e1 e2 = e0 + e1 + e2;;
# map3 f a0 a1 a2;;
- : int array = [|3; 5; 7; 9; 11|]
In Python, the map()
function is quite flexible, and can task several lists as parameters.
l0 = [0,0,0,0,0,0]
l1 = [1,2,3,4,5,6]
l2 = [2,3,4,5,6,7]
result = list(map(lambda x, y, z: x+y+z, l0, l1, l2))
print(result0) # output: [3, 5, 7, 9, 11, 13]
4.2 Fold over arrays
Fold (also termed reduce) refers to a function that takes a combining operation, recursively applies to a data structure, and builds a return value. A strength of spreadsheets is to fold over several arrays. Additionally, it shows a final return value, as well as all the intermediate values. For example, calculating bank statements with credits and debits turns out to be a fold operation:
We can implement this fold over 2 lists in OCaml as follows:
# let myFold2_left f init a0 a1 =
let e = f init (Array.get a0 0) (Array.get a1 0) in
let result = Array.make (Array.length a0) e in
for i = 1 to (Array.length a0) - 1 do
let e = f (Array.get result (i-1)) (Array.get a0 i) (Array.get a1 i) in
Array.set result i e;
done;
result;;
val myFold2_left :
('a -> 'b -> 'c -> 'a) -> 'a -> 'b array -> 'c array -> 'a array = <fun>
# let init = 100
and a0 = [|6;6;6;6;6|]
and a1 = [|1;1;1;1;1|]
and f x e0 e1 = x + e0 - e1;;
# myFold2_left f init a0 a1;;
- : int array = [|105; 110; 115; 120; 125|]
In Python, we can combine reduce()
and zip()
to achieve this:
from functools import reduce
init = 100
a0 = [6,6,6,6,6]
a1 = [1,1,1,1,1]
def f(l, cur):
r = l[-1] + cur[0] - cur[1]
l.append(r)
return l
print(reduce(f, zip(a0, a1), [init])) # output: [100, 105, 110, 115, 120, 125]
4.3 Other array operations like Filter, Sort and Unique
In previous versions of Excel, it was complicated to undertake array operations such as filter, sort and unique;
one way was to use primitive functions such as INDEX
, MATCH
and COUNTIF
to construct Ctrl+Shift+Enter (CSE) array formulas.
As a consequence, array formulas are long, slow, and hard to understand.
An exciting news is that, Microsoft Excel gradually introduces a set of new functions called dynamic array functions including FILTER
, SORT
and UNIQUE
.
These operations (and their combination) over multiple arrays then become very easy in Excel.
The follows is an example where we find unique elements from a list and sort them.
We can either achieve this by chaining two steps and two columns, or by one combined formula with one column:
In Python, there are the sort()
and sorted()
functions to sort a list;
there is no built-in function to achieve "unique", though there are many ways to construct it.
list = [3, 2, 1, 1, 2, 3, 5, 7, 8, 6]
unique = [x for i, x in enumerate(list) if i == list.index(x)]
print(unique) # output: [3, 2, 1, 5, 7, 8, 6]
print(sorted(unique)) # output: [1, 2, 3, 5, 6, 7, 8]
5 Conclusion
We show that core concepts of functional programming languages can be found in spreadsheet languages. Moreover, functional paradigm (e.g., consistent and auto-adjusted dependency) is well incorporated and protected by spreadsheet software like Microsoft Excel; respecting functional paradigm during spreadsheet development has many advantages including reducing possible spreadsheet errors. Therefore, we categorize spreadsheet languages as a functional programming language.
As illustrated by this article, the approach of seeing Excel as a programming language and spreadsheet files as programs is fundamental. Now, we could explain behaviors of Excel in vocabularies of computer science. Going further, we could improve the use and development of spreadsheets by inspiring from existing concepts, works and tools in other programming languages. This perspective and approach is the inspiration and foundation of our 10 Studio toolset.