Verification by Types
Why Should Spreadsheets Be Verified by Tools?
Spreadsheet correctness should be taken seriously, and with the help of automatic tools. There are several reasons:
- Spreadsheets are error-prone
- Spreadsheets are carriers of important business information, data and logic
- Spreadsheets can be big and complex, or made by others; they are hard to review thoroughly.
Verification by Types
Type checking is a classic, popular, and effective method to verify computer programs. It is even systematically applied to programs written in certain languages such as OCaml, C#, etc., which are thus called "strongly typed" languages. However, almost all the spreadsheet languages are "weakly typed". For example, a comparison of a string and a numeric value is permitted (returns True
or False
), and does not lead to any error. Even though some meaningless operations show errors such as #VALUE
, they are not blocking and can be easily disregarded by users. Therefore, our approach aims at applying type checking to spreadsheet programs, to improve their quality.
Types
In our current type system, we class cell values by the following types:
Type | Description |
---|---|
Empty | an empty cell |
Double | e.g., 1,2 , 120% , 1,25236E+10 |
Integer | e.g., 0 , -1 , 100 |
Boolean | the type of TRUE and FALSE |
String | e.g., abc , "abc" ,"123" , (a space), "" (two double primes) |
Date | e.g., 11/08/2019 , 14 mars 2012 , 14/3/12 1:30 PM |
Currency | e.g., $1 234,10 , -€1 234,10 |
Any | a global type to represent any value and type |
Dangerous Typing Rules
We implement a set of dangerous typing rules in Spreadsheet Verificator. The tool verifies all the formulas of a spreadsheet, raises a typing error, if it discovers a dangerous typing rule may be matched during the calculation of the spreadsheet. In the following table, we list a part of our dangerous typing rules. For instance, Boolean + | - Currency | Date
means it is dangerous to add or minus a Boolean value to a Currency or a Date. Note that some typing rules may be considered dangerous by some users, but safe by others. It is not possible to make a universally absolute and exhaustive set of dangerous typing rules. If you think certain typing rules are missing in the tool, please contact us.
Rules - Elementary arithmetic |
---|
-(Boolean | Date) |
+Boolean |
Currency + | - | * | / Date |
Date + | - | * | / Currency |
Currency | Date + | - Boolean |
Boolean + | - Currency | Date |
Boolean | String + | - | * | / Any |
Any + | - | * | / Boolean | String |
Date * | / Any |
Any * | / Date |
Rules - Relational operation |
---|
Date < | > | <= | >= | == | <> String | Integer | Double | Boolean | Currency |
Currency < | > | <= | >= | == | <> String | Boolean | Date |
String < | > | <= | >= | == | <> Boolean | Integer | Double | Date | Currency |
Boolean < | > | <= | >= | == | <> String | Integer | Double | Date | Currency |
Integer | Double < | > | <= | >= | == | <> String | Boolean | Date |
Rules - Arithmetic |
---|
SIN | COS | ATAN | ASIN (Boolean | Date | String) |
LN | LOG1 | LOG10 | SQRT | EXP (Boolean | Date) |
POWER | LOG2 (Any, Boolean | Date) |
POWER | LOG2 (Boolean | Date, Any) |
Rules - MIN, MAX |
---|
MIN | MAX (Boolean | String) |
MIN | MAX (..., Boolean | String, ...) |
MIN | MAX (..., Currency | Double | Integer, ..., Date, ...) |
MIN | MAX ({...; Boolean | String; ...}) |
MIN | MAX ({...; Currency | Double | Integer; ...; Date; ...}) |
Rules - SUM, AVERAGE, MEDIAN |
---|
SUM | AVERAGE | MEDIAN (Boolean | Date | String) |
SUM | AVERAGE | MEDIAN ({...; Boolean | Date; ...}) |
SUM | AVERAGE | MEDIAN (..., Boolean | Date | String, ...) |
Advantages
There are other existing tools and methods to verify spreadsheets. Our Spreadsheet Verificator has several advantages:
- Spreadsheet Verificator is fully automatic; it does not require users to provide supplementary information to be launched.
- It is sound. That means we guarantee that none of the dangerous typing rules could ever be matched in any calculation of the spreadsheet. In other words, if the spreadsheet is validated by the tool, the validation applies to all the possible input cell values, as long as their type does not change. This is particularly better and more exhaustive than a manually testing approach, which cannot cover all the possible input instances.
False Alarms
As the cell property that we reason on (i.e., types) is more abstract and less precise than concrete cell values, verification tools like ours involves a notion of precision. In case of being not precise enough, there may be false alarms, which means what the tool raises is not a real error. For instance, the formula =IF(ABS(A1)>=0, 3, false)+5
always evaluates to 8
. Whereas, an analysis solely based on types translates the formula to =IF(ABS(A1)>=Integer, Integer, Boolean)+Integer
. As it cannot evaluate the value of ABS(A1)>=Integer
, it considers the typing error Boolean+Integer
is possible. Therefore, one direction to improve verification tools is to improve their precision and make less false alarms.