DataTricks : Validation Script

Unit test script. Timings were generated on a 2009 13” MBP laptop (2.2GHz) under Windows 7.

index_1.gif

Range Tests

Cell ranges should be trimmed to the 'used' extent of the spreadsheet:

index_2.gif

index_3.gif

When a single range is specified directly (not as a list of lists), the range should not be given in a list:

index_4.gif

The workbook should actually be evaluated (the saved cell values are not simply retrieved verbatim) when it is loaded. We can check this by retrieving ranges with random numbers multiple times. Deterministic columns (like Variable 1) should return the same values, but columns with random number formulas (like Variable 2) should be different with probability 1:

index_5.gif

Should correctly read in a table range (which has the advantage of being a dynamic named range, so the dataset is always appropriately sized):

index_6.gif

Data Type Tests

Check in-band error indicators are properly handled, and that values in the output have the right data types.

index_7.gif

Dates should be presented as date Lists :

index_8.gif

Even if decimal places are hidden in the spreadsheet view (as for Variable 3a), we should still retrieve the values to full precision:

index_9.gif

Cells hidden by merging should show as 'Null':

index_10.gif

Time Series Tests

Now we deal with time series observations, which are common in Economics. Data must be arranged in columns, and by default the reference dates are found in column A:

index_11.gif

The retrieved data series must be in the correct format for DateListPlot, which is smart enough to drop non-numeric data.

index_12.gif

index_13.gif

Multiple time series at once, to produce a rather silly graph:

index_14.gif

index_15.gif

With a non-standard date column. Note the user forgot the dropHeaderRows option, but they were dropped automatically because the first four rows were not valid dates.

index_16.gif

index_17.gif

Data Refresh Tests

Variable 7 is saved pointing to an old version of the data. In the original, there is only about a year of data; in the updated version of the source data sheet there are much more data. So we should load different datasets, depending on whether we refresh or not.

index_18.gif

Speed Tests on Typical Datasets

This dataset was designed to represent a spreadsheet of the size typically used to create graphs in Economics: 7 monthly series, each 18 years in length. Some series have a variety of errors and missing values, which are handled gracefully.

Import the whole sheet:

index_19.gif

index_20.gif

A subset of columns at a time:

index_21.gif

index_22.gif

Now, load an entire 5MB XLSX file.

index_23.gif

index_24.gif

And a subset of columns from this file :

index_25.gif

index_26.gif

Speed Tests on Large Datasets

This workbook contains a single, large dataset, 500 columns wide, with 10 header rows and 10000 data rows.
First, the 34MB XLSB file (note the built-in Import function does not even support XLSB files).

index_27.gif

index_28.gif

Now, the same tests on the 61MB XLSM file. XLSX files are larger and more complex, and so take longer to parse:

index_29.gif

index_30.gif

Now, how long to import all 5 million cells in the giant sheet?

index_31.gif

index_32.gif

Same again but for the 61MB XLSM sheet:

index_33.gif

index_34.gif

Invalid Input

Returns $Failed and outputs error messages when the user requests things wrongly. It should be pretty obvious what went wrong, but note that we don’t suppress exceptions raised by Excel here, in case the errors are actually due to something else we didn’t fully anticipate.

index_35.gif

index_36.gif

index_37.gif

index_38.gif

index_39.gif

index_40.gif

index_41.gif

index_42.gif

index_43.gif

Attempting to import a file that does not exist :

index_44.gif

index_45.gif

index_46.gif

Or if the sheet doesn' t exist.

index_47.gif

index_48.gif

index_49.gif

index_50.gif

index_51.gif

index_52.gif

index_53.gif

Invalid range:

index_54.gif

index_55.gif

index_56.gif

index_57.gif

index_58.gif

index_59.gif

index_60.gif

index_61.gif

index_62.gif

index_63.gif

index_64.gif

Spikey Created with Wolfram Mathematica 9.0