Current issue

Vol.26 No.4

Vol.26 No.4

Volumes

© 1984-2017
British APL Association
All rights reserved.

Archive articles posted online on request: ask the archivist.

archive/23/3

Volume 23, No.3

Using the .Net DataSet Class with Dyalog 12

Adrian Smith (adrian@apl385.com)

Acknowledgements

This material was prepared initially for Paul Mansour’s wonderful moot in San Quirico, and was improved during the presentation by several of those present, notably Stefano. Thanks to everyone who mucked in to help.

Motivation

One of the wonders of .Net is just how much stuff you get for nothing. One of the little gifts it gives you is a perfectly usable relational database, which they would love to have you backend onto SQL Server, but which is quite adequate running completely stand-alone. I rather like playing with databases, so I spent half a day fooling with this, partly as a way to explore the Dyalog .Net interface. I quickly discovered that wrapping one of Dyalog’s new classes around it could add quite a bit of fun to the experience, but it also became obvious that handling anything more than a few thousand rows would be amazingly slow unless all the data types were directly mapped into simple APL arrays. As soon as you hit nulls or (worse) dates APL converts each item into a namespace reference and things slow exponentially.

So this takes on from my note on new tricks for old dogs[1], and shows how you might actually make use of this stuff in a real working system, at a very reasonable speed. Nervous readers should note that there is some C# code involved, but it is very low on intellectual content, and consists mostly of all the loops that we don’t want to see in APL.

Getting started with the DataSet class

I am taking it as read that anyone who has read this far knows roughly what a database is, so I will run quickly though how .Net handles the idea of tables and relationships. We will build the little EmpDept example (with random variations) that Oracle shipped back in the 70s.

The very first thing you need is a DataSet which is a wrapper for your DataTables and the Relationships between them:

 ⎕USING←'' 'System.Data,System.Data.DLL' 'Causeway,D:\tools\aplc\dbs.dll'
 pr←⎕NEW DataSet(⊂'Payroll')

There is very little a DataSet can do on its own, but if you have a quick look at its methods, you will see something rather useful:

 prps←{⍵.⎕NL ¯2}
 meths←{{⍵/⍨~'_'∊¨⍵}⍵.⎕NL ¯3}
 meths pr
  … GetXml … WriteXml …

When you are done, you can save your entire database to disk in a (remarkably) clean XML format which you can then rebuild later on. For small, self-contained projects, this seems a very good solution, as well as being completely free. By small, I mean anything up to a few hundred thousand rows.

Of course we need some tables, and the tables need some columns. The .Net SDK help file is really well written, and walks you through the process at a good pace, if you want to explore this more deeply. For now I just want to get something I can play with in APL, so here we go:

⍝ The Emp table will need some columns, with type information
⍝ Start by collecting the datatypes we will need (String is the default coltype)
 int32←System.Type.GetType∘⊂'System.Int32'
 double←System.Type.GetType∘⊂'System.Double'
 datetime←System.Type.GetType∘⊂'System.DateTime'
 timespan←System.Type.GetType∘⊂'System.TimeSpan'

 empid←et.Columns.Add∘⊂'Id'
 empdept←et.Columns.Add∘⊂'Dept'
 empname←et.Columns.Add∘⊂'Name'

 sal←et.Columns.Add'Salary'int32
 sink←et.Columns.Add'Annual Bonus'double'Salary / 10'

 join←et.Columns.Add'Joining Date'datetime
 service←et.Columns.Add'Length of Service'timespan

A couple of things to note in passing – getting a Type object in APL is strangely messy. There may be a better way of doing this, but I haven’t found it yet. Also note how bonus is set to salary/10, in the definition of the table. I don’t really see the use of this in a real system, but it is quite surprising that you can do simple arithmetic directly in the table definitions. Finally, under coding habits I suggest you file away the idea of always using a jot against methods where you have to enclose the ‘string’ argument. This is redundant here, but makes it visually obvious that you are not really passing an enclosed array – just flagging it for enclosure by the called function. Also it will allow you to use each on an argument array without swearing. Another good habit is to set up true and false as standard globals to make your APL code look a little more like the examples in the help files. Now we hit a small brick wall and get the first taste of some C# assistance:

⍝ We need this as APL cannot pass an array of DataColumns here?
 Db.SetPrimaryKey(et,1)    ⍝ Static call on the library

⍝ This has one primary key which cannot have duplicates
 empid.(AllowDBNull Unique)←false true
⍝ Just to show we can store custom properties (string type only for XML use)
 et.ExtendedProperties.Add'Owner' 'Adrian'

⍝ We must have a deptid
 empdept.AllowDBNull←false

In order to set the primary key, I need to pass an array of DataColumn objects. I don’t think there is any way for Dyalog to do this (please prove me wrong, someone) so I dived over to TextPad and wrote a trivial bit of C# to do the job, given the column index or indices:

// Much more comfortable just to pass column numbers when creating tables!
 public static void SetPrimaryKey(DataTable table, int col) 
 { SetPrimaryKey(table, new int[] {col}); }
 public static void SetPrimaryKey(DataTable table, int[] cols)
 { DataColumn[] keys = new DataColumn[cols.Length];
   for(int ct=0; ct<cols.Length; ct++)
     keys[ct] = table.Columns[cols[ct]-1];
   table.PrimaryKey = keys;
 }

This code is included in the source for dbs.dll, along with a lot of other stuff I was expecting to have to write. One day it should become redundant.

Of course the department table will just be more of the same:

 deptid←dp.Columns.Add∘⊂'DeptId'
 sink←dp.Columns.Add∘⊂'Department'
 deptid.(AllowDBNull Unique)←false true

 Db.SetPrimaryKey(dp,1)

Finally, we draw a mental line between the tables (a real line if you always sketch these before you start typing) to enforce the link between them:

⍝ Finally link them up to enforce the foreign-key relationship
 aff←ds.Relations.Add'Affiliation'deptid empdept ⍝ Parent->child
 aff.ChildKeyConstraint.DeleteRule←Rule.Cascade  ⍝ Default but say it anyway

This means that anything you type in the Dept column of the Emp table must match an existing entry in the DeptId column of the Dept table. (And if you delete a department, then all the employees conveniently evaporate. This might not be the best option in production code …)

As the final step, let’s just add a few random departments and employees:

 ⎕USING←'' 'System'

⍝ Depts must be loaded first
 dept←ds.Tables[1]
 nr←dept.NewRow
 nr[0]←⊂'OR' ⋄ nr[1]←⊂'Operational Research'
 dept.Rows.Add nr

 nr←dept.NewRow
 nr[0]←⊂'GD' ⋄ nr[1]←⊂'General Dogsbody'
 dept.Rows.Add nr

⍝ Now the employees
 emp←ds.Tables[0]

 nr←emp.NewRow
 nr[0]←⊂'JM'
 nr[1]←⊂'OR'
 nr[2]←⊂'Jonathan'
 nr[3]←24800
 nr[5]←DateTime.Parse∘⊂'23/3/1998'
 nr[6]←TimeSpan.FromDays 678
 emp.Rows.Add nr

 …

⍝ Now lets try the bulk insertion method
 emp.BeginLoadData

 sink←emp.LoadDataRow('MWG' 'OR' 'Marc' 18500 0)true
 sink←emp.LoadDataRow('GMS' 'GD' 'Gill' 13503 0)true
 sink←emp.LoadDataRow('RCS' 'GD' 'Richard')true    ⍝ Default salary
 sink←emp.LoadDataRow('WHO' 'GD')true              ⍝ Default name to DBNull

 emp.EndLoadData

Note that I bottled out on using the bulk method with DateTime and just allowed our last four employees to have null values here. Bulk insert is much better done with a bit more C# help, so we can walk on by and play with the limited data we have.

Playing with Selections

We can do a fair amount of processing in plain APL now, starting from the pr object (the DataSet instance we made at the beginning) and digging down layer by layer:

      pr
System.Data.DataSet
      pr.DataSetName
Payroll
      pr.Tables
System.Data.DataTableCollection
      ⌷pr.Tables
 Emp  Dept

Note the excellent monadic squad , which converts a collection into an array. (We APLers likes arrays, just like we Pirates always speaks in the present tense – sorry. On we go …)

      emp←↑⌷pr.Tables                  ⍝ Just the employee list
      ss←emp.Select∘⊂'salary>20000'    ⍝ What have we here?
      ss
 System.Data.DataRow  System.Data.DataRow 
      ⊃ss.ItemArray
 JM    OR  Jonathan  24800 2480  23/03/1998 00:00:00  678.00:00:00         
 ACDS  OR  Adrian    36800 3680  30/09/1974 00:00:00  142.21:21:18.9000000 
      +/4⊃¨ss.ItemArray
61600

The good news is that you can get your data back out again, selected and ordered. The bad news is that you get it back in exactly the way you didn’t want – as an array of DataRow objects. APL will run through these for you grabbing each row as a nested array, but note that each item in the DateTime column is actually a namespace reference so if we have a few thousand employees, this innocent expression is going to run up the fan on your laptop for quite a while. Several years in fact, for a not-unreasonable sample. A more pressing problem is that none of the things an APLer wants to do (like find the median salary) are easy to achieve with this model. Time to step back and have a little think.

What do we really want here?

Clearly, we wants arrays, and we wants ’em now. It would be really great if the selection could be an instance of a Dyalog class, with all the columns as properties. Something like:

sel←Select emp 'salary>20000'   ⍝ Get a selection
      sel.Salary
24800 36800
      sel.Salary[2]
36800
      sel.Salary[2]×←1.05
      sel.Salary
24800 38640

This way a column in a selection behaves much like an external variable – we can index it and use any reasonable selective specification to update it. The other things we need are a good strategy for replacing [null] with a chosen numeric value, and a good way to represent DateTime objects and TimeSpan objects:

      sel.ColumnNames
 Id  Dept  Name  Salary  Annual Bonus  Joining Date  Length of Service
      sel.JoiningDate
35877 27302
      sel.LengthofService
58579200 12345678.9

I chose to form the property names by compressing out blanks from the column names, and to represent the dates as the traditional ordinal values. TimeSpans are trickier, as these are 64-bit integers in .Net and my APL cannot handle these just yet. I chose decimal seconds as the least bad option. If we reselect the whole table and just grab everything, you will see my default null substitutions:

      sel←Select emp ⋄ sel.GetMatrix  ⍝ Grab the whole table (NB dates/nulls)
 ACDS  OR  Adrian     38640   3864   27302 12345678.9
 GMS   OR  Gill       13503   1350.3    ¯1        0
 JM    OR  Jonathan   24800   2480   35877 58579200
 MWG   OR  Marc       18500   1850      ¯1        0
 RCS   FD  Richard   ¯32768 ¯32768      ¯1        0
 WHO   OR  (null)    ¯32768 ¯32768      ¯1        0

… but we can change this with another handy property:

sel.NullFillers[4 5]←0          ⍝ Change how nulls appear
sel.GetMatrix                   ⍝ Grab the whole table (NB different nulls)
 ACDS  OR  Adrian    38640 3864   27302 12345678.9
 GMS   OR  Gill      13503 1350.3    ¯1        0
 JM    OR  Jonathan  24800 2480   35877 58579200
 MWG   OR  Marc      18500 1850      ¯1        0
 RCS   FD  Richard       0    0      ¯1        0
 WHO   OR  (null)        0    0      ¯1        0

Of course the [null] stuff is fully reversible – setting an element to the current filler for that column results in a [null] in the table. The only remaining issue to illustrate is the speedup we can get by farming some of the grunt-work out to C# – I made a simple clone of a typical ‘ticker’ datafeed to give the engine enough data to chew on:

     LoadTicker 1
     sel←Select tk  ⍝ To play with it
     sel.ColumnNames
 What  When  Value
      sel.Rows
40000
      +/sel.Value
200499.74

This is pretty well instantaneous after the brief delay while the table is populated for the first time. For a rough guide to the alternative, I offer the following chart:

Time to read N rows with a DateTime col using native Dyalog technology

If you compute the regession equation for 40,000 rows you will get 4.3e¯7 × 40000*2 giving 688s or a little over 11 minutes just to create the array of rows. Probably you will get a Syserror and your interpreter will exit stage left (pursued by a bear) long before the expression completes. For comparison:

Time to read N rows with a DateTime col using the Helper DLL

This one has the added benefit of being linear rather than quadratic – note the x-axis had to be extended by a factor of about 20 to get meaningful times. That will do for showing you around the examples – what we need to do now is to show how the trick is pulled.

Farming out the loops

Whatever you say about conventional languages, they are good at doing simple things quickly. Here is the code that sits at the back of the column reference:

 public Array GetCol(int cinx)   
 // Grab whole Column (ensure APL gets the right datatype)
 { int col = cinx-1;             // Fix for 1-origin
   Type coltype = APLTypes[col];
   Array coldata = Array.CreateInstance(coltype,Selection.Length);
   for(int row=0;row<coldata.Length;row++)
     coldata.SetValue(GetItem(row,col),row);
   return coldata;
 }

The full C# source code can be downloaded[2] if you want to read all the preamble. The DLL maintains an array of data-types which APL has for the various columns, and starts by creating an array of the right type and size. It then simply hammers round the elements in the column populating the array, which is returned to APL. This calls the lower-level function:

 private object GetItem(int row, int col)  
 // Return one cell, checking for type and null
 { object cellvalue = Selection[row][col];
   if(System.DBNull.Value == cellvalue)    // APL really hates these
     cellvalue = nullFillers[col];         // Plug in current filler instead
   else if(isDate[col])
     cellvalue = ((DateTime)cellvalue).ToOADate();
   else if(isSpan[col])
     cellvalue = ((TimeSpan)cellvalue).TotalSeconds;
   return cellvalue;
 }

After checking whether the data item is [null], it special-cases the two complex types that can upset APL and converts the cell content to something that can be returned as a simple numeric value. In the common case where APL can cope, this is just a fall-through which returns the right element from the selection. The code to update the selection when you assign something to the APL property is (if anything) even simpler:

 public void PutCol(int cinx, object[] newValues)  
 // Update data colwise (always an entire column)
 { int col = cinx-1;
   for(int row = 0;row<Selection.Length;row++)
     PutItem(row,col,newValues[row]);
 }

This assumes that the APL side returns an array of the right length, which is always the case if you update a property with indexing or selective specification. It just does the reverse conversions:

 private void PutItem(int row, int col, object newValue)  // Fix one cell
	{ Selection[row][col] = Dyalog2Data(col,newValue); }

 private object Dyalog2Data(int col, object dataitem)   
 // APL types --> Database types
 { Type coltype = APLTypes[col];

   object newValue = Convert.ChangeType(dataitem,coltype);

   if(NullFillers[col].Equals(newValue))
      newValue = System.DBNull.Value;
   else if(isDate[col])
      newValue = DateTime.FromOADate((double)newValue);
   else if (isSpan[col])
      newValue = TimeSpan.FromSeconds((double)newValue);

   return newValue;
 }

I think that shows you enough of the back-end processing to convince you that there really is no rocket science or magic going on. Now we need to hook up the Dyalog class mechanism to the other end of the process to make it all really developer-friendly.

Making a nameless class on the fly

It is a real benefit of working in a dynamic language that you can lay the floor as you walk across it. Old hands are used to the trick of generating functions on the fly (often locally), fixing them and running them. Now we can do the same with classes – by fixing an unnamed class from a localised script, we can return an instance of something which never existed in the workspace at all. This is what happens when you type:

 sel←Select emp 'Dept=''OR''' 'Salary'
      sel
Selection from Emp where Dept='OR' (5 rows selected)

        sel[3]
  ACDS  OR  Adrian  1000 100 27302 12345678.9

The display form of the class has been set to return a nice description of the selection, just to make it all look like the SQLPLUS command prompt I remember from a previous life. The Select function takes a reference to a table and optionally a selection expression (in SQL) and one or more column names to use as sort keys. It calls the C# code in the constructor for the db helper, passing across the table and the selections. Note that we can localise DB here:

 ⎕USING←'' 'Causeway,D:\tools\aplc\dbs.dll'    ⍝ Adjust to taste!

 DB←⎕NEW Db table
 colnames←DB.ColumnNames~¨' ' ⋄ writeable←DB.Writeable

 scr←':Class' ':Access Public' ''
 scr,←'∇ Create arg' ':Access Public' ':Implements Constructor'
 scr,←' DB←arg' ' ⎕DF DB.ToString' '∇ '

If I temporarily de-localise the script, we can see part of what it wrote for us (indexers are slightly special – we need a shape function so Dyalog knows when to call INDEX ERROR on us):

:Property Numbered Default DataRows
:Access Public
∇ value←shape
 value←DB.Rows
∇
∇ value←get arg
 value←⊂DB.GetRow arg.Indexers
∇ 
∇ set arg
 DB.PutRow arg.Indexers (↑arg.NewValue)
∇ 
:EndProperty

∇ value←GetMatrix
:Access Public
 value←DB.GetMatrix
∇ 

By setting an indexer as the default property, we can treat the entire selection as a simple vector, so fetching/updating it by row is trivial. There are several other pass-through calls like GetMatrix, and then we come to the only really dynamic part. Because the Select function has the list of (despaced) column names, it can easily make a property definition for every column:

 :For ct :In ⍳↑⍴colnames  
  ⍝ Expose all columns read-write unless they are marked readonly
   pname←ct⊃colnames
   scr,←''(':Property ',pname)':Access Public'
   :If ct⊃writeable
     scr,←'∇ set arg'(' DB.PutCol ',(⍕ct),' (arg.NewValue)')'∇ '
   :End
   scr,←'∇ value←get'(' value←DB.GetCol ',⍕ct)'∇ '
   scr,←⊂':EndProperty'
 :End

A typical (generated) entry looks like:

:Property Salary
:Access Public
∇ set arg
 DB.PutCol 4 (arg.NewValue)
∇ 
∇ value←get
 value←DB.GetCol 4
∇ 
:EndProperty

You may want to add some :Trap blocks, as you may get a crash in the C# on updating the data, and not want to get hung up in Dyalog’s generated setter function, for example:

      sel.Salary←1 2
EXCEPTION
set_Salary[1] DB.PutCol 4(arg.NewValue)
             ∧
      )si
[#.[[Unnamed]]] #.[Namespace].set_Salary[1]*

A little more validation on the APL side of the fence would do no harm here either! Having made the entire script, the final act is to make a transient class, and return an instance of it:

 scr,←'' ':EndClass'
 seln←⎕NEW(⎕FIX scr)DB

As I said before, both scr and DB can be locals, so this is remarkably tidy in its use of the workspace. As a final bit of fun, let’s whistle up a .Net DataGrid and point it at our table. If you download the workspace and try this out, you will see that as you type expressions in the APL session, the grid follows the changes automatically.

 ⎕USING∪←'System.Collections' 'System' 
 ⎕USING∪←⊂'System.Windows.Forms,System.Windows.Forms.dll'

 f←⎕NEW Form
 f.Text←'Hello, world'
 f.Visible←1

 emp←pr.Tables[0]
 ev←emp.DefaultView
 ev.AllowNew←false

 dg←⎕NEW DataGrid

 dg.(Top Left)←4 4
 dg.(Width Height)←f.(Width Height)-4 4
 dg.Anchor←AnchorStyles.(Bottom+Top+Left+Right)

 dg.SetDataBinding ev''       ⍝ Single table view
⍝ dg.SetDataBinding pr'Dept'  ⍝  or the drill-down variety

 f.Controls.Add dg

Display and editing in a .Net DataGrid
Display and editing in a .Net DataGrid

Or, of course, you can just use it as a handy editor for the data!

Wrap up

In summary, there is a very competent relational database sitting on your computer, and it is totally free. To save your work on the nearest hard-drive, all you need to do is:

pr.WriteXml 'C:\temp\payroll.xml' false  ⍝ Save DataSet (with Schema)
emp.WriteXml∘⊂'c:\temp\emp.xml' ⍝ Save table to disk (data only)

The XML you get is designed to be readable, and indeed editable, for example:

<Payroll>
  <Emp>
    <Id>JM</Id>
    <Dept>OR</Dept>
    <Name>Jonathan</Name>
    <Salary>24800</Salary>
    <Annual_x0020_Bonus>2480</Annual_x0020_Bonus>
    <Joining_x0020_Date>1998-03-23T00:00:00+00:00</Joining_x0020_Date>
    <Length_x0020_of_x0020_Service>P678D</Length_x0020_of_x0020_Service>
  </Emp>

If I make a trivial change (adding Marc’s middle initial) in Notepad, then start from cold:

      ⎕USING←'' 'System.Data,System.Data.DLL' 
      pr←⎕NEW DataSet(⊂'Payroll')

      pr
System.Data.DataSet
      pr.ReadXml∘⊂'C:\temp\payroll.xml'
ReadSchema
      ⌷pr.Tables
 Emp  Dept 
      emp←↑⌷pr.Tables  ⍝ Recover the employee list
      sel←Select emp   ⍝ Get a selection
      sel.GetMatrix
 ACDS  OR  Adrian     36800   3680   27302 12345678.9
 GMS   GD  Gill       13503   1350.3    ¯1        0  
 JM    OR  Jonathan   24800   2480   35877 58579200  
 MWG   OR  Marc W     18500   1850      ¯1        0  
 RCS   GD  Richard   ¯32768 ¯32768      ¯1        0  
 WHO   GD  (null)    ¯32768 ¯32768      ¯1        0  

With Dyalog classes, you can put a wrapper around it that makes columns in selections behave almost like variables in the workspace. With a little help from a C# DLL, it can be made to run quite fast enough to support a realistic application. You can download the Dyalog 12 (Unicode) workspace and supporting C# source code (compilation instructions included)[2] and use it as you wish. Please let me know of any improvements you make to either side.

References

  1. “New Tricks for Old Dogs: Making Sense of Classes and Namespaces”, Smith, A., Vector 23(1&2) p113-117
    http://www.vector.org.uk/archive/v231/smith.htm
  2. Source code for this article:
    http://www.vector.org.uk/archive/v233/smithdat/dbs.zip

Valid HTML 4.01 Strict

script began 6:36:22
caching off
debug mode off
cache time 3600 sec
indmtime not found in cache
cached index is fresh
recompiling index.xml
index compiled in 0.3026 secs
read index
read issues/index.xml
identified 26 volumes, 101 issues
array (
  'id' => '10011840',
)
regenerated static HTML
article source is 'HTML'
source file encoding is 'UTF-8'
URL: mailto:adrian@apl385.com => mailto:adrian@apl385.com
URL: #ref1 => art10011840#ref1
URL: smithdat/dbs.dll => trad/v233/smithdat/dbs.dll
URL: smithdat/native.png => trad/v233/smithdat/native.png
URL: smithdat/helper.png => trad/v233/smithdat/helper.png
URL: smithdat/dbs.zip => trad/v233/smithdat/dbs.zip
URL: #ref2 => art10011840#ref2
URL: smithdat/datagrid.png => trad/v233/smithdat/datagrid.png
URL: ref2 => trad/v233/ref2
URL: http://www.vector.org.uk/archive/v231/smith.htm => http://www.vector.org.uk/archive/v231/smith.htm
URL: http://www.vector.org.uk/archive/v233/smithdat/dbs.zip => http://www.vector.org.uk/archive/v233/smithdat/dbs.zip
URL: http://validator.w3.org/check?uri=referer => http://validator.w3.org/check?uri=referer
URL: http://www.w3.org/icons/valid-html401 => http://www.w3.org/Icons/valid-html401
completed in 0.33 secs