Mod /
Tad
Search:  


1.  Text Array Database

TAD is a simple memory resident flat-file database using Tcl arrays (ie. UTF-8). Tad can load/save Tcl array data to/from disk, and even trickle write/append changes to disk. It also sports a simplified query interface using the expr command. Tad also provides a server mode for use via networked access, and clients, both command-line and GUI.

See TadTv for a case-study using TAD.

1.1  TAD vs SQL Queries

Following compares side-by-side SQL and TAD queries:

     ######################################    ##############################
     #                SQL                 #    #            TAD             #
     ######################################    ##############################
     SELECT * FROM tbl                         query tbl
     SELECT count(*) FROM tbl                  query tbl 1 -count 1
     SELECT * FROM tbl ORDER BY age            query tbl 1 -sortby age
     SELECT id,age FROM tbl WHERE $age<5       query tbl $age<5 -get {id age}
     SELECT * FROM tbl WHERE $age<1 AND $id>2  query tbl {$age<1 && $id>2}

Tad defines the query command, taking a table and arguments in name/value option pairs.

The proc signature for query is:

    proc query {table {expression 1} args}

The expression argument is iteratively evaluated by expr, for each row of the table. The options for args are:

   -command       {}     "Command to evaluate on each matching row" -cmd {key data} 
   -count         False  "Return count or llength of matches" 
   -check         False  "Perform runtime checks" 
   -delete        False  "Call [delete] for records matching query" 
   -fmt           {}     "Override the default result-format mode" -type bool 
   -freq          False  "Sort and count frequency of duplicates" 
   -get           {}     "Names of fields to return or '*' for all, '+' for key+all, '=' for name/value pairs" 
   -keys          {}     "Use only rows with the given keys" 
   -inline        False  "Query uses [lindex $_value N]"
   -index         {}     "Return the single nth value" 
   -lups          0      "Load fields with -lookup into (name.field)" -type {choice 0 1 2} 
   -match         ""     "Use only keys matching this pattern" 
   -max           -1     "Limit returned values, or 0 for titles" 
   -modify        {}     "Call [modify] with given name/value pairs" 
   -regexp        False  "Use [array match -regexp] (requires 8.5+)" 
   -sort          False  "Sort in order of returned field" 
   -sortby        {}     "Sort by specified fields" 
   -sopts         {}     "lsort options for -sort/-sortby,  eg. -decreasing -unique" 

Inputs and outputs in TAD are simply Tcl lists. For example, the field names to be return can be passed in as a list via -get. Data results are returned as a Tcl list of rows (except when -fmt 1 is used). Using list form ensures that data is always well formed.

Note that simple expr expressions are similar to SQL and can even be converted to SQL using something like:

set sql [string map { {$} {} && " AND " || " OR " != != ! " NOT "} $expr]

1.2  Programming Example


   package require tad
   namespace import ::tad::query

   tad table create mytbl -fields {name age} -

1.3  Interactive Example

Any tcl array (or list) may be used in a query.


tclsh tad.tcl
% query tcl_platform
=>
rowid     0            
--------- ------------ 
osVersion 2.4.18       
byteOrder littleEndian 
machine   i686         
platform  unix         
os        Linux        
user      marvin     
wordSize  4            

query tcl_platform {$rowid == "os" || $0 == "unix"}
=> 
#        0     
-------- ----- 
platform unix  
os       Linux 

Any user defined array or list of data may be queried, however it is often useful to assign field names to simplify queries. For arrays, the special element {} is used to store this info.

  table create clients -fields { first last address sex } -keyname phone -insert {
      601-5454332   { Bob Brown {32 Elm St} m }
      601-5452134   { Tom Smith {421 Echo Ave} m }
      402-5451243   { Mary White {16 Amber Lane} f }
  }

  query clients {$last == "Brown"};  # returns {601-5454332}
  query clients {$sex == "m"};  # returns {601-5454332 601-5452134}
  insert clients -key 603-2341345 -set {first Charlie last Brown}

1.4  Keyint

Tables by default are created, with string keys. One can however use the -keyint option to create tables with auto-incrementing integer keys.

table create sales -keyint 1 -fields { cust amt date }
insert sales -data {"Bob Brown" 99.01 }
=> 1
insert sales -data {"Tom Jones" 9.92 }
=> 2
modify sales 2 {amt 9.91}

1.5  Expressions

The expression argument in query and mquery is just passed to the the expr command. This supports a set of builtin math and string operations.

Commands

It is also possible for expressions to call Tcl commands with []. But when run in server mode arbitrary command calling like this would be a security hole. Thus TAD restricst this to certain safe builtin commands such as [string] and [lindex] which do can not write to variables. All of the following commands are considered safe builtins:

  string clock lindex llength lrange lsearch split "regexp -inline" nindex

The -nocalls option controls this behaviour:

    0  - Allow any call
    1  - Disallow all but safe builtin calls (the default).
   -1  - Disallows everything, even builtins.

When checking is on and an expresion contains an open square brace, it is checked for calls. The default is to map out square braces with backslashes, which strings in quotes will reabsorb.

Compatibility

Note that expr support differs slightly from version to version of Tcl, notably the addition of new operators such as eq and handling of True. Also, as of 8.5 user defined tcl::mathops funcs are definable.

1.6  Fields

TAD defines a row of data as a collection of fields (ie. columns) where each field may optionally provide a default value, a description and other name/value options thereafter.

For example:

    table create cust -fields { name age income} 
 or
    table create cust -fields {
        { name   {}  "Name of customer" }
        { age    0   "Current age" -type int }
        { income 0   "Gross income" }
        { job    {}  "Type of work" }
    }

The field conf command may be used to modify options for fields.

Types

Types may be used to restrict or check data stored in fields. The standard types are: int bool double string is list blist dict date choice match matchi regexp regexpi.

Types are in fact lists, and may take optional arguments. For example, the match type takes a pattern argument. This, for example, could allow name above to be restricted to the form "LAST, FIRST" using:

   field conf cust name -type {match "*, *"}

Also int and double take optional -min/-max options.

Types may also be strict. If a type begins with a capital letter, it implies that the type is strict and that an empty string is not allowed. Thus, although there is no 'NOTNULL option, it can be implemented with -type String.

Lookup Fields

A field using a -lookup VAR option implies that the fields value represents a key within the named VAR. eg:

    table create jobs -fields { description } -keyname title
    insert jobs -values { manager "boss" worker "peon" }
    field conf cust job -lookup jobs

The insert, modify and verify commands will check that -lookup refers to a pre-existing element in the target table.

1.7  Indexes

Indexes are created using the data from one or more fields of a table to enforce uniqueness or provide fast lookups. Once defined, indexes automatically populate an index-table whenever an insert, modify, or delete occurs. The index build command can be used to build or rebuild generated indexes for a table or an entire database. eg:

index add cust -fields {name age}
index add cust -fields {name} -unique False

1.8  Memory Usage

When large tables are involved (eg. >10K rows) memory growth can become an issue due to data item convertion to list objects. To avoid this query duplicates data objects before use (but this can be disabled with conf --speed 0).

Code doing table traversal can use something like:

foreach key [array names v ?*] {
    set data [string range $v($key) 0 end]; # Dup data.
    set s [lindex $data 0]; # ...
}

1.9  File Format

TAD stores data on disk in plain UTF-8 text list format. For arrays, data is stored in key/value pairs, with the first three bytes equal to "{} ". Then when reloading, the data can be passed directly to array set. This has the advantage of being editable, and allowing inserts/modifies to be appended to the file in archive mode.

When a file does not begin with "{} ", it is assumed to be a lists which contains the field names in the first row.

1.10  Lists

There is limited support for lists. The query (but not mquery) command will accept queries on lists.

1.11  Issues

Tcl 8.4 is recommended for Tad. Version >= 8.5 has an issue with a very slowed down [clock scan]. If you must use 8.5+, consider using clockold.

© 2008 Peter MacDonald

Page last modified on August 31, 2008, at 09:40 AM