SQL SERVER – Introduction to expressor Datascript Modules

With the release of expressor 3.3, expressor software has added a significant new feature to the expressor Studio tool – the ability to easily extend functionality through the incorporation of reusable script files.  A developer using expressor Studio may write these scripts and add them to any number of projects, or you can integrate scripts written by other developers.  Let’s see how this works.

Suppose you want to execute a one-to-many application in which each incoming record needs to be parsed into multiple output records.  For example, a record containing monthly data over a year period needs to be reworked so that each emitted record contains data from a single month.  The following fragment shows two representative data records, where the first field is the year and the following fields are the monthly data, January through December.

Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
2010,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012
2009,901,902,903,904,905,906,907,908,909,910,911,912

You want each output record to contain only three fields: Year, Month, and Value.

Year,Month,Value
2010,Jan,1001
2010,Feb,1002
2010,Mar,1003
2010,Apr,1004
...

To illustrate how you can reuse the coding within a Datascript Module, think of the incoming record as a sequence of fields each with a specific attribute name and value.

GroupAttribute,Atr1,Atr2,Atr3,Atr4,Atr5,Atr6,Atr7,Atr8,Atr9,Atr10,Atr11,Atr12

And think of each emitted record as comprised of three fields.

GroupAttribute,Atr,Val

Now you can write generic code within the Datascript Module that can handle any incoming record in a one-to-many application.

  • Since you want to use the Datascript Module in multiple applications, create this artifact within an expressor library.
    • In this example, the library is named modules_library.0 and the Datascript Module is named OneToMany.
  • Within the OneToMany module, include the following scripting.
    • Note that line 4, which defines the fully qualified name of the module, is entered by Studio when you first create the module.  It is a concatenation of the library name and module file name.
    • The Datascript Module includes a single function named oneToMany, which takes as arguments the name of the attribute holding a value that uniquely identifies the incoming record and the incoming record that needs to be divided into multiple output records.
    • Observe how the code uses the generic field names GroupAttribute, Atr, and Val to populate each output record.  It’s not until the collection of output records is returned to the calling code that the coding uses the actual attribute names (Year, Month, Value) of the output record.
      • Consequently, the actual attribute names used by the incoming and outgoing records are only referenced within the calling code, which is specific to an application.
      • All coding within the Datascript Module uses generic names applicable to any incoming data.

Now, let’s turn our attention to the code that uses this Datascript Module.  In this case, the application is quite simple.  It reads a file that contains the yearly summary records, uses a Transform operator to process each record into multiple output records, and then writes each monthly output record to a file.

  • Line 2 cross-references the Datascript Module.  This statement is equivalent to simply copying and pasting the scripting from the module into the operator’s coding.
  • Line 5 invokes the oneToMany function defined in the module.
  • Lines 10 through 20 iterate through the collection of output records returned by the module’s oneToMany function, emitting each record individually.
    • To emit more than one record from the Transform operator, the scripting uses an iterator function, another new feature introduced in expressor 3.3.

Using modules written by other developers is no more involved.  In fact, expressor 3.3 ships with many modules that your code can utilize.  These include a cURL module that you can use to invoke on Web services or FTP servers, a SAX XML module that you can use to parse an XML document, a socket module that you can use to interact with an external application, a SQL module that allows your code to execute a SQL statement against a remote database, as well as some expressor Datascript Modules that include utility functions useful for parsing CSV files and serializing and de-serializing expressor Datascript tables.

To use any of these modules, simply include the necessary require statement at the beginning of your scripting.  The expressor Community Knowledge Base includes entries that describe use of these modules. The expressor Studio desktop ETL tool is freely downloadable from the expressor website.   I encourage you to give it a try!

Reference: Pinal Dave (http://blog.SQLAuthority.com)

One thought on “SQL SERVER – Introduction to expressor Datascript Modules

  1. Pingback: SQLAuthority News – Latest expressor Data Integration Platform Posts « Journey to SQLAuthority

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s