Object-Oriented Data Analysis Software Concept
This document describes object-oriented concepts used to implement scientific data analysis tasks, mainly in the context of the HESSI data analysis software. It also describes the method to implement custom data analysis tasks using frameworks and strategy holders. This description assumes an IDL solarsoft environment.
Abstract classes and concrete classes
Strategies and Strategy Holders
How to work with strategy holders
The HESSI data analysis
software uses object classes to stepwise reconstruct images, spectra and light
curves from raw data. Each step of the construction is associated with an object
class. An object class is defined as
an entity describing a data type and the methods (i.e., procedures and functions) to access and handle it. In
the context discussed here, an object class includes a primary data type, a
process method, control and information parameters, accessor methods and display methods.
An instance of an object
class must first be created before it can be used. Here is the IDL command to
do this[1]:
IDL> o = Obj_New( 'class_name' )
An IDL object reference
variable o
is returned by the instance creation function. Data and methods associated with
the object can be accessed with this variable. In the instance creation
function, default values are set (i.e. assigned) to control parameters. Often,
a constructor function is used to create the object reference:
IDL> o = class_name()    (1)
Primary data requests
are handled by the accessor method GetData:
IDL> data = o->GetData()
o is the object reference created
by the instance creation function (1). The object first checks whether its
state is consistent, that is, if the primary data contained in the object
corresponds to the values of the control parameters. If the object state is
consistent, GetData returns (i.e. passes to the client that issued the request) the
data that is contained in the internal memory area of the object. If the object
state is not consistent, the GetData method first calls the Process method, which puts the object
back into a consistent state by running the transformation procedure associated
with the class. After the transformation procedure has completed, the data is
returned to the client.
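The consistency check described above can be sketched in a few lines. The sketch is written in Python rather than IDL, purely to illustrate the control flow; the class name LazyObject, the needs_update flag and the process_count counter are hypothetical and are not part of the HESSI software.

```python
class LazyObject:
    """Minimal sketch of an object that reprocesses only when needed."""

    def __init__(self, scale=1.0):
        self.control = {"scale": scale}   # control parameters with default values
        self.data = None                  # primary data held in the object
        self.needs_update = True          # True when data and control disagree
        self.process_count = 0            # how often process() actually ran

    def set(self, **params):
        # Changing a control parameter makes the stored data stale.
        for name, value in params.items():
            if self.control.get(name) != value:
                self.control[name] = value
                self.needs_update = True

    def process(self):
        # Stand-in for the transformation procedure of the class.
        self.data = [self.control["scale"] * x for x in (1, 2, 3)]
        self.process_count += 1

    def get_data(self):
        # Reprocess only if the state is inconsistent, then return the data.
        if self.needs_update:
            self.process()
            self.needs_update = False
        return self.data


obj = LazyObject()
first = obj.get_data()     # state inconsistent: process() runs
second = obj.get_data()    # state consistent: stored data is returned
obj.set(scale=2.0)
third = obj.get_data()     # control parameter changed: process() runs again
```

In this sketch the second request costs nothing, because no control parameter changed between the two calls.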
Although the Process method can be defined fully
arbitrarily, it usually includes four steps.
Object classes can be
divided into abstract and concrete classes.
Abstract classes are
never used alone. Their existence (as good parents!) makes sense only through
concrete classes (their children).
For HESSI, concrete
classes are divided into several groups.
All HESSI classes
have a common parent class. This common parent class is called Framework.
It contains the code of the generic methods Get, GetData, Set, SetData, Plot, Write, Print. Because all classes inherit Framework,
they implement the same interface. In other words, the same accessor and
display methods are available for each class. As a result, and because each
processing step reads data from a source object that inherits from the same
parent class (a sibling object), the software reuses the code of a single
accessor method at all levels. A graphical representation of
the framework is given in Figure 1.
Figure 1: The framework defines the basic methods required for
data analysis tasks. It defers the implementation of the data-type dependent
operations (e.g., Process) to the concrete class that inherits the framework.
The process of analyzing
data consists of a chain of transformations performed by sibling objects. A
given class gets data from a source class, and passes data to a client class.
The last client, of course, is the user.
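A chain of sibling objects of this kind might look as follows. This is an illustrative Python sketch, not the HESSI IDL code; the class names Stage, RawData, Sorted and Doubled are hypothetical and stand in for real processing steps.

```python
class Stage:
    """One processing step; reads from a source stage with the same interface."""

    def __init__(self, source=None):
        self.source = source

    def process(self, data):
        raise NotImplementedError  # supplied by each concrete stage

    def get_data(self):
        # The same accessor code serves every stage in the chain.
        upstream = None if self.source is None else self.source.get_data()
        return self.process(upstream)


class RawData(Stage):
    def process(self, data):
        return [3, 1, 2]          # stand-in for reading raw data


class Sorted(Stage):
    def process(self, data):
        return sorted(data)


class Doubled(Stage):
    def process(self, data):
        return [2 * x for x in data]


# The user is the last client: one request pulls data through the whole chain.
chain = Doubled(source=Sorted(source=RawData()))
result = chain.get_data()
```

Only process differs between the stages; get_data is inherited unchanged, which is the reuse the common parent class is meant to provide.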
Classes can be organized
in well-defined design patterns. For HESSI, the imaging algorithms implement
the template method pattern. An abstract class hsi_image_alg is common to all
imaging algorithms. A (small) concrete class implements a so-called hook
that contains the actual image algorithm (clean, mem, pixon, forward fit). All
other processing steps are common and therefore implemented in the abstract
class. In this way, the client object does not need to know which imaging
algorithm it uses. It only knows that its source is an object whose parent
class is hsi_image_alg.
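The template method pattern can be sketched as follows, in Python for illustration only. The class ImageAlg plays the role of the abstract class; SquareAlg and IdentityAlg are hypothetical placeholder hooks and do not implement clean, mem, pixon or forward fit. The background subtraction and normalization are assumed common steps, chosen only to make the example concrete.

```python
class ImageAlg:
    """Abstract class: owns the common steps, defers one step to a hook."""

    def get_image(self, counts):
        # Common step before the algorithm: remove the minimum as a background.
        background = min(counts)
        prepared = [c - background for c in counts]
        # Algorithm-specific step, supplied by the concrete class.
        image = self.image_alg_hook(prepared)
        # Common step after the algorithm: normalize to a peak of 1.
        peak = max(image)
        return [p / peak for p in image] if peak else image

    def image_alg_hook(self, prepared):
        raise NotImplementedError


class SquareAlg(ImageAlg):
    def image_alg_hook(self, prepared):
        return [p * p for p in prepared]


class IdentityAlg(ImageAlg):
    def image_alg_hook(self, prepared):
        return list(prepared)


def client(alg, counts):
    # The client relies only on the ImageAlg interface.
    return alg.get_image(counts)


square = client(SquareAlg(), [1, 3, 5])
identity = client(IdentityAlg(), [1, 3, 5])
```

The client function never inspects which concrete class it received; swapping the algorithm changes only the hook.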
This section describes how you
can use a framework for your own data analysis tasks. Let us assume that you
have implemented an algorithm and you want to manage it using a framework.
First, why do you want that? There are several reasons.
Let's assume you have
implemented an algorithm algo, and you want
to manage it with a framework. algo has the following declaration (or interface):
PRO Algo, input, output, param1, param2, out_param1, out_param2, $
    KEYWORD1=keyword1, KEYWORD2=keyword2
input is a variable containing the
data to be processed by the algorithm (source data), output is the data generated by the
algorithm, param1 and param2 are input (control) parameters, and out_param1 and out_param2 are output (info) parameters.
Last, keyword1 and keyword2 are keyword values that can be set by users at run time.
There are a number of
IDL templates ready to help you in the implementation. You can find these
templates in your SSW installation in the directory $SSW/hessi/idl/objects, or you can get them here:
framework_template__define.pro,
framework_template_control__define.pro,
framework_template_control.pro,
framework_template_info__define.pro.
Here is how you deal
with it:
PRO Algo_Control__Define
struct = {algo_control, param1: 0L, param2: 0.}
END
(the type of the parameters is arbitrary)
FUNCTION Algo_Control
var = {algo_control}
var.param1 = 1002L
var.param2 = !pi
RETURN, var
END
PRO Algo_Info__Define
struct = {algo_info, out_param1: 0B, out_param2: 0D}
END
(the type of the parameters is arbitrary)
FUNCTION Algo::INIT, SOURCE=source, _EXTRA=_extra
; create a default source object if none was passed in
IF NOT Obj_Valid( source ) THEN BEGIN
    source = Obj_New( 'algo_source' )
ENDIF
; pass the control and info structures defined above to the framework
ret = self->Framework::INIT( CONTROL=algo_control(), $
                             INFO={algo_info}, $
                             SOURCE=source, $
                             _EXTRA=_extra )
RETURN, ret
END
PRO Algo::Process, KEYWORD1=keyword1, KEYWORD2=keyword2, _EXTRA=_extra
; read the control parameters and the source object
param1 = self->Get( /PARAM1 )
param2 = self->Get( /PARAM2 )
source = self->Get( /SOURCE )
; get the input data from the source object
input = source->GetData( _EXTRA=_extra )
; run the algorithm
Algo, input, output, param1, param2, $
      out_param1, out_param2, $
      KEYWORD1=keyword1, KEYWORD2=keyword2
; store the primary data and the info parameters
self->SetData, output
self->Set, OUT_PARAM1=out_param1
self->Set, OUT_PARAM2=out_param2
END
FUNCTION Algo::GetData, $
    THIS_X_SLICE=this_x_slice, $
    THIS_Y_SLICE=this_y_slice, $
    THIS_Z_SLICE=this_z_slice, _EXTRA=_extra
data = self->Framework::GetData( _EXTRA=_extra )
; N_Elements is used instead of Keyword_Set so that slice index 0 is accepted
IF N_Elements( this_x_slice ) NE 0 THEN BEGIN
    RETURN, data[this_x_slice,*,*]
ENDIF ELSE IF N_Elements( this_y_slice ) NE 0 THEN BEGIN
    RETURN, data[*,this_y_slice,*]
ENDIF ELSE IF N_Elements( this_z_slice ) NE 0 THEN BEGIN
    RETURN, data[*,*,this_z_slice]
ENDIF
RETURN, data
END
PRO Algo::Set, $
    PARAMETER=parameter, $
    _EXTRA=_extra
IF Keyword_Set( PARAMETER ) THEN BEGIN
    ; first set the parameter using the original Set
    self->Framework::Set, PARAMETER=parameter
    ; then take some action that depends on this parameter
    Take_Some_Action, parameter
ENDIF
; for all other parameters (included in _extra), just pass them to the
; original Set procedure in Framework
IF Keyword_Set( _EXTRA ) THEN BEGIN
    self->Framework::Set, _EXTRA=_extra
ENDIF
END
Configure the Get function. This step is optional. The Get function needs to be modified only in very special cases, e.g. if you need to modify a value before passing it back to the user. This is not recommended, however. Note that the two keyword variables NOT_FOUND and FOUND must be passed on to the Get function in Framework. Note also that you must ultimately retrieve the parameter from the original Get function; otherwise most of the functionality of Get (e.g. aggregation of the requested parameters in a structure) will not work.
FUNCTION Algo::Get, $
    NOT_FOUND=not_found, $
    FOUND=found, $
    PARAMETER=parameter, $
    _EXTRA=_extra
IF Keyword_Set( PARAMETER ) THEN BEGIN
    parameter_local = self->Framework::Get( /PARAMETER )
    Do_Something_With_Parameter, parameter_local
ENDIF
RETURN, self->Framework::Get( PARAMETER=parameter, $
                              NOT_FOUND=not_found, $
                              FOUND=found, _EXTRA=_extra )
END
On many occasions, we need more
functionality than what is given by the framework. The framework is limited to
the management of one single data type in a single class. So what happens if we
have to deal with several related data types or classes? One general extension
of the framework can be the grouping of a number of related classes into
a single class. From this single class, each specific class can be accessed in
an integrated way. The classes belonging to the group are called strategies
to denote that they are aiming at a related goal, but use different strategies
to reach this goal. The single class that allows access to the individual
strategies is called strategy holder.
In the context of HESSI, a good illustration of the grouping into strategy objects is given by the class hsi_image. This class is a child of the strategy holder class. It holds several strategies, used to “clean” a “dirty” image. Strategies include the clean algorithm, forward fitting, pixon, and maximum entropy. From hsi_image, users (or clients in general) can select a strategy by giving the name of the associated class. They can switch from one algorithm to another, e.g. from the "clean" algorithm to the "forward fit" algorithm, just by giving the new name. To access data from another strategy, they only have to change the strategy name:
o = hsi_image()
im_clean = o->GetData( IMAGE_ALGORITHM='clean' )
im_fwd = o->GetData( IMAGE_ALGORITHM='forward fit' )
The collaboration between
strategies and strategy holder can be compared to the “case” statement for
programming languages: the case statement corresponds to the strategy holder,
the statements within the case block correspond to the strategies.
The use of strategies has several
advantages:
- Reusability. The pattern implemented by the collaboration between
strategies and strategy holders occurs frequently. Therefore, the mechanism
isolated into the strategy holder can be reused in many different places.
- Integration. Switching between different strategies is standard, that
is, its use is straightforward for the clients. Furthermore, control and
information parameters associated with a single strategy can be accessed even
when the strategy is not used.
- Efficiency. The strategy classes that already have an instance are kept
accessible until the strategy holder is destroyed. This allows keeping
objects around, ready for further use when a client needs them at a later time.
The strategy holder pattern involves several collaborating classes. In the design pattern world, classes that share a common behavior are called strategies. Those that contain multiple implementations of the same abstract base class are called flyweights. The strategy holder pattern is therefore a mixture of these two standard patterns. It is illustrated in Figure 2.
Figure 2: The strategy holder pattern
One main characteristic of the
strategy holder is that it holds not only the object references to the
strategies, but also the (common) source object that is shared by the
strategies.
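The collaboration between a strategy holder, its strategies and the shared source can be sketched as follows. The sketch is in Python for illustration only and makes no claim about how the HESSI Strategy_Holder class is implemented; the classes Source, SumStrategy and MaxStrategy and the method get_strategy are hypothetical.

```python
class Source:
    """Common source shared by all strategies."""

    def get_data(self):
        return [1, 2, 3]


class SumStrategy:
    def __init__(self, source):
        self.source = source

    def get_data(self):
        return sum(self.source.get_data())


class MaxStrategy:
    def __init__(self, source):
        self.source = source

    def get_data(self):
        return max(self.source.get_data())


class StrategyHolder:
    def __init__(self, strategies_available, source):
        self.strategies_available = dict(strategies_available)  # name -> class
        self.source = source          # one source, shared by every strategy
        self.instances = {}           # strategies created so far

    def get_strategy(self, name):
        # Create the strategy on first use, then keep it for later requests.
        if name not in self.instances:
            strategy_class = self.strategies_available[name]
            self.instances[name] = strategy_class(self.source)
        return self.instances[name]

    def get_data(self, strategy):
        # Switching strategy only requires a different name.
        return self.get_strategy(strategy).get_data()


holder = StrategyHolder({"sum": SumStrategy, "max": MaxStrategy}, Source())
total = holder.get_data(strategy="sum")
largest = holder.get_data(strategy="max")
```

Keeping the instances in a dictionary reflects the efficiency argument made above: a strategy that has been created once stays available until the holder itself is discarded.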
The way strategy holders work is
as follows. At initialization, the strategy holder gets a list of the
strategies it needs to hold. During the existence of the strategy holder, the
data in the strategies can be accessed indifferently. Also, the strategy holder
usually
Let's assume you have defined a
number of related classes: besides algo__define you have algo_1__define
and algo_2__define.
These three algorithms share some common code that you want to reuse in a class
called algo_strategy__define, so they should all
inherit this abstract class. You want to manage these three related algorithms
with a strategy_holder, and the three algorithms will be made accessible
through the interface called algo_together__define.
PRO Algo_Strategy::Process, _EXTRA=_extra
parameters_in = self->Get( /CONTROL, /THIS_CLASS_ONLY )
; .. do the common things here
self->Process_Hook, parameters_in, parameters_out, data, $
                    _EXTRA=_extra
; .. do more common things here
self->Set, INFO=parameters_out
self->SetData, data
END
PRO Algo::Process_Hook, param_in, param_out, data, $
    KEYWORD1=keyword1, KEYWORD2=keyword2, _EXTRA=_extra
param1 = param_in.param1
param2 = param_in.param2
; get the input data from the source object
source = self->Get( /SOURCE )
input = source->GetData( _EXTRA=_extra )
; run the algorithm; its output is returned to Process through data
Algo, input, data, param1, param2, $
      out_param1, out_param2, $
      KEYWORD1=keyword1, KEYWORD2=keyword2
param_out = {out_param1: out_param1, out_param2: out_param2}
END
FUNCTION Algo_Together::INIT
strategies_available = ['ALGO', 'ALGO_1', 'ALGO_2']
RETURN, self->Strategy_Holder::INIT( strategies_available )
END
Note that the configuration of the strategy class collaboration is more complex than the configuration of the framework. Therefore, the description given above does not cover all the implementation details, but rather tries to give a feel for how to approach this kind of implementation.
[1] All the examples given in this document are written for the IDL command-line interface with the Solarsoft environment activated.