GATE
Version 3.1-2270

gate.creole.ml.svmlight
Class SVMLightWrapper

java.lang.Object
  extended by gate.creole.ml.svmlight.SVMLightWrapper
All Implemented Interfaces:
AdvancedMLEngine, MLEngine, ActionsPublisher

public class SVMLightWrapper
extends Object
implements AdvancedMLEngine, ActionsPublisher

Wrapper class for the SVM Light support vector machine learning algorithm. The executable files, SVM_Learn and SVM_Classify, must be on your path for this wrapper to work.


Nested Class Summary
protected  class SVMLightWrapper.LoadDatasetAction
          This class adds the option to the context menu in the GUI that allows the user to load a dataset which is in SVM Light's own format from a file.
protected  class SVMLightWrapper.LoadModelAction
          This reloads a file that was previously saved using the SaveModelAction class.
protected  class SVMLightWrapper.SaveDatasetAction
           
protected  class SVMLightWrapper.SaveModelAction
          This allows the model, including its parameters, to be saved to a file.
 
Field Summary
protected  List actionsList
           
protected  boolean datasetChanged
          Marks whether the dataset was changed since the last time the classifier was built.
protected  DatasetDefintion datasetDefinition
           
protected  File modelFile
           
protected  boolean modelTrained
          Marks whether a trained model currently exists (whether or not it is up to date).
protected  HashMap nominalValue2IntegerHash
           
protected  org.jdom.Element optionsElement
          The JDom element containing the options for this wrapper.
protected  ProcessingResource owner
           
protected  File resultsFile
           
protected  StatusListener sListener
           
protected  File testDataFile
           
protected  List trainingData
          This List stores all the data that has been collected.
protected  File trainingDataFile
          These file objects store the path names of the files used to hold the model, data and results while they are passed to and from SVM Light.
 
Constructor Summary
SVMLightWrapper()
          This constructor sets up the action list so that these actions (loading and saving models and data) will be available from a context menu in the GUI.
 
Method Summary
 void addTrainingInstance(List attributeValues)
          This is called to add a new training instance to the data set collected in this wrapper object.
 List batchClassifyInstances(List instances)
          Decide on the outcomes for all the instances, based on the values of all the features for each of the instances in a document.
 Object classifyInstance(List attributeValues)
          Decide on the outcome for the instance, based on the values of all the features.
 void cleanUp()
          Delete all the temporary files when the processing resource is closed.
 List getActions()
          Gets the list of actions that can be performed on this resource.
 DatasetDefintion getDatasetDefinition()
           
 void init()
          Initialises the classifier and prepares for running.
 void initialiseAndTrainClassifier()
          Use svm_learn to create a new svm model, based on all the data currently stored in the wrapper.
 boolean isDatasetChanged()
          Has the dataset changed since the model was last trained?
 boolean isModelTrained()
          Is there a trained model available (whether or not it is up to date)?
 void load(InputStream is)
          Loads the state of this engine from previously saved data.
 void loadDataset(FileReader reader)
          Reads training data in SVM Light format from a file and adds it to the collection of training examples.
 void loadModel(File file)
          Load a previously saved state of the engine.
 void save(OutputStream os)
          Saves the state of the engine for reuse at a later time. optionsElement is not saved so as to make this code consistent with wekaWrapper.
 void saveDataset(FileWriter writer, List dataSet)
          Write the data set to a file in SVM Light format.
 void saveModel(File file)
          Saves the state of the engine for reuse at a later time. optionsElement is not saved so as to make this code consistent with wekaWrapper.
 void setDatasetDefinition(DatasetDefintion definition)
          Set the data set definition for this classifier.
 void setOptions(org.jdom.Element optionsElem)
          Take a representation of the part of the XML configuration file which corresponds to <OPTIONS>, and store it.
 void setOwnerPR(ProcessingResource pr)
          Registers the PR using the engine with the engine itself.
 boolean supportsBatchMode()
          Returns true if the engine supports BatchMode, returns false otherwise.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

nominalValue2IntegerHash

protected HashMap nominalValue2IntegerHash

datasetDefinition

protected DatasetDefintion datasetDefinition

trainingData

protected List trainingData
This List stores all the data that has been collected. Each item is a List of objects, each of which is an attribute (and one of which is the class attribute).


optionsElement

protected org.jdom.Element optionsElement
The JDom element containing the options for this wrapper.


datasetChanged

protected boolean datasetChanged
Marks whether the dataset was changed since the last time the classifier was built.


modelTrained

protected boolean modelTrained
Marks whether a trained model currently exists (whether or not it is up to date).


trainingDataFile

protected File trainingDataFile
These file objects store the path names of the files used to hold the model, data and results while they are passed to and from SVM Light. This description also covers testDataFile, modelFile and resultsFile below.


testDataFile

protected File testDataFile

modelFile

protected File modelFile

resultsFile

protected File resultsFile

actionsList

protected List actionsList

owner

protected ProcessingResource owner

sListener

protected StatusListener sListener

Constructor Detail

SVMLightWrapper

public SVMLightWrapper()
This constructor sets up the action list so that these actions (loading and saving models and data) will be available from a context menu in the GUI.

Method Detail

cleanUp

public void cleanUp()
Delete all the temporary files when the processing resource is closed.

Specified by:
cleanUp in interface MLEngine

setOptions

public void setOptions(org.jdom.Element optionsElem)
Take a representation of the part of the XML configuration file which corresponds to <OPTIONS>, and store it.

Specified by:
setOptions in interface MLEngine
Parameters:
optionsElem - the JDom element containing the options from the configuration.
Throws:
GateException

addTrainingInstance

public void addTrainingInstance(List attributeValues)
This is called to add a new training instance to the data set collected in this wrapper object.

Specified by:
addTrainingInstance in interface MLEngine
Parameters:
attributeValues - A list of String objects, each of which corresponds to an attribute value. For boolean attributes the values will be true or false.

setDatasetDefinition

public void setDatasetDefinition(DatasetDefintion definition)
Set the data set definition for this classifier.

Specified by:
setDatasetDefinition in interface MLEngine
Parameters:
definition - A specification of the types and allowable values of all the attributes, as specified in the <DATASET> part of the configuration file.

classifyInstance

public Object classifyInstance(List attributeValues)
                        throws ExecutionException
Decide on the outcome for the instance, based on the values of all the features. N.B. The model is trained when this method is called, unless the method was called previously and no new data has been added since. Calls to this method may therefore take a long time to execute.

Specified by:
classifyInstance in interface MLEngine
Parameters:
attributeValues - A list of all the attributes, including the <CLASS/> attribute. The value of the <CLASS/> attribute is, however, arbitrary.
Returns:
A string value giving the nominal value of the class; or, if the outcome is boolean, a Java String with value "true" or "false"; or, if the class is numeric, the estimated numeric value for the class.
Throws:
ExecutionException
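
The on-demand training described in the note above can be pictured as two flags guarding an expensive step. The following is a minimal sketch under stated assumptions, not the wrapper's actual code: the class LazyTrainingSketch, the trainingRuns counter and the placeholder return value are hypothetical, and only the names datasetChanged and modelTrained are taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the wrapper. It models only the two flags
// documented on this page and never calls SVM Light.
class LazyTrainingSketch {
    private final List<List<String>> trainingData = new ArrayList<>();
    private boolean datasetChanged = false;
    private boolean modelTrained = false;
    int trainingRuns = 0; // counts how often the simulated training step ran

    void addTrainingInstance(List<String> attributeValues) {
        trainingData.add(attributeValues);
        datasetChanged = true; // new data makes any existing model stale
    }

    String classifyInstance(List<String> attributeValues) {
        if (!modelTrained || datasetChanged) {
            train(); // potentially slow: this is the delay the note warns about
        }
        return "true"; // placeholder outcome; a real engine would consult the model
    }

    private void train() {
        trainingRuns++;
        modelTrained = true;
        datasetChanged = false;
    }
}
```

In this sketch, two consecutive classifications trigger a single training run, and adding an instance in between forces a second one.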

batchClassifyInstances

public List batchClassifyInstances(List instances)
                            throws ExecutionException
Decide on the outcomes for all the instances, based on the values of all the features for each of the instances in a document. N.B. The model is trained when this method is called, unless the method was called previously and no new data has been added since. Calls to this method may therefore take a long time to execute.

Specified by:
batchClassifyInstances in interface MLEngine
Parameters:
instances - A list of lists of all the attributes (one list per instance), including the <CLASS/> attribute. The value of the <CLASS/> attribute is, however, arbitrary.
Returns:
A list of string values, each giving the nominal value of the class; or, if the outcome is boolean, a Java String with value "true" or "false"; or, if the class is numeric, the estimated numeric value for the class.
Throws:
ExecutionException

initialiseAndTrainClassifier

public void initialiseAndTrainClassifier()
                                  throws ExecutionException,
                                         IOException
Use svm_learn to create a new svm model, based on all the data currently stored in the wrapper.

Throws:
ExecutionException
IOException
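
As an illustration of what invoking svm_learn involves, the sketch below assembles a command line of the form "svm_learn [options] example_file model_file", which is the usage documented by SVM Light, and runs it with ProcessBuilder. The class and method names are hypothetical, and how the wrapper itself launches the executable is not stated on this page.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the GATE API.
class SvmLearnCommandSketch {
    // Builds: svm_learn [options] <example_file> <model_file>
    static List<String> buildCommand(List<String> options, File trainingDataFile, File modelFile) {
        List<String> command = new ArrayList<>();
        command.add("svm_learn"); // must be resolvable on the path
        command.addAll(options);
        command.add(trainingDataFile.getPath());
        command.add(modelFile.getPath());
        return command;
    }

    // Runs the command and returns its exit code. An IOException is thrown
    // if the executable cannot be started, for example when it is not on the path.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

For example, buildCommand(Arrays.asList("-c", "1.0"), trainingDataFile, modelFile) yields the arguments for a run with svm_learn's -c trade-off option set to 1.0.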

init

public void init()
          throws GateException
Initialises the classifier and prepares for running. Before calling this method, the datasetDefinition and optionsElement fields should have been set using setDatasetDefinition and setOptions. This method also creates the temporary files needed for passing data to and from SVM Light.

Specified by:
init in interface MLEngine
Throws:
GateException - If it is not possible to initialise the classifier for any reason.
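
The temporary-file life cycle implied by init and cleanUp might look like the following sketch. It is an assumption about the general pattern only: the class TempFilesSketch and the file name prefixes and suffixes are hypothetical, while the four field names match those listed in the Field Summary.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical illustration of creating and later deleting the four
// temporary files; the prefixes and suffixes are invented for this sketch.
class TempFilesSketch {
    File trainingDataFile;
    File testDataFile;
    File modelFile;
    File resultsFile;

    void init() throws IOException {
        trainingDataFile = File.createTempFile("svmTraining", ".dat");
        testDataFile = File.createTempFile("svmTest", ".dat");
        modelFile = File.createTempFile("svmModel", ".dat");
        resultsFile = File.createTempFile("svmResults", ".dat");
    }

    void cleanUp() {
        for (File f : new File[] {trainingDataFile, testDataFile, modelFile, resultsFile}) {
            if (f != null) {
                f.delete(); // best effort; delete() returns false on failure
            }
        }
    }
}
```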

getActions

public List getActions()
Gets the list of actions that can be performed on this resource.

Specified by:
getActions in interface ActionsPublisher
Returns:
a List of Action objects (or null values)

setOwnerPR

public void setOwnerPR(ProcessingResource pr)
Registers the PR using the engine with the engine itself.

Specified by:
setOwnerPR in interface MLEngine
Parameters:
pr - the processing resource that owns this engine.

getDatasetDefinition

public DatasetDefintion getDatasetDefinition()

saveDataset

public void saveDataset(FileWriter writer,
                        List dataSet)
Write the data set to a file in SVM Light format.

Parameters:
writer - An open file writer to which the data is to be written.
dataSet - The data set to be saved, in the form of a list of attributes in the form passed from the ML PR.
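
For readers unfamiliar with SVM Light's file format, each training example occupies one line: a target value followed by index:value pairs with feature indices in increasing order. The sketch below writes one such line for a binary classification example. It is only an illustration of the format; the class SvmLightWriterSketch and its methods are hypothetical, and how the wrapper converts GATE attribute values to feature indices is not described on this page.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical formatter for a single example in SVM Light's text format.
class SvmLightWriterSketch {
    // A SortedMap keeps feature indices in increasing order, as the format expects.
    static String formatLine(boolean positive, SortedMap<Integer, Double> features) {
        StringBuilder line = new StringBuilder(positive ? "+1" : "-1");
        for (Map.Entry<Integer, Double> feature : features.entrySet()) {
            line.append(' ').append(feature.getKey()).append(':').append(feature.getValue());
        }
        return line.toString();
    }

    // Writes one example followed by a newline to any Writer, such as a FileWriter.
    static void writeLine(Writer writer, boolean positive, SortedMap<Integer, Double> features)
            throws IOException {
        writer.write(formatLine(positive, features));
        writer.write('\n');
    }
}
```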

loadDataset

public void loadDataset(FileReader reader)
                 throws GateRuntimeException,
                        IOException
Reads training data in SVM Light format from a file and adds it to the collection of training examples.

Parameters:
reader - A file reader from which to read the data.
Throws:
GateRuntimeException
IOException
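
Reading the format back is the mirror image of writing it. The following sketch parses one line of SVM Light data into a target value and a map of features, ignoring anything after a '#' character. The class and its nested Example type are hypothetical, the sketch does not handle the qid: field used for ranking data, and it makes no claim about how loadDataset maps features back to GATE attributes.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical parser for a single line in SVM Light's text format.
class SvmLightReaderSketch {
    static class Example {
        final double target;
        final SortedMap<Integer, Double> features = new TreeMap<>();

        Example(double target) {
            this.target = target;
        }
    }

    // Returns null for blank lines and lines that contain only a comment.
    static Example parseLine(String line) {
        int hash = line.indexOf('#');
        String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (content.isEmpty()) {
            return null;
        }
        String[] tokens = content.split("\\s+");
        Example example = new Example(Double.parseDouble(tokens[0]));
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Malformed feature: " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            example.features.put(index, value);
        }
        return example;
    }
}
```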

load

public void load(InputStream is)
          throws IOException
Loads the state of this engine from previously saved data.

Parameters:
is - An open InputStream from which the model will be loaded.
Throws:
IOException

save

public void save(OutputStream os)
          throws IOException
Saves the state of the engine for reuse at a later time. optionsElement is not saved so as to make this code consistent with wekaWrapper.

Parameters:
os - An open output stream to which the model will be saved.
Throws:
IOException

loadModel

public void loadModel(File file)
               throws IOException
Load a previously saved state of the engine. If the saved state includes an up-to-date trained model, this is also reloaded.

Parameters:
file - the file from which the state is to be loaded. If the state indicates that a trained model should be loaded, this should be in file.NativePart.
Throws:
IOException

saveModel

public void saveModel(File file)
               throws IOException
Saves the state of the engine for reuse at a later time. optionsElement is not saved so as to make this code consistent with wekaWrapper. If an up-to-date trained model exists, it will be saved in file.NativePart.

Throws:
IOException
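
The phrase "file.NativePart" is not defined further on this page. One plausible reading, assumed here and not confirmed by the source, is that the trained SVM Light model is stored in a companion file whose path is the given file's path with ".NativePart" appended. Under that assumption, the companion file could be derived and written as in this sketch; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; assumes ".NativePart" is a suffix added to the file path.
class NativePartSketch {
    static File nativePartFor(File file) {
        return new File(file.getPath() + ".NativePart");
    }

    // Copies an existing trained model file next to the saved engine state,
    // overwriting any earlier copy.
    static void saveNativePart(File trainedModel, File file) throws IOException {
        Files.copy(trainedModel.toPath(), nativePartFor(file).toPath(),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```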

isDatasetChanged

public boolean isDatasetChanged()
Has the dataset changed since the model was last trained?


isModelTrained

public boolean isModelTrained()
Is there a trained model available (whether or not it is up to date)?


supportsBatchMode

public boolean supportsBatchMode()
Description copied from interface: AdvancedMLEngine
Returns true if the engine supports BatchMode, returns false otherwise.

Specified by:
supportsBatchMode in interface AdvancedMLEngine
