GATE
Version 3.1-2270

gate.creole.kea
Class Kea

java.lang.Object
  extended by gate.util.AbstractFeatureBearer
      extended by gate.creole.AbstractResource
          extended by gate.creole.AbstractProcessingResource
              extended by gate.creole.AbstractLanguageAnalyser
                  extended by gate.creole.kea.Kea
All Implemented Interfaces:
ANNIEConstants, Executable, ActionsPublisher, LanguageAnalyser, ProcessingResource, Resource, FeatureBearer, NameBearer, Serializable

public class Kea
extends AbstractLanguageAnalyser
implements ActionsPublisher

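This is a wrapper for using the KEA keyphrase extractor (http://www.nzdl.org/Kea/) within the GATE Language Engineering architecture (http://gate.ac.uk). It exposes KEA as a GATE Processing Resource that has two functioning modes: a training mode, which collects keyphrase annotations from the input documents in order to build a model, and an application mode, which uses a trained model to generate keyphrase annotations on the input documents.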

See Also:
Serialized Form

Nested Class Summary
protected  class Kea.LoadModelAction
          Action for loading a saved model.
protected  class Kea.SaveModelAction
          Action used to save a trained model.
 
Nested classes/interfaces inherited from class gate.creole.AbstractProcessingResource
AbstractProcessingResource.InternalStatusListener, AbstractProcessingResource.IntervalProgressListener
 
Field Summary
protected  List actions
          The list of GUI actions available from this PR on popup menus.
protected  weka.core.FastVector atts
          Data structure used internally to define the dataset.
protected  weka.core.Instances data
          The dataset.
protected  kea.KEAFilter keaFilter
          The KEA filter object which incorporates the actual model.
protected  boolean trainingFinished
          This flag is used to determine whether the model has been constructed or not.
 
Fields inherited from class gate.creole.AbstractLanguageAnalyser
corpus, document
 
Fields inherited from class gate.creole.AbstractProcessingResource
interrupted
 
Fields inherited from class gate.creole.AbstractResource
name
 
Fields inherited from class gate.util.AbstractFeatureBearer
features
 
Fields inherited from interface gate.creole.ANNIEConstants
ANNOTATION_COREF_FEATURE_NAME, DATE_ANNOTATION_TYPE, DATE_POSTED_ANNOTATION_TYPE, DOCUMENT_COREF_FEATURE_NAME, JOB_ID_ANNOTATION_TYPE, LOCATION_ANNOTATION_TYPE, LOOKUP_ANNOTATION_TYPE, LOOKUP_CLASS_FEATURE_NAME, LOOKUP_MAJOR_TYPE_FEATURE_NAME, LOOKUP_MINOR_TYPE_FEATURE_NAME, LOOKUP_ONTOLOGY_FEATURE_NAME, MONEY_ANNOTATION_TYPE, ORGANIZATION_ANNOTATION_TYPE, PERSON_ANNOTATION_TYPE, PERSON_GENDER_FEATURE_NAME, PR_NAMES, SENTENCE_ANNOTATION_TYPE, SPACE_TOKEN_ANNOTATION_TYPE, TOKEN_ANNOTATION_TYPE, TOKEN_CATEGORY_FEATURE_NAME, TOKEN_KIND_FEATURE_NAME, TOKEN_LENGTH_FEATURE_NAME, TOKEN_ORTH_FEATURE_NAME, TOKEN_STRING_FEATURE_NAME
 
Constructor Summary
Kea()
          Default (no-argument) constructor, required by GATE.
 
Method Summary
protected  void annotateKeyPhrases(List phrases)
          Annotates the document with all the occurrences of keyphrases from a List.
 void execute()
          Executes this PR.
protected  void finishTraining()
          Stops the training phase and builds the actual model.
 List getActions()
          Gets the list of GUI actions available from this PR.
 Boolean getDisallowInternalPeriods()
           
 String getInputAS()
          Gets the name for the input annotation set.
 String getKeyphraseAnnotationType()
          Gets the annotation type to be used for keyphrases.
 Integer getMaxPhraseLength()
           
 Integer getMinNumOccur()
           
 Integer getMinPhraseLength()
           
 String getOutputAS()
          Gets the name for the output annotation set.
 Integer getPhrasesToExtract()
           
 Boolean getTrainingMode()
           
 Boolean getUseKFrequency()
           
 Resource init()
          Initialises this KEA Processing Resource.
protected  void initModel()
          Initialises the KEA model.
 void setDisallowInternalPeriods(Boolean dissallowInternalPeriods)
           
 void setInputAS(String inputAS)
          Sets the name for the input annotation set.
 void setKeyphraseAnnotationType(String keyphraseAnnotationType)
          Sets the annotation type to be used for keyphrases.
 void setMaxPhraseLength(Integer maxPhraseLength)
           
 void setMinNumOccur(Integer minNumOccur)
           
 void setMinPhraseLength(Integer minPhraseLength)
           
 void setOutputAS(String outputAS)
          Sets the name for the output annotation set.
 void setPhrasesToExtract(Integer phrasesToExtract)
           
 void setTrainingMode(Boolean trainingMode)
           
 void setUseKFrequency(Boolean useKFrequency)
           
 
Methods inherited from class gate.creole.AbstractLanguageAnalyser
getCorpus, getDocument, setCorpus, setDocument
 
Methods inherited from class gate.creole.AbstractProcessingResource
addProgressListener, addStatusListener, cleanup, fireProcessFinished, fireProgressChanged, fireStatusChanged, interrupt, isInterrupted, reInit, removeProgressListener, removeStatusListener
 
Methods inherited from class gate.creole.AbstractResource
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners
 
Methods inherited from class gate.util.AbstractFeatureBearer
getFeatures, setFeatures
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gate.ProcessingResource
reInit
 
Methods inherited from interface gate.Resource
cleanup, getParameterValue, setParameterValue, setParameterValues
 
Methods inherited from interface gate.util.FeatureBearer
getFeatures, setFeatures
 
Methods inherited from interface gate.util.NameBearer
getName, setName
 
Methods inherited from interface gate.Executable
interrupt, isInterrupted
 

Field Detail

trainingFinished

protected boolean trainingFinished
This flag is used to determine whether the model has been constructed or not. In training mode the training data is simply collected and this flag is set to false. The first time the trained model is required (either the first time application mode is started or when the model is being saved), the model is built from the collected instances and this flag is set to true.
If this flag is found to be true during the training phase (i.e. there is an attempt to train an already trained model), the current model is discarded and a new one is created. Training is then performed using the newly created model.
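
The life cycle of this flag can be illustrated with a minimal, self-contained sketch. The class LazyModelSketch and its members are hypothetical stand-ins and are not part of the Kea class; a String plays the role of the real model, and the sketch only mirrors the behaviour described above (collect during training, build on first use, discard on retraining).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the actual Kea implementation.
class LazyModelSketch {
    // Training instances gathered so far.
    private final List<String> collected = new ArrayList<>();
    // True once the model has been built from the collected instances.
    private boolean trainingFinished = false;
    // Placeholder for the built model.
    private String model;

    /** Training step: collects one instance, discarding any already built model first. */
    void train(String instance) {
        if (trainingFinished) {
            // Attempt to train an already trained model: start over with a fresh one.
            collected.clear();
            model = null;
            trainingFinished = false;
        }
        collected.add(instance);
    }

    /** Called when the model is first needed (application or saving): builds it once. */
    String requireModel() {
        if (!trainingFinished) {
            model = "model built from " + collected.size() + " instance(s)";
            trainingFinished = true;
        }
        return model;
    }

    boolean isTrainingFinished() {
        return trainingFinished;
    }
}
```

In this sketch, calling train after requireModel resets the flag, so the next requireModel call rebuilds the model from the new instances only.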


keaFilter

protected kea.KEAFilter keaFilter
The KEA filter object which incorporates the actual model.


atts

protected weka.core.FastVector atts
Data structure used internally to define the dataset.


data

protected weka.core.Instances data
The dataset.


actions

protected List actions
The list of GUI actions available from this PR on popup menus.

Constructor Detail

Kea

public Kea()
Default (no-argument) constructor, required by GATE. Does nothing.

Method Detail

getActions

public List getActions()
Gets the list of GUI actions available from this PR. Currently these are the load model and save model actions.

Specified by:
getActions in interface ActionsPublisher
Returns:
the list of GUI actions available from this PR.

execute

public void execute()
             throws ExecutionException
Executes this PR. Depending on the state of the trainingMode switch, it will either train a model or apply it over the documents.
Training consists of collecting keyphrase annotations from the input annotation set of the input documents. The first time a trained model is required (either application mode has started or the model is being saved), the actual model (see keaFilter) will be constructed.
The application mode consists of using a trained model to generate keyphrase annotations on the output annotation set of the input documents.

Specified by:
execute in interface Executable
Overrides:
execute in class AbstractProcessingResource
Throws:
ExecutionException

annotateKeyPhrases

protected void annotateKeyPhrases(List phrases)
                           throws Exception
Annotates the document with all the occurrences of keyphrases from a List. Uses the java.util.regex package to search for occurrences of keyphrases.

Parameters:
phrases - the list of keyphrases.
Throws:
Exception
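
The regex-based search described for annotateKeyPhrases can be sketched as follows. This is an illustration under stated assumptions, not the actual Kea code: the class KeyphraseOccurrences and its find method are hypothetical, phrases are matched literally, and case-insensitive matching is an assumption. The sketch only computes character offsets; in the real PR the matches would presumably be turned into annotations in the output annotation set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the Kea class.
class KeyphraseOccurrences {

    /** Returns the {start, end} character offsets of every occurrence of any phrase in text. */
    static List<int[]> find(String text, List<String> phrases) {
        List<int[]> spans = new ArrayList<>();
        for (String phrase : phrases) {
            // Pattern.quote makes regex metacharacters in the phrase match literally.
            // Case-insensitive matching is an assumption of this sketch.
            Pattern pattern = Pattern.compile(Pattern.quote(phrase), Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                spans.add(new int[] {matcher.start(), matcher.end()});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Keyphrase extraction finds keyphrases; keyphrase extraction is useful.";
        for (int[] span : find(text, Arrays.asList("keyphrase extraction"))) {
            System.out.println(span[0] + "-" + span[1] + ": " + text.substring(span[0], span[1]));
        }
    }
}
```

Spans are reported per phrase, in the order the phrases appear in the list, and overlapping matches of different phrases are not merged.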

setKeyphraseAnnotationType

public void setKeyphraseAnnotationType(String keyphraseAnnotationType)
Sets the annotation type to be used for keyphrases.

Parameters:
keyphraseAnnotationType - the annotation type to be used for keyphrases.

getKeyphraseAnnotationType

public String getKeyphraseAnnotationType()
Gets the annotation type to be used for keyphrases.

Returns:
the annotation type used for keyphrases.

setInputAS

public void setInputAS(String inputAS)
Sets the name for the input annotation set.

Parameters:
inputAS - the name of the input annotation set.

getInputAS

public String getInputAS()
Gets the name for the input annotation set.

Returns:
the name of the input annotation set.

setOutputAS

public void setOutputAS(String outputAS)
Sets the name for the output annotation set.

Parameters:
outputAS - the name of the output annotation set.

getOutputAS

public String getOutputAS()
Gets the name for the output annotation set.

Returns:
the name of the output annotation set.

init

public Resource init()
              throws ResourceInstantiationException
Initialises this KEA Processing Resource.

Specified by:
init in interface Resource
Overrides:
init in class AbstractProcessingResource
Returns:
Throws:
ResourceInstantiationException

initModel

protected void initModel()
                  throws Exception
Initialises the KEA model.

Throws:
Exception

finishTraining

protected void finishTraining()
                       throws ExecutionException
Stops the training phase and builds the actual model.

Throws:
ExecutionException

setMaxPhraseLength

public void setMaxPhraseLength(Integer maxPhraseLength)

getMaxPhraseLength

public Integer getMaxPhraseLength()

setMinPhraseLength

public void setMinPhraseLength(Integer minPhraseLength)

getMinPhraseLength

public Integer getMinPhraseLength()

setDisallowInternalPeriods

public void setDisallowInternalPeriods(Boolean dissallowInternalPeriods)

getDisallowInternalPeriods

public Boolean getDisallowInternalPeriods()

setUseKFrequency

public void setUseKFrequency(Boolean useKFrequency)

getUseKFrequency

public Boolean getUseKFrequency()

setMinNumOccur

public void setMinNumOccur(Integer minNumOccur)

getMinNumOccur

public Integer getMinNumOccur()

getTrainingMode

public Boolean getTrainingMode()

setTrainingMode

public void setTrainingMode(Boolean trainingMode)

setPhrasesToExtract

public void setPhrasesToExtract(Integer phrasesToExtract)

getPhrasesToExtract

public Integer getPhrasesToExtract()
