GATE
Version 3.1-2270

gate.creole.tokeniser.chinesetokeniser
Class ChineseTokeniser

java.lang.Object
  extended by gate.util.AbstractFeatureBearer
      extended by gate.creole.AbstractResource
          extended by gate.creole.AbstractProcessingResource
              extended by gate.creole.AbstractLanguageAnalyser
                  extended by gate.creole.tokeniser.chinesetokeniser.ChineseTokeniser
All Implemented Interfaces:
ANNIEConstants, Executable, LanguageAnalyser, ProcessingResource, Resource, FeatureBearer, NameBearer, Serializable

public class ChineseTokeniser
extends AbstractLanguageAnalyser
implements ProcessingResource

Title: ChineseTokeniser.java

Description: This class is a wrapper for the segmenter.

Tokenises a Chinese document using the Chinese segmenter.

Version:
1.0
Author:
Niraj Aswani
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class gate.creole.AbstractProcessingResource
AbstractProcessingResource.InternalStatusListener, AbstractProcessingResource.IntervalProgressListener
 
Field Summary
 
Fields inherited from class gate.creole.AbstractLanguageAnalyser
corpus
 
Fields inherited from class gate.creole.AbstractProcessingResource
interrupted
 
Fields inherited from class gate.creole.AbstractResource
name
 
Fields inherited from class gate.util.AbstractFeatureBearer
features
 
Fields inherited from interface gate.creole.ANNIEConstants
ANNOTATION_COREF_FEATURE_NAME, DATE_ANNOTATION_TYPE, DATE_POSTED_ANNOTATION_TYPE, DOCUMENT_COREF_FEATURE_NAME, JOB_ID_ANNOTATION_TYPE, LOCATION_ANNOTATION_TYPE, LOOKUP_ANNOTATION_TYPE, LOOKUP_CLASS_FEATURE_NAME, LOOKUP_MAJOR_TYPE_FEATURE_NAME, LOOKUP_MINOR_TYPE_FEATURE_NAME, LOOKUP_ONTOLOGY_FEATURE_NAME, MONEY_ANNOTATION_TYPE, ORGANIZATION_ANNOTATION_TYPE, PERSON_ANNOTATION_TYPE, PERSON_GENDER_FEATURE_NAME, PR_NAMES, SENTENCE_ANNOTATION_TYPE, SPACE_TOKEN_ANNOTATION_TYPE, TOKEN_ANNOTATION_TYPE, TOKEN_CATEGORY_FEATURE_NAME, TOKEN_KIND_FEATURE_NAME, TOKEN_LENGTH_FEATURE_NAME, TOKEN_ORTH_FEATURE_NAME, TOKEN_STRING_FEATURE_NAME
 
Constructor Summary
ChineseTokeniser()
          Default Constructor
 
Method Summary
 void execute()
          This method is executed whenever the user clicks the Run button in the GATE GUI.
 String getAnnotationSetName()
          Returns the provided annotation set name
 Document getDocument()
          Returns the document being processed
 String getEncoding()
          Returns the encoding to be used
 Boolean getGenerateSpaceTokens()
          Gets the boolean parameter that states whether the segmenter should produce space tokens
 URL getRulesURL()
          Returns the URL of the file that contains the rules for the tokeniser
 Boolean getRunSegmenter()
          Gets the boolean parameter that states whether the segmenter should run
 Resource init()
          Initialise this resource, and return it.
 void reInit()
          Reinitialises the segmenter
 void setAnnotationSetName(String name)
          Sets the annotation set name
 void setDocument(Document document)
          Sets the document to be processed
 void setEncoding(String encoding)
          Sets the encoding to be used.
 void setGenerateSpaceTokens(Boolean value)
          Sets the boolean parameter that states whether the segmenter should produce space tokens
 void setRulesURL(URL rules)
          Sets the URL of the file that contains the rules to be given to the tokeniser
 void setRunSegmenter(Boolean runSegmenter)
          Sets the boolean parameter that states whether the segmenter should run
 
Methods inherited from class gate.creole.AbstractLanguageAnalyser
getCorpus, setCorpus
 
Methods inherited from class gate.creole.AbstractProcessingResource
addProgressListener, addStatusListener, cleanup, fireProcessFinished, fireProgressChanged, fireStatusChanged, interrupt, isInterrupted, removeProgressListener, removeStatusListener
 
Methods inherited from class gate.creole.AbstractResource
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners
 
Methods inherited from class gate.util.AbstractFeatureBearer
getFeatures, setFeatures
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gate.Resource
cleanup, getParameterValue, setParameterValue, setParameterValues
 
Methods inherited from interface gate.util.FeatureBearer
getFeatures, setFeatures
 
Methods inherited from interface gate.util.NameBearer
getName, setName
 
Methods inherited from interface gate.Executable
interrupt, isInterrupted
 

Constructor Detail

ChineseTokeniser

public ChineseTokeniser()
Default Constructor

Method Detail

init

public Resource init()
              throws ResourceInstantiationException
Description copied from class: AbstractProcessingResource
Initialise this resource, and return it.

Specified by:
init in interface Resource
Overrides:
init in class AbstractProcessingResource
Throws:
ResourceInstantiationException

reInit

public void reInit()
            throws ResourceInstantiationException
Reinitialises the segmenter.

Specified by:
reInit in interface ProcessingResource
Overrides:
reInit in class AbstractProcessingResource
Throws:
ResourceInstantiationException

execute

public void execute()
             throws ExecutionException
This method is executed whenever the user clicks the Run button in the GATE GUI. It runs the segmenter on the given document and segments the text by adding either spaces or zero-length space tokens, depending on the value of generateSpaceTokens selected by the user at run time.

Specified by:
execute in interface Executable
Overrides:
execute in class AbstractProcessingResource
Throws:
ExecutionException
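
The two output modes described for execute can be illustrated with a minimal, self-contained sketch. It is not GATE's implementation and does not use the GATE API: the class name SegmentationSketch, both method names, and the boundary offsets are hypothetical, and the segment boundaries are assumed to have been computed already by some segmenter. The first method inserts a space at each boundary; the second leaves the text unchanged and records a zero-length span at each boundary, which is how a zero-length space token could be represented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of GATE.
public class SegmentationSketch {

    /** Mode 1: returns a copy of the text with a space inserted at every boundary offset. */
    public static String withSpaces(String text, int[] boundaries) {
        StringBuilder out = new StringBuilder();
        int prev = 0;
        for (int b : boundaries) {
            // Copy the segment [prev, b) and follow it with a separating space.
            out.append(text, prev, b).append(' ');
            prev = b;
        }
        out.append(text.substring(prev)); // last segment, no trailing space
        return out.toString();
    }

    /** Mode 2: leaves the text untouched and returns zero-length spans {b, b}. */
    public static List<int[]> zeroLengthSpaceTokens(int[] boundaries) {
        List<int[]> spans = new ArrayList<>();
        for (int b : boundaries) {
            spans.add(new int[] {b, b}); // start == end, so the span has length 0
        }
        return spans;
    }

    public static void main(String[] args) {
        // Four Chinese characters, written as Unicode escapes to keep the source ASCII.
        String text = "\u6211\u7231\u5317\u4EAC";
        int[] boundaries = {1, 2}; // assumed segmenter output: 3 segments of length 1, 1, 2

        System.out.println(withSpaces(text, boundaries));
        for (int[] span : zeroLengthSpaceTokens(boundaries)) {
            System.out.println("space token at " + span[0] + ".." + span[1]);
        }
    }
}
```

In the first mode the offsets of all later characters shift, while in the second mode the original offsets stay valid, which may matter when other annotations already refer to the document text.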

setRunSegmenter

public void setRunSegmenter(Boolean runSegmenter)
Sets the boolean parameter that states whether the segmenter should run

Parameters:
runSegmenter - true if the segmenter should run

getRunSegmenter

public Boolean getRunSegmenter()
Gets the boolean parameter that states whether the segmenter should run


setGenerateSpaceTokens

public void setGenerateSpaceTokens(Boolean value)
Sets the boolean parameter that states whether the segmenter should produce space tokens


getGenerateSpaceTokens

public Boolean getGenerateSpaceTokens()
Gets the boolean parameter that states whether the segmenter should produce space tokens


setDocument

public void setDocument(Document document)
Sets the document to be processed

Specified by:
setDocument in interface LanguageAnalyser
Overrides:
setDocument in class AbstractLanguageAnalyser
Parameters:
document - the document to be processed

getDocument

public Document getDocument()
Returns the document being processed

Specified by:
getDocument in interface LanguageAnalyser
Overrides:
getDocument in class AbstractLanguageAnalyser

setEncoding

public void setEncoding(String encoding)
Sets the encoding to be used.

Parameters:
encoding - the encoding.

getEncoding

public String getEncoding()
Returns the encoding to be used.


setRulesURL

public void setRulesURL(URL rules)
Sets the URL of the file that contains the rules to be given to the tokeniser

Parameters:
rules - URL of the rules file

getRulesURL

public URL getRulesURL()
Returns the URL of the file that contains the rules for the tokeniser

Returns:
a URL value.

setAnnotationSetName

public void setAnnotationSetName(String name)
Sets the annotation set name

Parameters:
name - name of the annotation set

getAnnotationSetName

public String getAnnotationSetName()
Returns the provided annotation set name

Returns:
a String value.
