GATE Version 3.1-2270 |
java.lang.Object
gate.util.AbstractFeatureBearer
gate.creole.AbstractResource
gate.creole.AbstractProcessingResource
gate.creole.AbstractLanguageAnalyser
gate.creole.gazetteer.AbstractGazetteer
gate.creole.gazetteer.DefaultGazetteer
public class DefaultGazetteer
This component is responsible for doing list lookup. The implementation is based on finite state machines. The phrases to be recognised should be listed in a set of files, one for each type of occurrence. The gazetteer is built with the information from a file that contains the set of lists (which are files as well) and the associated type for each list. The file defining the set of lists should have the following syntax: each list definition should be written on its own line and should contain the file name, the major type and, optionally, the minor type and the language, separated by ":". The following is an example of a valid definition:
personmale.lst:person:male:english
Each list file named in the lists definition file is just a list containing
one entry per line.
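As a rough illustration of the definition syntax, the sketch below splits one definition line into its colon-separated fields. The class name `ListDefinition`, its field names, and the decision to treat the third and fourth fields as optional are assumptions made for this example; this is not the parser GATE itself uses.

```java
/** Hypothetical holder for one line of a lists definition file (not a GATE class). */
final class ListDefinition {
    final String fileName;   // e.g. "personmale.lst"
    final String majorType;  // e.g. "person"
    final String minorType;  // e.g. "male"; null when absent (assumed optional)
    final String language;   // e.g. "english"; null when absent (assumed optional)

    private ListDefinition(String fileName, String majorType,
                           String minorType, String language) {
        this.fileName = fileName;
        this.majorType = majorType;
        this.minorType = minorType;
        this.language = language;
    }

    /** Parses a line such as "personmale.lst:person:male:english". */
    static ListDefinition parse(String line) {
        // The -1 limit keeps empty trailing fields, so "a.lst:type:" splits into 3 parts.
        String[] parts = line.trim().split(":", -1);
        if (parts.length < 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                "Expected at least fileName:majorType but got: " + line);
        }
        String minor = parts.length > 2 && !parts[2].isEmpty() ? parts[2] : null;
        String lang = parts.length > 3 && !parts[3].isEmpty() ? parts[3] : null;
        return new ListDefinition(parts[0], parts[1], minor, lang);
    }
}
```

Calling `ListDefinition.parse("personmale.lst:person:male:english")` yields the file name `personmale.lst`, major type `person`, minor type `male`, and language `english`.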
When this gazetteer is run over some input text (a GATE document), it
generates annotations of type Lookup having the attributes specified in
the definition file.
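The general idea of matching phrases with a finite state machine can be sketched as follows. This is a simplified, self-contained illustration and not GATE's implementation: the class `PhraseFsmSketch`, its methods, and the `start-end:type` result strings are hypothetical, and the sketch ignores case sensitivity and word-boundary checks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical character-level FSM for phrase lookup (illustration only). */
final class PhraseFsmSketch {
    /** One state: transitions keyed by character, plus a type when a phrase ends here. */
    private static final class State {
        final Map<Character, State> next = new HashMap<>();
        String type; // non-null only for final states
    }

    private final State initial = new State();

    /** Adds one phrase, creating states along its characters as needed. */
    void addPhrase(String phrase, String type) {
        State s = initial;
        for (char c : phrase.toCharArray()) {
            s = s.next.computeIfAbsent(c, k -> new State());
        }
        s.type = type;
    }

    /** Scans the text and reports the longest match at each position as "start-end:type". */
    List<String> annotate(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            State s = initial;
            int bestEnd = -1;
            String bestType = null;
            for (int j = i; j < text.length(); j++) {
                s = s.next.get(text.charAt(j));
                if (s == null) break;            // no transition: stop walking
                if (s.type != null) {            // final state: remember longest match so far
                    bestEnd = j + 1;
                    bestType = s.type;
                }
            }
            if (bestEnd >= 0) {
                out.add(i + "-" + bestEnd + ":" + bestType);
                i = bestEnd;                     // continue after the matched phrase
            } else {
                i++;
            }
        }
        return out;
    }
}
```

With the phrases `John` and `John Smith` registered as `person` and `London` as `location`, annotating `John Smith visited London` returns `0-10:person` and `19-25:location`, since the longest match wins at each starting offset.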
Nested Class Summary | |
---|---|
static class |
DefaultGazetteer.CharMap
Class implementing the map using binary search by char as key to retrieve the corresponding object. |
static interface |
DefaultGazetteer.Iter
|
Nested classes/interfaces inherited from class gate.creole.AbstractProcessingResource |
---|
AbstractProcessingResource.InternalStatusListener, AbstractProcessingResource.IntervalProgressListener |
Field Summary | |
---|---|
static String |
DEF_GAZ_ANNOT_SET_PARAMETER_NAME
|
static String |
DEF_GAZ_CASE_SENSITIVE_PARAMETER_NAME
|
static String |
DEF_GAZ_DOCUMENT_PARAMETER_NAME
|
static String |
DEF_GAZ_ENCODING_PARAMETER_NAME
|
static String |
DEF_GAZ_LISTS_URL_PARAMETER_NAME
|
protected Set |
fsmStates
A set containing all the states of the FSM backing the gazetteer |
protected FSMState |
initialState
The initial state of the FSM that backs this gazetteer |
protected Map |
listsByNode
a map of nodes vs gaz lists |
Fields inherited from class gate.creole.gazetteer.AbstractGazetteer |
---|
annotationSetName, caseSensitive, definition, encoding, features, listeners, listsURL, mappingDefinition, wholeWordsOnly |
Fields inherited from class gate.creole.AbstractLanguageAnalyser |
---|
corpus, document |
Fields inherited from class gate.creole.AbstractProcessingResource |
---|
interrupted |
Fields inherited from class gate.creole.AbstractResource |
---|
name |
Constructor Summary | |
---|---|
DefaultGazetteer()
Build a gazetteer using the default lists from the GATE resources |
Method Summary | |
---|---|
boolean |
add(String singleItem,
Lookup lookup)
Adds a new string to the gazetteer |
void |
addLookup(String text,
Lookup lookup)
Adds one phrase to the list of phrases recognised by this gazetteer |
void |
execute()
This method runs the gazetteer. |
String |
getFSMgml()
Returns a string representation of the deterministic FSM graph using GML. |
Resource |
init()
Does the actual loading and parsing of the lists. |
static boolean |
isWordInternal(char ch)
Tests whether a character is internal to a word (i.e. if it's a letter or a combining mark (spacing or not)). |
Set |
lookup(String singleItem)
Looks up a single string in the gazetteer |
protected void |
readList(LinearNode node,
boolean add)
Reads one list (one file) of phrases |
boolean |
remove(String singleItem)
Removes a string from the gazetteer |
void |
removeLookup(String text,
Lookup lookup)
Removes one phrase from the list of phrases recognised by this gazetteer |
Methods inherited from class gate.creole.gazetteer.AbstractGazetteer |
---|
addGazetteerListener, fireGazetteerEvent, getAnnotationSetName, getCaseSensitive, getEncoding, getFeatures, getLinearDefinition, getListsURL, getMappingDefinition, getWholeWordsOnly, reInit, setAnnotationSetName, setCaseSensitive, setEncoding, setFeatures, setListsURL, setMappingDefinition, setWholeWordsOnly |
Methods inherited from class gate.creole.AbstractLanguageAnalyser |
---|
getCorpus, getDocument, setCorpus, setDocument |
Methods inherited from class gate.creole.AbstractProcessingResource |
---|
addProgressListener, addStatusListener, cleanup, fireProcessFinished, fireProgressChanged, fireStatusChanged, interrupt, isInterrupted, removeProgressListener, removeStatusListener |
Methods inherited from class gate.creole.AbstractResource |
---|
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface gate.LanguageAnalyser |
---|
getCorpus, getDocument, setCorpus, setDocument |
Methods inherited from interface gate.Resource |
---|
cleanup, getParameterValue, setParameterValue, setParameterValues |
Methods inherited from interface gate.util.NameBearer |
---|
getName, setName |
Methods inherited from interface gate.Executable |
---|
interrupt, isInterrupted |
Field Detail |
---|
public static final String DEF_GAZ_DOCUMENT_PARAMETER_NAME
public static final String DEF_GAZ_ANNOT_SET_PARAMETER_NAME
public static final String DEF_GAZ_LISTS_URL_PARAMETER_NAME
public static final String DEF_GAZ_ENCODING_PARAMETER_NAME
public static final String DEF_GAZ_CASE_SENSITIVE_PARAMETER_NAME
protected Map listsByNode
protected FSMState initialState
protected Set fsmStates
Constructor Detail |
---|
public DefaultGazetteer()
Method Detail |
---|
public Resource init() throws ResourceInstantiationException
init
in interface Resource
init
in class AbstractProcessingResource
ResourceInstantiationException
protected void readList(LinearNode node, boolean add) throws ResourceInstantiationException
node
- the node
add
- if true, the phrases found in the list are added to the ones
recognised by this gazetteer; if false, the phrases found in the
list are removed from the list of phrases recognised by this
gazetteer.
ResourceInstantiationException
public void addLookup(String text, Lookup lookup)
text
- the phrase to be added
lookup
- the description of the annotation to be added when this
phrase is recognised
public void removeLookup(String text, Lookup lookup)
text
- the phrase to be removed
lookup
- the description of the annotation associated to this phrase
public String getFSMgml()
public static boolean isWordInternal(char ch)
ch
- the character to be tested
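The test described in the method summary (a letter, or a spacing or non-spacing combining mark) can be expressed with the standard `java.lang.Character` API. The sketch below is one plausible reading of that description; the wrapper class `WordInternalSketch` is hypothetical, and the actual GATE source may differ.

```java
/** Hypothetical stand-alone version of the word-internal character test. */
final class WordInternalSketch {
    /** Returns true for letters and for spacing or non-spacing combining marks. */
    static boolean isWordInternal(char ch) {
        int type = Character.getType(ch);
        return Character.isLetter(ch)
            || type == Character.COMBINING_SPACING_MARK  // Unicode category Mc
            || type == Character.NON_SPACING_MARK;       // Unicode category Mn
    }
}
```

Under this reading, `'a'` and the combining acute accent `'\u0301'` are word-internal, while a space, a digit, or a hyphen is not.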
public void execute() throws ExecutionException
execute
in interface Executable
execute
in class AbstractProcessingResource
ExecutionException
public Set lookup(String singleItem)
singleItem
- a single string to be looked up by the gazetteer
public boolean remove(String singleItem)
Gazetteer
public boolean add(String singleItem, Lookup lookup)
Gazetteer
lookup
- the lookup to be associated with the new string