GATE
Version 3.1-2270

gate.corpora
Class CorpusImpl

java.lang.Object
  extended by gate.util.AbstractFeatureBearer
      extended by gate.creole.AbstractResource
          extended by gate.creole.AbstractLanguageResource
              extended by gate.corpora.CorpusImpl
All Implemented Interfaces:
Corpus, CreoleListener, LanguageResource, Resource, SimpleCorpus, FeatureBearer, NameBearer, Serializable, Iterable, Collection, EventListener, List
Direct Known Subclasses:
DatabaseCorpusImpl

public class CorpusImpl
extends AbstractLanguageResource
implements Corpus, CreoleListener

Corpora are sets of Documents. They are ordered by lexicographic collation on URL.

See Also:
Serialized Form

Nested Class Summary
protected  class CorpusImpl.VerboseList
          A proxy list that stores the actual data in an internal list and forwards all operations to it, firing the appropriate corpus events when necessary.
 
Field Summary
protected  List documentsList
           
protected  List supportList
          The underlying list that holds the documents in this corpus.
 
Fields inherited from class gate.creole.AbstractLanguageResource
dataStore, lrPersistentId
 
Fields inherited from class gate.creole.AbstractResource
name
 
Fields inherited from class gate.util.AbstractFeatureBearer
features
 
Fields inherited from interface gate.SimpleCorpus
CORPUS_DOCLIST_PARAMETER_NAME, CORPUS_NAME_PARAMETER_NAME
 
Constructor Summary
CorpusImpl()
           
 
Method Summary
 void add(int index, Object element)
           
 boolean add(Object o)
           
 boolean addAll(Collection c)
           
 boolean addAll(int index, Collection c)
           
 void addCorpusListener(CorpusListener l)
          Registers a new CorpusListener with this corpus.
 void cleanup()
          Construction
 void clear()
           
protected  void clearDocList()
           
 boolean contains(Object o)
           
 boolean containsAll(Collection c)
           
 void datastoreClosed(CreoleEvent e)
          Called when a DataStore has been closed
 void datastoreCreated(CreoleEvent e)
          Called when a DataStore has been created
 void datastoreOpened(CreoleEvent e)
          Called when a DataStore has been opened
 boolean equals(Object o)
           
protected  void fireDocumentAdded(CorpusEvent e)
           
protected  void fireDocumentRemoved(CorpusEvent e)
           
 Object get(int index)
           
 String getDocumentName(int index)
          Gets the name of a document in this corpus.
 List getDocumentNames()
          Gets the names of the documents in this corpus.
 List getDocumentsList()
           
 int hashCode()
           
 int indexOf(Object o)
           
 Resource init()
          Initialise this resource, and return it.
 boolean isDocumentLoaded(int index)
          Returns true when the document is already loaded in memory.
 boolean isEmpty()
           
 Iterator iterator()
           
 int lastIndexOf(Object o)
           
 ListIterator listIterator()
           
 ListIterator listIterator(int index)
           
static void populate(Corpus corpus, URL directory, FileFilter filter, String encoding, boolean recurseDirectories)
          Fills the provided corpus with documents created on the fly from selected files in a directory.
 void populate(URL directory, FileFilter filter, String encoding, boolean recurseDirectories)
          Fills this corpus with documents created from files in a directory.
 Object remove(int index)
           
 boolean remove(Object o)
           
 boolean removeAll(Collection c)
           
 void removeCorpusListener(CorpusListener l)
          Removes one of the listeners registered with this corpus.
 void resourceLoaded(CreoleEvent e)
          Called when a new Resource has been loaded into the system
 void resourceRenamed(Resource resource, String oldName, String newName)
          Called when the creole register has renamed a resource.
 void resourceUnloaded(CreoleEvent e)
          Called when a Resource has been removed from the system
 boolean retainAll(Collection c)
           
 Object set(int index, Object element)
           
 void setDocumentsList(List documentsList)
           
 int size()
           
 List subList(int fromIndex, int toIndex)
           
 Object[] toArray()
           
 Object[] toArray(Object[] a)
           
 void unloadDocument(Document doc)
          This method does not make sense for transient corpora, so it does nothing.
 
Methods inherited from class gate.creole.AbstractLanguageResource
getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync
 
Methods inherited from class gate.creole.AbstractResource
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners
 
Methods inherited from class gate.util.AbstractFeatureBearer
getFeatures, setFeatures
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gate.LanguageResource
getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync
 
Methods inherited from interface gate.Resource
getParameterValue, setParameterValue, setParameterValues
 
Methods inherited from interface gate.util.FeatureBearer
getFeatures, setFeatures
 
Methods inherited from interface gate.util.NameBearer
getName, setName
 

Field Detail

supportList

protected List supportList
The underlying list that holds the documents in this corpus.


documentsList

protected transient List documentsList
Constructor Detail

CorpusImpl

public CorpusImpl()
Method Detail

getDocumentNames

public List getDocumentNames()
Gets the names of the documents in this corpus.

Specified by:
getDocumentNames in interface SimpleCorpus
Returns:
a CorpusImpl.VerboseList of Strings representing the names of the documents in this corpus.

getDocumentName

public String getDocumentName(int index)
Gets the name of a document in this corpus.

Specified by:
getDocumentName in interface SimpleCorpus
Parameters:
index - the index of the document
Returns:
a String value representing the name of the document at index in this corpus.

unloadDocument

public void unloadDocument(Document doc)
This method does not make sense for transient corpora, so it does nothing.

Specified by:
unloadDocument in interface Corpus
Parameters:
doc - Document to be unloaded from memory.

isDocumentLoaded

public boolean isDocumentLoaded(int index)
Returns true when the document is already loaded in memory.

Specified by:
isDocumentLoaded in interface Corpus

clearDocList

protected void clearDocList()

size

public int size()
Specified by:
size in interface Collection
Specified by:
size in interface List

isEmpty

public boolean isEmpty()
Specified by:
isEmpty in interface Collection
Specified by:
isEmpty in interface List

contains

public boolean contains(Object o)
Specified by:
contains in interface Collection
Specified by:
contains in interface List

iterator

public Iterator iterator()
Specified by:
iterator in interface Iterable
Specified by:
iterator in interface Collection
Specified by:
iterator in interface List

toArray

public Object[] toArray()
Specified by:
toArray in interface Collection
Specified by:
toArray in interface List

toArray

public Object[] toArray(Object[] a)
Specified by:
toArray in interface Collection
Specified by:
toArray in interface List

add

public boolean add(Object o)
Specified by:
add in interface Collection
Specified by:
add in interface List

remove

public boolean remove(Object o)
Specified by:
remove in interface Collection
Specified by:
remove in interface List

containsAll

public boolean containsAll(Collection c)
Specified by:
containsAll in interface Collection
Specified by:
containsAll in interface List

addAll

public boolean addAll(Collection c)
Specified by:
addAll in interface Collection
Specified by:
addAll in interface List

addAll

public boolean addAll(int index,
                      Collection c)
Specified by:
addAll in interface List

removeAll

public boolean removeAll(Collection c)
Specified by:
removeAll in interface Collection
Specified by:
removeAll in interface List

retainAll

public boolean retainAll(Collection c)
Specified by:
retainAll in interface Collection
Specified by:
retainAll in interface List

clear

public void clear()
Specified by:
clear in interface Collection
Specified by:
clear in interface List

equals

public boolean equals(Object o)
Specified by:
equals in interface Collection
Specified by:
equals in interface List
Overrides:
equals in class Object

hashCode

public int hashCode()
Specified by:
hashCode in interface Collection
Specified by:
hashCode in interface List
Overrides:
hashCode in class Object

get

public Object get(int index)
Specified by:
get in interface List

set

public Object set(int index,
                  Object element)
Specified by:
set in interface List

add

public void add(int index,
                Object element)
Specified by:
add in interface List

remove

public Object remove(int index)
Specified by:
remove in interface List

indexOf

public int indexOf(Object o)
Specified by:
indexOf in interface List

lastIndexOf

public int lastIndexOf(Object o)
Specified by:
lastIndexOf in interface List

listIterator

public ListIterator listIterator()
Specified by:
listIterator in interface List

listIterator

public ListIterator listIterator(int index)
Specified by:
listIterator in interface List

subList

public List subList(int fromIndex,
                    int toIndex)
Specified by:
subList in interface List

cleanup

public void cleanup()
Construction

Specified by:
cleanup in interface Resource
Overrides:
cleanup in class AbstractLanguageResource

init

public Resource init()
Initialise this resource, and return it.

Specified by:
init in interface Resource
Overrides:
init in class AbstractResource

populate

public static void populate(Corpus corpus,
                            URL directory,
                            FileFilter filter,
                            String encoding,
                            boolean recurseDirectories)
                     throws IOException
Fills the provided corpus with documents created on the fly from selected files in a directory. Uses a FileFilter to select which files will be used and which will be ignored. A simple file filter based on extensions is provided in the GATE distribution (ExtensionFileFilter).

Parameters:
corpus - the corpus to be populated
directory - the directory from which the files will be picked. This parameter is a URL for uniformity. It must be a URL of type file; otherwise an InvalidArgumentException will be thrown.
filter - the file filter used to select files from the target directory. If the filter is null, all the files will be accepted.
encoding - the encoding to be used for reading the documents
recurseDirectories - whether the directory should be parsed recursively. If true, all the files from the provided directory and all its child directories (on as many levels as necessary) will be picked if accepted by the filter; otherwise the child directories will be ignored.
Throws:
IOException
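
The selection rules described by the parameters above can be sketched in plain Java. This is a minimal illustration only, not GATE's implementation: the class PopulateSelectionSketch and the method selectFiles are hypothetical names, IllegalArgumentException stands in for the exception named in the text, and applying the filter to regular files only (never to directories) is an assumption. The sketch stops at file selection and does not create documents.

```java
import java.io.File;
import java.io.FileFilter;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating the documented parameters; not part of GATE. */
final class PopulateSelectionSketch {

  /** Lists the files that would be selected from a directory given as a file URL. */
  static List<File> selectFiles(URL directory, FileFilter filter,
                                boolean recurseDirectories) {
    // The documentation requires a URL of type file. IllegalArgumentException
    // is an assumption standing in for the exception named in the text.
    if (!"file".equals(directory.getProtocol())) {
      throw new IllegalArgumentException("Not a file URL: " + directory);
    }
    File root;
    try {
      root = new File(directory.toURI());
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Malformed URL: " + directory, e);
    }
    List<File> selected = new ArrayList<>();
    collect(root, filter, recurseDirectories, selected);
    return selected;
  }

  private static void collect(File dir, FileFilter filter, boolean recurse,
                              List<File> out) {
    File[] children = dir.listFiles();
    if (children == null) {
      return; // not a directory, or it could not be read
    }
    Arrays.sort(children); // deterministic order for the example
    for (File child : children) {
      if (child.isDirectory()) {
        // Child directories are visited only when recursion is requested.
        if (recurse) {
          collect(child, filter, recurse, out);
        }
      } else if (filter == null || filter.accept(child)) {
        // A null filter accepts every file.
        out.add(child);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Lists every file under the given directory (default: current directory).
    File dir = new File(args.length > 0 ? args[0] : ".");
    for (File f : selectFiles(dir.toURI().toURL(), null, true)) {
      System.out.println(f);
    }
  }
}
```

With recurseDirectories set to false, only files directly inside the directory are considered; a null filter accepts every file.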

populate

public void populate(URL directory,
                     FileFilter filter,
                     String encoding,
                     boolean recurseDirectories)
              throws IOException,
                     ResourceInstantiationException
Fills this corpus with documents created from files in a directory.

Specified by:
populate in interface SimpleCorpus
Parameters:
filter - the file filter used to select files from the target directory. If the filter is null, all the files will be accepted.
directory - the directory from which the files will be picked. This parameter is a URL for uniformity. It must be a URL of type file; otherwise an InvalidArgumentException will be thrown. An implementation of this method is provided as the static method populate(Corpus, URL, FileFilter, String, boolean).
encoding - the encoding to be used for reading the documents
recurseDirectories - whether the directory should be parsed recursively. If true, all the files from the provided directory and all its child directories (on as many levels as necessary) will be picked if accepted by the filter; otherwise the child directories will be ignored.
Throws:
IOException
ResourceInstantiationException

removeCorpusListener

public void removeCorpusListener(CorpusListener l)
Description copied from interface: Corpus
Removes one of the listeners registered with this corpus.

Specified by:
removeCorpusListener in interface Corpus
Parameters:
l - the listener to be removed.

addCorpusListener

public void addCorpusListener(CorpusListener l)
Description copied from interface: Corpus
Registers a new CorpusListener with this corpus.

Specified by:
addCorpusListener in interface Corpus
Parameters:
l - the listener to be added.
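
The listener methods above, together with the proxy list described for CorpusImpl.VerboseList, follow a common pattern: a list forwards every operation to an internal list and notifies registered listeners on each addition or removal. The following is a minimal, self-contained sketch of that pattern, assuming nothing about GATE's actual code. ElementListener and NotifyingList are hypothetical names that stand in for CorpusListener and the proxy list; they are not GATE types.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener type, standing in for a corpus listener. */
interface ElementListener<E> {
  void elementAdded(E element, int index);
  void elementRemoved(E element, int index);
}

/** Hypothetical event-firing proxy list; a sketch, not GATE's VerboseList. */
final class NotifyingList<E> extends AbstractList<E> {
  private final List<E> data = new ArrayList<>();
  private final List<ElementListener<E>> listeners = new ArrayList<>();

  void addListener(ElementListener<E> l) { listeners.add(l); }

  void removeListener(ElementListener<E> l) { listeners.remove(l); }

  @Override public E get(int index) { return data.get(index); }

  @Override public int size() { return data.size(); }

  // AbstractList routes add(E) and addAll(...) through this method.
  @Override public void add(int index, E element) {
    data.add(index, element);
    modCount++; // keeps the inherited iterators fail-fast
    // Iterate over a copy so listeners may unregister themselves safely.
    for (ElementListener<E> l : new ArrayList<>(listeners)) {
      l.elementAdded(element, index);
    }
  }

  // AbstractList routes remove(Object) and clear() through this method.
  @Override public E remove(int index) {
    E removed = data.remove(index);
    modCount++;
    for (ElementListener<E> l : new ArrayList<>(listeners)) {
      l.elementRemoved(removed, index);
    }
    return removed;
  }

  public static void main(String[] args) {
    NotifyingList<String> corpus = new NotifyingList<>();
    corpus.addListener(new ElementListener<String>() {
      public void elementAdded(String e, int i) {
        System.out.println("added " + e + " at " + i);
      }
      public void elementRemoved(String e, int i) {
        System.out.println("removed " + e + " from " + i);
      }
    });
    corpus.add("doc1");
    corpus.remove(0);
  }
}
```

Because AbstractList implements its bulk and object-based operations in terms of add(int, E) and remove(int), overriding those two methods is enough for every modification path in this sketch to fire an event.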

fireDocumentAdded

protected void fireDocumentAdded(CorpusEvent e)

fireDocumentRemoved

protected void fireDocumentRemoved(CorpusEvent e)

setDocumentsList

public void setDocumentsList(List documentsList)

getDocumentsList

public List getDocumentsList()

resourceLoaded

public void resourceLoaded(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a new Resource has been loaded into the system

Specified by:
resourceLoaded in interface CreoleListener

resourceUnloaded

public void resourceUnloaded(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a Resource has been removed from the system

Specified by:
resourceUnloaded in interface CreoleListener

resourceRenamed

public void resourceRenamed(Resource resource,
                            String oldName,
                            String newName)
Description copied from interface: CreoleListener
Called when the creole register has renamed a resource.

Specified by:
resourceRenamed in interface CreoleListener

datastoreOpened

public void datastoreOpened(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a DataStore has been opened

Specified by:
datastoreOpened in interface CreoleListener

datastoreCreated

public void datastoreCreated(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a DataStore has been created

Specified by:
datastoreCreated in interface CreoleListener

datastoreClosed

public void datastoreClosed(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a DataStore has been closed

Specified by:
datastoreClosed in interface CreoleListener
