gate
Interface SimpleCorpus
- All Superinterfaces: 
 - Collection, FeatureBearer, Iterable, LanguageResource, List, NameBearer, Resource, Serializable
 
- All Known Subinterfaces: 
 - Corpus, IndexedCorpus
 
- All Known Implementing Classes: 
 - CorpusImpl, DatabaseCorpusImpl, SerialCorpusImpl
 
public interface SimpleCorpus
- extends LanguageResource, List, NameBearer
 
Corpora are lists of Document. TIPSTER equivalent: Collection.
 
Method Summary
 String getDocumentName(int index)
          Gets the name of a document in this corpus.
 List getDocumentNames()
          Gets the names of the documents in this corpus.
 void populate(URL directory, FileFilter filter, String encoding, boolean recurseDirectories)
          Fills this corpus with documents created on the fly from selected files in a directory.
 
Methods inherited from interface java.util.List
 add, add, addAll, addAll, clear, contains, containsAll, equals, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, retainAll, set, size, subList, toArray, toArray
 
CORPUS_NAME_PARAMETER_NAME
static final String CORPUS_NAME_PARAMETER_NAME
- See Also:
 - Constant Field Values
 
CORPUS_DOCLIST_PARAMETER_NAME
static final String CORPUS_DOCLIST_PARAMETER_NAME
- See Also:
 - Constant Field Values
 
getDocumentNames
List getDocumentNames()
- Gets the names of the documents in this corpus.
 
- Returns:
 - a List of Strings representing the names of the documents
 in this corpus.
 
 
getDocumentName
String getDocumentName(int index)
- Gets the name of a document in this corpus.
 
- Parameters:
 index - the index of the document
- Returns:
 - a String value representing the name of the document at
 index in this corpus.
 
 
 
populate
void populate(URL directory,
              FileFilter filter,
              String encoding,
              boolean recurseDirectories)
              throws IOException,
                     ResourceInstantiationException
- Fills this corpus with documents created on the fly from selected files in
 a directory. Uses a FileFilter to select which files will be used
 and which will be ignored.
 A simple file filter based on extensions is provided in the GATE
 distribution (ExtensionFileFilter).
 
- Parameters:
 directory - the directory from which the files will be picked. This
 parameter is a URL for uniformity. It must be a URL of type file;
 otherwise an InvalidArgumentException will be thrown.
 An implementation of this method is provided as a static method,
 CorpusImpl.populate(Corpus, URL, FileFilter, String, boolean).
 filter - the file filter used to select files from the target
 directory. If the filter is null, all files are accepted.
 encoding - the encoding to be used for reading the documents.
 recurseDirectories - whether the directory should be traversed recursively. If
 true, all files in the provided directory and in all of its
 subdirectories (at any depth) are picked if
 accepted by the filter; otherwise the subdirectories are ignored.
- Throws:
 IOException
ResourceInstantiationException
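 
The file-selection rules described for populate can be illustrated with a minimal sketch that uses only JDK classes. This is not the GATE implementation: the class PopulateSelectionSketch and its selectFiles helper are hypothetical names introduced here, the sketch throws the JDK's IllegalArgumentException for a non-file URL as a stand-in for the exception named above, and it assumes the filter is applied to files only, never to subdirectories. The lambda txtOnly is likewise a stand-in for an extension-based filter such as ExtensionFileFilter.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PopulateSelectionSketch {

    /**
     * Hypothetical helper (not part of GATE): lists the files that a
     * populate-style call would pick from the given directory URL.
     */
    static List<File> selectFiles(URL directory, FileFilter filter,
                                  boolean recurseDirectories) {
        // The directory must be a URL of type file.
        if (!"file".equals(directory.getProtocol())) {
            throw new IllegalArgumentException(
                "Expected a file: URL, got " + directory);
        }
        File dir;
        try {
            dir = new File(directory.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
        List<File> selected = new ArrayList<>();
        collect(dir, filter, recurseDirectories, selected);
        return selected;
    }

    private static void collect(File dir, FileFilter filter,
                                boolean recurse, List<File> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or an I/O error occurred
        }
        Arrays.sort(entries); // deterministic order for the example
        for (File entry : entries) {
            if (entry.isDirectory()) {
                // Assumption: subdirectories are descended into only when
                // recursion is requested, and are not tested by the filter.
                if (recurse) {
                    collect(entry, filter, recurse, out);
                }
            } else if (filter == null || filter.accept(entry)) {
                // A null filter accepts every file.
                out.add(entry);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small temporary directory tree to select from.
        File root = Files.createTempDirectory("corpus").toFile();
        new File(root, "a.txt").createNewFile();
        new File(root, "b.xml").createNewFile();
        File sub = new File(root, "sub");
        sub.mkdir();
        new File(sub, "c.txt").createNewFile();

        // Stand-in for an extension-based filter.
        FileFilter txtOnly = f -> f.getName().endsWith(".txt");
        URL url = root.toURI().toURL();

        System.out.println(selectFiles(url, txtOnly, false));
        System.out.println(selectFiles(url, txtOnly, true));
        System.out.println(selectFiles(url, null, true));
    }
}
```

With the temporary tree built in main, the non-recursive call with txtOnly selects only a.txt, the recursive call also picks up sub/c.txt, and the call with a null filter selects all three files.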