gate
Interface SimpleCorpus
- All Superinterfaces:
- Collection, FeatureBearer, Iterable, LanguageResource, List, NameBearer, Resource, Serializable
- All Known Subinterfaces:
- Corpus, IndexedCorpus
- All Known Implementing Classes:
- CorpusImpl, DatabaseCorpusImpl, SerialCorpusImpl
public interface SimpleCorpus
- extends LanguageResource, List, NameBearer
Corpora are lists of Document objects. TIPSTER equivalent: Collection.
Method Summary
String getDocumentName(int index)
- Gets the name of a document in this corpus.
List getDocumentNames()
- Gets the names of the documents in this corpus.
void populate(URL directory, FileFilter filter, String encoding, boolean recurseDirectories)
- Fills this corpus with documents created on the fly from selected files in a directory.
Methods inherited from interface java.util.List
add, add, addAll, addAll, clear, contains, containsAll, equals, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, retainAll, set, size, subList, toArray, toArray
CORPUS_NAME_PARAMETER_NAME
static final String CORPUS_NAME_PARAMETER_NAME
- See Also:
- Constant Field Values
CORPUS_DOCLIST_PARAMETER_NAME
static final String CORPUS_DOCLIST_PARAMETER_NAME
- See Also:
- Constant Field Values
getDocumentNames
List getDocumentNames()
- Gets the names of the documents in this corpus.
- Returns:
- a List of Strings representing the names of the documents in this corpus.
getDocumentName
String getDocumentName(int index)
- Gets the name of a document in this corpus.
- Parameters:
index
- the index of the document
- Returns:
- a String value representing the name of the document at
index in this corpus.
populate
void populate(URL directory,
FileFilter filter,
String encoding,
boolean recurseDirectories)
throws IOException,
ResourceInstantiationException
- Fills this corpus with documents created on the fly from selected files in
a directory. Uses a FileFilter to select which files will be used and which
will be ignored. A simple file filter based on extensions is provided in the
GATE distribution (ExtensionFileFilter).
- Parameters:
directory
- the directory from which the files will be picked. This
parameter is a URL for uniformity. It must be a URL of type file;
otherwise an InvalidArgumentException will be thrown.
An implementation of this method is provided as a static method at
CorpusImpl.populate(Corpus, URL, FileFilter, String, boolean).
filter
- the file filter used to select files from the target
directory. If the filter is null, all the files will be accepted.
encoding
- the encoding to be used for reading the documents
recurseDirectories
- whether the directory should be traversed recursively. If
true, all the files from the provided directory and all its
child directories (on as many levels as necessary) will be picked if
accepted by the filter; otherwise the child directories will be ignored.
- Throws:
IOException
ResourceInstantiationException
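The file-selection rules described for populate can be illustrated with a small standalone sketch. It uses only the Java standard library and is not GATE's implementation: the class PopulateSelectionSketch and the method selectFiles are hypothetical names, no documents are created, and the standard IllegalArgumentException stands in for the exception named above. It assumes the filter is applied to files only and that subdirectories are entered solely when recurseDirectories is true.

```java
import java.io.File;
import java.io.FileFilter;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PopulateSelectionSketch {

    /**
     * Hypothetical helper: returns the files that the documented rules
     * would select from the given directory. It does not build a corpus.
     */
    public static List<File> selectFiles(URL directory,
                                         FileFilter filter,
                                         boolean recurseDirectories) {
        // The directory must be given as a URL of type "file".
        if (!"file".equals(directory.getProtocol())) {
            throw new IllegalArgumentException(
                "Expected a file: URL, got " + directory);
        }
        File dir;
        try {
            dir = new File(directory.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed URL: " + directory, e);
        }
        List<File> selected = new ArrayList<>();
        collect(dir, filter, recurseDirectories, selected);
        return selected;
    }

    private static void collect(File dir, FileFilter filter,
                                boolean recurse, List<File> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or it could not be read
        }
        Arrays.sort(entries); // deterministic order for the illustration
        for (File entry : entries) {
            if (entry.isDirectory()) {
                // Child directories are visited only when recursion is on.
                if (recurse) {
                    collect(entry, filter, recurse, out);
                }
            } else if (filter == null || filter.accept(entry)) {
                // A null filter accepts every file.
                out.add(entry);
            }
        }
    }
}
```

For a directory holding a.txt, b.xml and sub/c.txt, calling selectFiles with a null filter and recurseDirectories set to false would pick the two top-level files, while a filter accepting only names ending in .txt with recursion enabled would pick a.txt and sub/c.txt.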