GATE Version 3.1-2270
java.lang.Object
  gate.util.AbstractFeatureBearer
    gate.creole.AbstractResource
      gate.creole.AbstractLanguageResource
        gate.DocumentFormat
public abstract class DocumentFormat
The format of Documents. Subclasses of DocumentFormat know about particular MIME types and how to unpack the information in any markup or formatting they contain into GATE annotations. Each MIME type has its own subclass of DocumentFormat, e.g. XmlDocumentFormat, RtfDocumentFormat, MpegDocumentFormat. These classes register themselves with a static index residing here when they are constructed. Static getDocumentFormat methods can then be used to get the appropriate format class for a particular document.
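The registration-and-lookup scheme described above can be sketched as follows. This is a simplified, self-contained illustration, not GATE's code: `SketchFormat`, `SketchXmlFormat`, `SketchHtmlFormat` and `forMimeType` are hypothetical names, and the index is keyed by plain "type/subtype" strings rather than `MimeType` objects.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified stand-in for DocumentFormat: each subclass
// registers itself in a static index when it is constructed.
abstract class SketchFormat {
  // Shared index: lower-cased "type/subtype" string -> format instance.
  private static final Map<String, SketchFormat> REGISTRY = new HashMap<>();

  protected SketchFormat(String mimeType) {
    // Registration happens at construction time, as the class comment describes.
    REGISTRY.put(mimeType.toLowerCase(Locale.ROOT), this);
  }

  // Static lookup, loosely analogous to getDocumentFormat(doc, mimeType).
  // Returns null when no format has registered for the type.
  static SketchFormat forMimeType(String mimeType) {
    return REGISTRY.get(mimeType.toLowerCase(Locale.ROOT));
  }

  abstract String name();
}

class SketchXmlFormat extends SketchFormat {
  SketchXmlFormat() { super("text/xml"); }
  @Override String name() { return "xml"; }
}

class SketchHtmlFormat extends SketchFormat {
  SketchHtmlFormat() { super("text/html"); }
  @Override String name() { return "html"; }
}
```

Registering from inside a constructor publishes `this` before construction has finished; that is tolerable in a single-threaded sketch but worth avoiding in concurrent code.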
Field Summary | |
---|---|
protected Map |
element2StringMap
This map is used inside the unpackMarkup() method... |
protected static boolean |
isGateXmlDocument
This field indicates whether the document being processed is in GATE's custom XML format. |
protected static Map |
magic2mimeTypeMap
Map from sets of magic numbers to MimeType. |
protected Map |
markupElementsMap
Map of markup elements to annotation types. |
protected static Map |
mimeString2ClassHandlerMap
Map of MimeTypeString to ClassHandler class. |
protected static Map |
mimeString2mimeTypeMap
Map of MimeType to DocumentFormat Class. |
protected static Map |
suffixes2mimeTypeMap
Map from sets of file suffixes to MimeType. |
Fields inherited from class gate.creole.AbstractLanguageResource |
---|
dataStore, lrPersistentId |
Fields inherited from class gate.creole.AbstractResource |
---|
name |
Constructor Summary | |
---|---|
DocumentFormat()
Default constructor. |
Method Summary | |
---|---|
void |
addStatusListener(StatusListener l)
|
protected static boolean |
areEqual(MimeType aMimeType,
MimeType anotherMimeType)
Tests if two MimeType objects are equal. |
protected static MimeType |
decideBetweenThreeMimeTypes(MimeType aMimeTypeFromWebServer,
MimeType aMimeTypeFromFileSuffix,
MimeType aMimeTypeFromMagicNumbers)
Decides which MIME type is in the majority. |
protected static MimeType |
decideBetweenTwoMimeTypes(MimeType aMimeType,
MimeType anotherMimeType)
Decide between two mimeTypes. |
protected void |
fireStatusChanged(String e)
|
static DocumentFormat |
getDocumentFormat(Document aGateDocument,
MimeType mimeType)
Find a DocumentFormat implementation that deals with a particular MIME type, given that type. |
static DocumentFormat |
getDocumentFormat(Document aGateDocument,
String fileSuffix)
Find a DocumentFormat implementation that deals with a particular MIME type, given the file suffix (e.g. ".txt") that the document came from. |
static DocumentFormat |
getDocumentFormat(Document aGateDocument,
URL url)
Find a DocumentFormat implementation that deals with a particular MIME type, given the URL of the Document. |
Map |
getElement2StringMap()
Gets the element-to-string map. |
FeatureMap |
getFeatures()
Gets the feature map. |
Map |
getMarkupElementsMap()
Gets the markup elements map. |
MimeType |
getMimeType()
Gets the MIME type. |
Boolean |
getShouldCollectRepositioning()
|
protected static MimeType |
guessTypeUsingMagicNumbers(InputStream aInputStream,
String anEncoding)
Tries to guess the MIME type using magic numbers. |
void |
removeStatusListener(StatusListener l)
|
protected static MimeType |
runMagicNumbers(InputStreamReader aReader)
Runs the magic-number tests over the content read from the given reader. |
void |
setElement2StringMap(Map anElement2StringMap)
Sets the element-to-string map. |
void |
setFeatures(FeatureMap features)
Sets the feature map. |
void |
setMarkupElementsMap(Map markupElementsMap)
Sets the markup elements map. |
void |
setMimeType(MimeType aMimeType)
Sets the MIME type. |
void |
setShouldCollectRepositioning(Boolean b)
|
Boolean |
supportsRepositioning()
Returns true if the document format can collect repositioning information during the unpack phase. |
abstract void |
unpackMarkup(Document doc)
Unpack the markup in the document. |
abstract void |
unpackMarkup(Document doc,
RepositioningInfo repInfo,
RepositioningInfo ampCodingInfo)
|
void |
unpackMarkup(Document doc,
String originalContentFeatureType)
Unpack the markup in the document. |
Methods inherited from class gate.creole.AbstractLanguageResource |
---|
cleanup, getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync |
Methods inherited from class gate.creole.AbstractResource |
---|
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, init, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface gate.LanguageResource |
---|
getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync |
Methods inherited from interface gate.Resource |
---|
cleanup, getParameterValue, init, setParameterValue, setParameterValues |
Methods inherited from interface gate.util.NameBearer |
---|
getName, setName |
Field Detail |
---|
protected static boolean isGateXmlDocument
protected static Map mimeString2ClassHandlerMap
protected static Map mimeString2mimeTypeMap
protected static Map suffixes2mimeTypeMap
protected static Map magic2mimeTypeMap
protected Map markupElementsMap
protected Map element2StringMap
Constructor Detail |
---|
public DocumentFormat()
Method Detail |
---|
public Boolean supportsRepositioning()
public void setShouldCollectRepositioning(Boolean b)
public Boolean getShouldCollectRepositioning()
public abstract void unpackMarkup(Document doc) throws DocumentFormatException
Throws: DocumentFormatException
public abstract void unpackMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo) throws DocumentFormatException
Throws: DocumentFormatException
public void unpackMarkup(Document doc, String originalContentFeatureType) throws DocumentFormatException
Parameters:
doc - the document that will be unpacked
originalContentFeatureType - the name of the feature that will hold the document's content.
Throws: DocumentFormatException
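A plausible reading of the two-argument overload is "save the original content under the named feature, then unpack as usual". The sketch below illustrates that reading under stated assumptions: `ToyDocument`, `ToyFormat` and `ToyTagStripper` are hypothetical classes, the null check is an added safeguard, and the regex tag stripper only stands in for real markup unpacking.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy document: just text content plus a feature map.
class ToyDocument {
  String content;
  final Map<String, Object> features = new HashMap<>();
  ToyDocument(String content) { this.content = content; }
}

abstract class ToyFormat {
  // Subclasses strip markup from doc.content (GATE would also create annotations).
  abstract void unpackMarkup(ToyDocument doc);

  // Sketch of the two-argument overload: store the raw content under the
  // given feature name first, then delegate to the one-argument version.
  void unpackMarkup(ToyDocument doc, String originalContentFeatureType) {
    if (originalContentFeatureType != null) {
      doc.features.put(originalContentFeatureType, doc.content);
    }
    unpackMarkup(doc);
  }
}

// Naive tag stripper used only to make the sketch runnable.
class ToyTagStripper extends ToyFormat {
  @Override
  void unpackMarkup(ToyDocument doc) {
    doc.content = doc.content.replaceAll("<[^>]*>", "");
  }
}
```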
protected static MimeType decideBetweenThreeMimeTypes(MimeType aMimeTypeFromWebServer, MimeType aMimeTypeFromFileSuffix, MimeType aMimeTypeFromMagicNumbers)
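Parameters:
aMimeTypeFromWebServer - the MimeType reported by the web server
aMimeTypeFromFileSuffix - the MimeType derived from the file suffix
aMimeTypeFromMagicNumbers - the MimeType detected from magic numbers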
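A "majority" decision between three guesses can be sketched as a simple vote. This is an illustration, not GATE's implementation: `MimeVote` is a hypothetical class, MIME types are plain strings, null means a source produced no guess, and the fallback when no two guesses agree (take the first non-null one) is an assumption.

```java
// Sketch of a majority vote over three MIME type guesses, represented as
// "type/subtype" strings; null means that source gave no answer.
final class MimeVote {
  private MimeVote() {}

  static String decideBetweenThree(String fromWebServer, String fromSuffix, String fromMagic) {
    // Any two agreeing, non-null guesses win the vote.
    if (fromWebServer != null
        && (fromWebServer.equals(fromSuffix) || fromWebServer.equals(fromMagic))) {
      return fromWebServer;
    }
    if (fromSuffix != null && fromSuffix.equals(fromMagic)) {
      return fromSuffix;
    }
    // No majority: fall back to the first available guess (assumed rule).
    if (fromWebServer != null) return fromWebServer;
    if (fromSuffix != null) return fromSuffix;
    return fromMagic;
  }
}
```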
protected static MimeType decideBetweenTwoMimeTypes(MimeType aMimeType, MimeType anotherMimeType)
Parameters:
aMimeType - a MimeType object with the "Priority" parameter set
anotherMimeType - a MimeType object with the "Priority" parameter set
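Choosing between two candidates by their "Priority" parameter might look like the sketch below. `PrioMime` and `PrioDecider` are hypothetical stand-ins for GATE's `MimeType`, and the ordering rule (a smaller number means higher priority, ties keep the first argument, a missing parameter counts as 0) is an assumption made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for a MIME type carrying named parameters
// (hypothetical; not GATE's MimeType class).
class PrioMime {
  final String type;
  final Map<String, String> params = new HashMap<>();
  PrioMime(String type, int priority) {
    this.type = type;
    params.put("Priority", Integer.toString(priority));
  }
}

final class PrioDecider {
  private PrioDecider() {}

  // Returns the candidate with the better "Priority" value. Assumptions:
  // smaller number = higher priority, ties keep the first argument, and a
  // missing parameter counts as 0. A non-numeric value would throw
  // NumberFormatException.
  static PrioMime decideBetweenTwo(PrioMime a, PrioMime b) {
    if (a == null) return b;
    if (b == null) return a;
    int pa = Integer.parseInt(a.params.getOrDefault("Priority", "0"));
    int pb = Integer.parseInt(b.params.getOrDefault("Priority", "0"));
    return pa <= pb ? a : b;
  }
}
```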
protected static boolean areEqual(MimeType aMimeType, MimeType anotherMimeType)
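An equality test for MIME types usually compares the type and subtype case-insensitively, since RFC 2045 treats them as case-insensitive. The sketch below assumes that behaviour and also assumes that `;param=value` suffixes are ignored; whether GATE's `areEqual` does the same is not stated here, and `MimeEquality` is a hypothetical name.

```java
import java.util.Locale;

final class MimeEquality {
  private MimeEquality() {}

  // Sketch: two MIME types are equal when their "type/subtype" parts match,
  // ignoring case and any ";param=value" suffix. Two nulls count as equal.
  static boolean areEqual(String a, String b) {
    if (a == null || b == null) return a == b;
    return baseType(a).equals(baseType(b));
  }

  // Strips parameters and normalises case for comparison.
  private static String baseType(String mime) {
    int semi = mime.indexOf(';');
    String base = semi >= 0 ? mime.substring(0, semi) : mime;
    return base.trim().toLowerCase(Locale.ROOT);
  }
}
```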
protected static MimeType guessTypeUsingMagicNumbers(InputStream aInputStream, String anEncoding)
Parameters:
aInputStream - an InputStream, which is wrapped in an InputStreamReader
anEncoding - the encoding. If it is null or unknown, an InputStreamReader with the default encoding is created.
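The encoding fallback described for `guessTypeUsingMagicNumbers` can be expressed with the standard `InputStreamReader` constructors. This is a minimal sketch; `ReaderFactory.readerFor` is a hypothetical helper, not part of GATE.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

final class ReaderFactory {
  private ReaderFactory() {}

  // Wraps the stream in a reader using anEncoding; if the encoding is null,
  // unsupported by this JVM, or not a legal charset name, falls back to the
  // platform default charset.
  static InputStreamReader readerFor(InputStream in, String anEncoding) {
    if (anEncoding != null) {
      try {
        return new InputStreamReader(in, anEncoding);
      } catch (UnsupportedEncodingException | IllegalArgumentException e) {
        // Unknown or malformed charset name: fall through to the default.
      }
    }
    return new InputStreamReader(in);
  }
}
```

Handling null explicitly matters because `new InputStreamReader(in, (String) null)` throws a `NullPointerException` rather than falling back.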
protected static MimeType runMagicNumbers(InputStreamReader aReader)
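Magic-number detection over a reader amounts to reading the first few characters and matching them against known signatures. The sketch below is illustrative only: `MagicSniffer` and its signature table are hypothetical, and GATE's own magic table (`magic2mimeTypeMap`) may contain different entries.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class MagicSniffer {
  private MagicSniffer() {}

  // Hypothetical magic table: leading text -> MIME type, tried in insertion order.
  private static final Map<String, String> MAGIC = new LinkedHashMap<>();
  static {
    MAGIC.put("<?xml", "text/xml");
    MAGIC.put("<!doctype html", "text/html");
    MAGIC.put("<html", "text/html");
    MAGIC.put("{\\rtf", "text/rtf");
  }

  // Reads up to the first 64 characters, ignores leading whitespace and case,
  // and returns the first matching MIME type, or null when nothing matches.
  static String runMagicNumbers(Reader reader) throws IOException {
    char[] buf = new char[64];
    int n = 0;
    int r;
    while (n < buf.length && (r = reader.read(buf, n, buf.length - n)) != -1) {
      n += r;
    }
    String head = new String(buf, 0, n).trim().toLowerCase(Locale.ROOT);
    for (Map.Entry<String, String> e : MAGIC.entrySet()) {
      if (head.startsWith(e.getKey())) return e.getValue();
    }
    return null;
  }
}
```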
public static DocumentFormat getDocumentFormat(Document aGateDocument, MimeType mimeType)
Parameters:
aGateDocument - this document receives the associated MIME type as a feature. The name of the feature is MimeType and its value has the form type/subtype.
mimeType - the MIME type that is given as input
public static DocumentFormat getDocumentFormat(Document aGateDocument, String fileSuffix)
Parameters:
aGateDocument - this document receives the associated MIME type as a feature. The name of the feature is MimeType and its value has the form type/subtype.
fileSuffix - the file suffix that is given as input
public static DocumentFormat getDocumentFormat(Document aGateDocument, URL url)
Parameters:
aGateDocument - this document receives the associated MIME type as a feature. The name of the feature is MimeType and its value has the form type/subtype.
url - the URL that is given as input
public FeatureMap getFeatures()
Specified by: getFeatures in interface FeatureBearer
Overrides: getFeatures in class AbstractFeatureBearer
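The suffix and URL variants of `getDocumentFormat` both start by turning their input into a MIME type and recording it on the document as a `MimeType` feature with a type/subtype value. The sketch below shows that step under stated assumptions: `SuffixLookup` and its suffix table are hypothetical, the document's features are modelled as a plain `Map`, and the URL variant here only looks at the path suffix (the real method may also consult the web server and magic numbers).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class SuffixLookup {
  private SuffixLookup() {}

  // Hypothetical suffix table in the spirit of suffixes2mimeTypeMap.
  private static final Map<String, String> SUFFIXES = new HashMap<>();
  static {
    SUFFIXES.put("xml", "text/xml");
    SUFFIXES.put("html", "text/html");
    SUFFIXES.put("htm", "text/html");
    SUFFIXES.put("txt", "text/plain");
  }

  // Accepts "txt" or ".txt", case-insensitively; returns null if unknown.
  static String mimeForSuffix(String fileSuffix) {
    String s = fileSuffix.startsWith(".") ? fileSuffix.substring(1) : fileSuffix;
    return SUFFIXES.get(s.toLowerCase(Locale.ROOT));
  }

  // Takes the suffix from the URL path only (query and fragment are ignored).
  static String mimeForUrl(String url) {
    String path = URI.create(url).getPath();
    if (path == null) return null; // opaque URIs such as mailto: have no path
    int dot = path.lastIndexOf('.');
    int slash = path.lastIndexOf('/');
    return dot > slash ? mimeForSuffix(path.substring(dot + 1)) : null;
  }

  // Records the detected type under the "MimeType" feature as type/subtype.
  static void recordMimeType(Map<String, Object> docFeatures, String mimeType) {
    if (mimeType != null) docFeatures.put("MimeType", mimeType);
  }
}
```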
public Map getMarkupElementsMap()
public Map getElement2StringMap()
public void setMarkupElementsMap(Map markupElementsMap)
public void setElement2StringMap(Map anElement2StringMap)
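The markup elements map is described as mapping markup elements to annotation types. A minimal sketch of how such a map could be consulted while unpacking follows; `ElementMapping` is hypothetical, and the rule that unmapped elements (or a null map) keep their own element name is an assumption, not documented GATE behaviour.

```java
import java.util.Map;

final class ElementMapping {
  private ElementMapping() {}

  // Chooses the annotation type for a markup element. Assumption: with no
  // map, or for an element missing from the map, the element name is used.
  static String annotationTypeFor(String element, Map<String, String> markupElementsMap) {
    if (markupElementsMap == null) return element;
    return markupElementsMap.getOrDefault(element, element);
  }
}
```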
public void setFeatures(FeatureMap features)
Specified by: setFeatures in interface FeatureBearer
Overrides: setFeatures in class AbstractFeatureBearer
public void setMimeType(MimeType aMimeType)
public MimeType getMimeType()
public void removeStatusListener(StatusListener l)
public void addStatusListener(StatusListener l)
protected void fireStatusChanged(String e)
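The `addStatusListener`, `removeStatusListener` and `fireStatusChanged` trio follows the usual observer pattern. The sketch below defines its own `SketchStatusListener` interface (a hypothetical stand-in for GATE's listener type) and a hypothetical `StatusSource` class; the copy-on-write list is a design choice of this sketch, not necessarily what GATE uses.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical local listener interface standing in for GATE's StatusListener.
interface SketchStatusListener {
  void statusChanged(String text);
}

class StatusSource {
  // Copy-on-write list, so listeners may be added or removed while an event
  // is being delivered without a ConcurrentModificationException.
  private final List<SketchStatusListener> listeners = new CopyOnWriteArrayList<>();

  public void addStatusListener(SketchStatusListener l) { listeners.add(l); }

  public void removeStatusListener(SketchStatusListener l) { listeners.remove(l); }

  // Notifies every currently registered listener of the new status message.
  protected void fireStatusChanged(String e) {
    for (SketchStatusListener l : listeners) {
      l.statusChanged(e);
    }
  }
}
```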