GATE Version 3.1-2270
java.lang.Object
  gate.util.AbstractFeatureBearer
    gate.creole.AbstractResource
      gate.creole.AbstractLanguageResource
        gate.corpora.DocumentImpl
public class DocumentImpl
Represents the commonalities between all sorts of documents.
The DocumentImpl class implements the Document interface. The DocumentContentImpl class models the textual or audio-visual materials which are the source and content of Documents. The AnnotationSetImpl class supplies annotations on Documents.
Abbreviations: D = Document; DC = DocumentContent; AS = AnnotationSet.
We add an edit method to each of these classes; for DC and AS the methods are package private; D has the public method.
void edit(Long start, Long end, DocumentContent replacement) throws InvalidOffsetException;
D receives edit requests and forwards them to DC and AS. On DC, this method makes a change to the content - e.g. replacing a String range from start to end with replacement. (Deletions are catered for by having replacement = null.) D then calls AS.edit on each of its annotation sets.
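As a rough illustration of the content-side edit described above, the sketch below models a text-only content holder. TextContentSketch is a hypothetical stand-in and not the GATE DocumentContentImpl; it takes a String replacement instead of a DocumentContent, and it throws IllegalArgumentException where the real method is declared to throw InvalidOffsetException.

```java
/** Hypothetical stand-in for a text-only document content; not the GATE class. */
final class TextContentSketch {
    private String text;

    TextContentSketch(String text) {
        this.text = text;
    }

    /**
     * Replaces the range [start, end) with the replacement text.
     * A null replacement deletes the range.
     */
    void edit(long start, long end, String replacement) {
        // Assumption: the range must lie inside the current content.
        if (start < 0 || end < start || end > text.length()) {
            throw new IllegalArgumentException("invalid range " + start + "-" + end);
        }
        String repl = (replacement == null) ? "" : replacement;
        text = text.substring(0, (int) start) + repl + text.substring((int) end);
    }

    /** Length of the content, used by the annotation side to size the replacement. */
    long size() {
        return text.length();
    }

    @Override
    public String toString() {
        return text;
    }
}
```

For example, starting from "abcd", edit(1, 3, "BBCC") leaves "aBBCCd", and a following edit(1, 5, null) leaves "ad".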
On AS, edit calls replacement.size() (i.e. DC.size()) to figure out how long the replacement is (0 for null). It then considers annotations that terminate (start or end) in the altered or deleted range as invalid; annotations that terminate after the range have their offsets adjusted.
A note re. AS and annotations: annotations no longer have offsets as in the old model, they now have nodes, and nodes have offsets.
To implement AS.edit, we have several indices:
HashMap annotsByStartNode, annotsByEndNode; which map node ids to annotations;
RBTreeMap nodesByOffset; which maps offsets to Nodes.
When we get an edit request, we traverse that part of the nodesByOffset tree representing the altered or deleted range of the DC. For each node found, we delete any annotations that terminate on the node, and then delete the node itself. We then traverse the rest of the tree, changing the offset on all remaining nodes by:
   newOffset =
     oldOffset -
     (
       (end - start) -                                     // size of mod
       ( (replacement == null) ? 0 : replacement.size() )  // size of repl
     );
 
 Note that we use the same convention as e.g. java.lang.String: start
 offsets are inclusive; end offsets are exclusive. I.e. for string "abcd"
 range 1-3 = "bc". Examples, for a node with offset 4:
   edit(1, 3, "BC");   newOffset = 4 - ( (3 - 1) - 2 ) = 4
   edit(1, 3, null);   newOffset = 4 - ( (3 - 1) - 0 ) = 2
   edit(1, 3, "BBCC"); newOffset = 4 - ( (3 - 1) - 4 ) = 6
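The traversal and the offset formula above can be sketched with java.util.TreeMap standing in for RBTreeMap. This is only an illustration under stated assumptions: NodeShiftSketch and its method names are hypothetical, nodes are reduced to integer ids, a node counts as inside the range when start <= offset < end (the real implementation may treat the boundaries differently), and deleting the annotations attached to removed nodes is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the node bookkeeping; not the GATE AnnotationSetImpl. */
final class NodeShiftSketch {

    /** Offset adjustment from the formula above; a null size means a deletion. */
    static long newOffset(long oldOffset, long start, long end, Long replacementSize) {
        long replSize = (replacementSize == null) ? 0L : replacementSize;
        return oldOffset - ((end - start) - replSize);
    }

    /**
     * Applies an edit to a map from offset to node id.
     * Returns the ids of the removed nodes so that a caller could
     * also delete the annotations that terminate on them.
     */
    static List<Integer> edit(TreeMap<Long, Integer> nodesByOffset,
                              long start, long end, Long replacementSize) {
        // Nodes inside [start, end) are invalid: collect and remove them.
        Map<Long, Integer> inside = nodesByOffset.subMap(start, true, end, false);
        List<Integer> removed = new ArrayList<>(inside.values());
        inside.clear(); // the view is backed by the tree, so this removes the entries

        // Nodes at or after end keep their ids but get shifted offsets.
        Map<Long, Integer> tail = new TreeMap<>(nodesByOffset.tailMap(end, true));
        nodesByOffset.tailMap(end, true).clear();
        for (Map.Entry<Long, Integer> e : tail.entrySet()) {
            nodesByOffset.put(newOffset(e.getKey(), start, end, replacementSize), e.getValue());
        }
        // Nodes before start are left untouched.
        return removed;
    }
}
```

With nodes at offsets 0, 2 and 4, an edit of range 1-3 with a replacement of size 4 removes the node at 2, keeps the node at 0, and moves the node at 4 to offset 6, matching the third example above.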
| Field Summary | |
|---|---|
| protected DocumentContent | content: The content of the document |
| protected AnnotationSet | defaultAnnots: The default annotation set |
| protected String | encoding: The encoding of the source of the document content |
| protected Boolean | markupAware: Is the document markup-aware? |
| protected Map | namedAnnotSets: Named sets of annotations |
| protected int | nextAnnotationId: The id of the next new annotation |
| protected int | nextNodeId: The id of the next new node |
| protected URL | sourceUrl: The source URL |
| protected Long | sourceUrlEndOffset: The end of the range that the content comes from at the source URL (or null if none). |
| protected Long | sourceUrlStartOffset: The start of the range that the content comes from at the source URL (or null if none). |
| Fields inherited from class gate.creole.AbstractLanguageResource | 
|---|
| dataStore, lrPersistentId | 
| Fields inherited from class gate.creole.AbstractResource | 
|---|
| name | 
| Fields inherited from class gate.util.AbstractFeatureBearer | 
|---|
| features | 
| Fields inherited from interface gate.SimpleDocument | 
|---|
| DOCUMENT_URL_PARAMETER_NAME | 
| Constructor Summary | |
|---|---|
| DocumentImpl(): Default construction. | |
| Method Summary | |
|---|---|
| void | addDocumentListener(DocumentListener l): Adds a DocumentListener to this document. |
| void | cleanup(): Clear all the data members of the object. |
| int | compareTo(Object o): Ordering based on URL.toString() and the URL offsets (if any). |
| void | datastoreClosed(CreoleEvent e): Called when a DataStore has been closed. |
| void | datastoreCreated(CreoleEvent e): Called when a DataStore has been created. |
| void | datastoreOpened(CreoleEvent e): Called when a DataStore has been opened. |
| void | edit(Long start, Long end, DocumentContent replacement): Propagate edit changes to the document content and annotations. |
| protected void | fireAnnotationSetAdded(DocumentEvent e) |
| protected void | fireAnnotationSetRemoved(DocumentEvent e) |
| protected void | fireContentEdited(DocumentEvent e) |
| AnnotationSet | getAnnotations(): Get the default set of annotations. |
| AnnotationSet | getAnnotations(String name): Get a named set of annotations. |
| Set | getAnnotationSetNames(): Returns a set of all named annotation sets in existence. |
| Boolean | getCollectRepositioningInfo(): Get whether repositioning information is collected and preserved for the Document. |
| DocumentContent | getContent(): The content of the document: a String for text; MPEG for video; etc. |
| String | getEncoding(): Get the encoding of the document content source. |
| FeatureMap | getFeatures(): Cover unpredictable Features creation. |
| Boolean | getMarkupAware(): Get the markup awareness status of the Document. |
| Map | getNamedAnnotationSets(): Returns a map with the named annotation sets. |
| Integer | getNextAnnotationId(): Generate and return the next annotation ID. |
| Integer | getNextNodeId(): Generate and return the next node ID. |
| protected String | getOrderingString(): Utility method to produce a string for comparison in ordering. |
| Boolean | getPreserveOriginalContent(): Get the original-content preservation status of the Document. |
| URL | getSourceUrl(): Documents are identified by URLs. |
| Long | getSourceUrlEndOffset(): Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. |
| Long[] | getSourceUrlOffsets(): Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. |
| Long | getSourceUrlStartOffset(): Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. |
| String | getStringContent(): The stringContent of a document is a property of the document that will be set when the user wants to create the document from a string, as opposed to from a URL. |
| Resource | init(): Initialise this resource, and return it. |
| boolean | isValidOffset(Long offset): Check that an offset is valid, i.e. it is non-null, greater than or equal to 0 and less than the size of the document content. |
| boolean | isValidOffsetRange(Long start, Long end): Check that both start and end are valid offsets and that they constitute a valid offset range, i.e. start is less than or equal to end. |
| static boolean | isXmlChar(char ch): Decides whether a char is a valid XML character. |
| void | removeAnnotationSet(String name): Removes one of the named annotation sets. |
| void | removeDocumentListener(DocumentListener l): Removes one of the previously registered document listeners. |
| void | resourceAdopted(DatastoreEvent evt): Called by a datastore when a new resource has been adopted. |
| void | resourceDeleted(DatastoreEvent evt): Called by a datastore when a resource has been deleted. |
| void | resourceLoaded(CreoleEvent e): Called when a new Resource has been loaded into the system. |
| void | resourceRenamed(Resource resource, String oldName, String newName): Called when the creole register has renamed a resource. |
| void | resourceUnloaded(CreoleEvent e): Called when a Resource has been removed from the system. |
| void | resourceWritten(DatastoreEvent evt): Called by a datastore when a resource has been written into the datastore. |
| void | setCollectRepositioningInfo(Boolean b): Allow/disallow collecting of repositioning information. |
| void | setContent(DocumentContent content): Set method for the document content. |
| void | setDataStore(DataStore dataStore): Set the data store that this LR lives in. |
| void | setDefaultAnnotations(AnnotationSet defaultAnnotations): This method was added by Shafirin Andrey to allow access to the protected member defaultAnnots. Required for JAPE-Debugger. |
| void | setEncoding(String encoding): Set the encoding of the document content source. |
| void | setLRPersistenceId(Object lrID): Sets the persistence id of this LR. |
| void | setMarkupAware(Boolean newMarkupAware): Make the document markup-aware. |
| void | setNextAnnotationId(int aNextAnnotationId): Sets the nextAnnotationId. |
| void | setPreserveOriginalContent(Boolean b): Allow/disallow preserving of the original document content. |
| void | setSourceUrl(URL sourceUrl): Set method for the document's URL. |
| void | setSourceUrlEndOffset(Long sourceUrlEndOffset): Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. |
| void | setSourceUrlStartOffset(Long sourceUrlStartOffset): Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. |
| void | setStringContent(String stringContent): The stringContent of a document is a property of the document that will be set when the user wants to create the document from a string, as opposed to from a URL. |
| String | toString(): String representation. |
| String | toXml(): Returns a GateXml document, a custom XML format for which there is a reader inside GATE called gate.xml.GateFormatXmlHandler. |
| String | toXml(Set aSourceAnnotationSet): Returns an XML document aiming to preserve the original markups (the original markup will be in the same place and format as it was before processing the document) and include (if possible) the annotations specified in the aSourceAnnotationSet. |
| String | toXml(Set aSourceAnnotationSet, boolean includeFeatures): Returns an XML document aiming to preserve the original markups (the original markup will be in the same place and format as it was before processing the document) and include (if possible) the annotations specified in the aSourceAnnotationSet. |
| Methods inherited from class gate.creole.AbstractLanguageResource | 
|---|
| getDataStore, getLRPersistenceId, getParent, isModified, setParent, sync | 
| Methods inherited from class gate.creole.AbstractResource | 
|---|
| checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners | 
| Methods inherited from class gate.util.AbstractFeatureBearer | 
|---|
| setFeatures | 
| Methods inherited from class java.lang.Object | 
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait | 
| Methods inherited from interface gate.LanguageResource | 
|---|
| getDataStore, getLRPersistenceId, getParent, isModified, setParent, sync | 
| Methods inherited from interface gate.Resource | 
|---|
| getParameterValue, setParameterValue, setParameterValues | 
| Methods inherited from interface gate.util.FeatureBearer | 
|---|
| setFeatures | 
| Methods inherited from interface gate.util.NameBearer | 
|---|
| getName, setName | 
| Field Detail | 
|---|
protected int nextAnnotationId
protected int nextNodeId
protected URL sourceUrl
protected DocumentContent content
protected String encoding
protected Long sourceUrlStartOffset
protected Long sourceUrlEndOffset
protected AnnotationSet defaultAnnots
protected Map namedAnnotSets
protected Boolean markupAware
| Constructor Detail | 
|---|
public DocumentImpl()
| Method Detail | 
|---|
public FeatureMap getFeatures()
Specified by: getFeatures in interface FeatureBearer
Overrides: getFeatures in class AbstractFeatureBearer
public Resource init()
              throws ResourceInstantiationException
Specified by: init in interface Resource
Overrides: init in class AbstractResource
Throws: ResourceInstantiationException
public void cleanup()
Specified by: cleanup in interface Resource
Overrides: cleanup in class AbstractLanguageResource
public URL getSourceUrl()
Specified by: getSourceUrl in interface SimpleDocument
public void setSourceUrl(URL sourceUrl)
Specified by: setSourceUrl in interface SimpleDocument
public Long[] getSourceUrlOffsets()
Specified by: getSourceUrlOffsets in interface Document
public void setPreserveOriginalContent(Boolean b)
Specified by: setPreserveOriginalContent in interface Document
public Boolean getPreserveOriginalContent()
Specified by: getPreserveOriginalContent in interface Document
public void setCollectRepositioningInfo(Boolean b)
Specified by: setCollectRepositioningInfo in interface Document
public Boolean getCollectRepositioningInfo()
Specified by: getCollectRepositioningInfo in interface Document
public Long getSourceUrlStartOffset()
Specified by: getSourceUrlStartOffset in interface Document
public void setSourceUrlStartOffset(Long sourceUrlStartOffset)
Specified by: setSourceUrlStartOffset in interface Document
public Long getSourceUrlEndOffset()
Specified by: getSourceUrlEndOffset in interface Document
public void setSourceUrlEndOffset(Long sourceUrlEndOffset)
Specified by: setSourceUrlEndOffset in interface Document
public DocumentContent getContent()
Specified by: getContent in interface SimpleDocument
public void setContent(DocumentContent content)
Specified by: setContent in interface SimpleDocument
public String getEncoding()
Specified by: getEncoding in interface TextualDocument
public void setEncoding(String encoding)
public AnnotationSet getAnnotations()
Specified by: getAnnotations in interface SimpleDocument
public AnnotationSet getAnnotations(String name)
Specified by: getAnnotations in interface SimpleDocument
public void setMarkupAware(Boolean newMarkupAware)
Specified by: setMarkupAware in interface Document
Parameters: newMarkupAware - markup awareness status.
public Boolean getMarkupAware()
Specified by: getMarkupAware in interface Document
public String toXml(Set aSourceAnnotationSet)
Specified by: toXml in interface Document
public String toXml(Set aSourceAnnotationSet,
                    boolean includeFeatures)
Specified by: toXml in interface Document
Parameters: aSourceAnnotationSet - an annotation set containing all the annotations that will be combined with the original markup set. If the param is null it will only dump the original markups.
includeFeatures - a boolean that controls whether the annotation features should be included or not. If false, only the annotation type is included in the tag.
public String toXml()
Specified by: toXml in interface Document
public static boolean isXmlChar(char ch)
Parameters: ch - the char to be tested
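One plausible way to perform such a check is to test the char against the Char production of the XML 1.0 specification. The sketch below is an assumption about the intended behaviour, not the GATE source; XmlCharSketch is a hypothetical name, and because a Java char is 16 bits wide the supplementary range #x10000-#x10FFFF cannot be tested here.

```java
/** Hypothetical sketch of an XML 1.0 character check; not the GATE implementation. */
final class XmlCharSketch {

    /**
     * Returns true if ch falls in the XML 1.0 Char production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD].
     */
    static boolean isXmlChar(char ch) {
        return ch == 0x9 || ch == 0xA || ch == 0xD
            || (ch >= 0x20 && ch <= 0xD7FF)
            || (ch >= 0xE000 && ch <= 0xFFFD);
    }
}
```

Under this definition, tab and ordinary letters are accepted, while NUL, lone surrogates and U+FFFF are rejected.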
public Map getNamedAnnotationSets()
Returns: a map with the named annotation sets, or null if no named annotation set exists.
Specified by: getNamedAnnotationSets in interface Document
public Set getAnnotationSetNames()
Specified by: getAnnotationSetNames in interface SimpleDocument
public void removeAnnotationSet(String name)
Specified by: removeAnnotationSet in interface SimpleDocument
Parameters: name - the name of the annotation set to be removed
public void edit(Long start,
                 Long end,
                 DocumentContent replacement)
          throws InvalidOffsetException
Specified by: edit in interface Document
Throws: InvalidOffsetException
public boolean isValidOffset(Long offset)
public boolean isValidOffsetRange(Long start,
                                  Long end)
public void setNextAnnotationId(int aNextAnnotationId)
public Integer getNextAnnotationId()
public Integer getNextNodeId()
public int compareTo(Object o)
              throws ClassCastException
Specified by: compareTo in interface Comparable
Throws: ClassCastException
protected String getOrderingString()
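The method summary describes the ordering as based on URL.toString() and the URL offsets (if any). A minimal sketch of that idea follows, assuming the ordering string is the URL text with both offsets appended when present. OrderingSketch and its fields are hypothetical, and the URL is held as a plain String to keep the example self-contained.

```java
/** Hypothetical sketch of URL-plus-offsets ordering; not the GATE implementation. */
final class OrderingSketch implements Comparable<OrderingSketch> {
    private final String sourceUrl; // stands in for URL.toString()
    private final Long startOffset; // may be null
    private final Long endOffset;   // may be null

    OrderingSketch(String sourceUrl, Long startOffset, Long endOffset) {
        this.sourceUrl = sourceUrl;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    /** Builds the string used for comparison: URL, then offsets if both exist. */
    String orderingString() {
        StringBuilder sb = new StringBuilder(sourceUrl);
        if (startOffset != null && endOffset != null) {
            sb.append(startOffset).append(endOffset);
        }
        return sb.toString();
    }

    @Override
    public int compareTo(OrderingSketch other) {
        return orderingString().compareTo(other.orderingString());
    }
}
```

One consequence of comparing concatenated strings is that the order is lexicographic, so offsets are not compared numerically.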
public String getStringContent()
public void setStringContent(String stringContent)
public String toString()
Overrides: toString in class Object
public void removeDocumentListener(DocumentListener l)
Description copied from interface: Document
Removes one of the previously registered document listeners.
Specified by: removeDocumentListener in interface Document
public void addDocumentListener(DocumentListener l)
Description copied from interface: Document
Adds a DocumentListener to this document. All the registered listeners will be notified of changes that occur to the document.
Specified by: addDocumentListener in interface Document
protected void fireAnnotationSetAdded(DocumentEvent e)
protected void fireAnnotationSetRemoved(DocumentEvent e)
protected void fireContentEdited(DocumentEvent e)
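The add/remove/fire methods above follow the usual listener pattern. The sketch below shows that pattern in isolation; ListenerSketch and ContentListener are hypothetical names with a single String-based callback, whereas the GATE DocumentListener and DocumentEvent types carry more information.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch of listener registration and notification. */
final class ListenerSketch {

    /** Simplified listener with one callback; an assumption for illustration. */
    interface ContentListener {
        void contentEdited(String description);
    }

    // CopyOnWriteArrayList lets listeners be added or removed during notification.
    private final List<ContentListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(ContentListener l) {
        listeners.add(l);
    }

    void removeListener(ContentListener l) {
        listeners.remove(l);
    }

    /** Notifies every currently registered listener. */
    void fireContentEdited(String description) {
        for (ContentListener l : listeners) {
            l.contentEdited(description);
        }
    }
}
```

A listener that has been removed receives no further notifications, which is the behaviour the removeDocumentListener description implies.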
public void resourceLoaded(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a new Resource has been loaded into the system
Specified by: resourceLoaded in interface CreoleListener
public void resourceUnloaded(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a Resource has been removed from the system
Specified by: resourceUnloaded in interface CreoleListener
public void datastoreOpened(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a DataStore has been opened
Specified by: datastoreOpened in interface CreoleListener
public void datastoreCreated(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a DataStore has been created
Specified by: datastoreCreated in interface CreoleListener
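public void resourceRenamed(Resource resource,
                            String oldName,
                            String newName)
Description copied from interface: CreoleListener
Called when the creole register has renamed a resource.
Specified by: resourceRenamed in interface CreoleListener
public void datastoreClosed(CreoleEvent e)
Description copied from interface: CreoleListener
Called when a DataStore has been closed
Specified by: datastoreClosed in interface CreoleListener
public void setLRPersistenceId(Object lrID)
Description copied from class: AbstractLanguageResource
Sets the persistence id of this LR.
Specified by: setLRPersistenceId in interface LanguageResource
Overrides: setLRPersistenceId in class AbstractLanguageResource
public void resourceAdopted(DatastoreEvent evt)
Description copied from interface: DatastoreListener
Called by a datastore when a new resource has been adopted
Specified by: resourceAdopted in interface DatastoreListener
public void resourceDeleted(DatastoreEvent evt)
Description copied from interface: DatastoreListener
Called by a datastore when a resource has been deleted
Specified by: resourceDeleted in interface DatastoreListener
public void resourceWritten(DatastoreEvent evt)
Description copied from interface: DatastoreListener
Called by a datastore when a resource has been written into the datastore
Specified by: resourceWritten in interface DatastoreListener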
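public void setDataStore(DataStore dataStore)
                  throws PersistenceException
Description copied from class: AbstractLanguageResource
Set the data store that this LR lives in.
Specified by: setDataStore in interface LanguageResource
Overrides: setDataStore in class AbstractLanguageResource
Throws: PersistenceException
public void setDefaultAnnotations(AnnotationSet defaultAnnotations)
This method was added by Shafirin Andrey to allow access to the protected member defaultAnnots. Required for JAPE-Debugger.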