GATE Version 3.1-2270
java.lang.Object
  gate.util.AbstractFeatureBearer
    gate.creole.AbstractResource
      gate.creole.AbstractProcessingResource
        gate.creole.AbstractLanguageAnalyser
          gate.creole.orthomatcher.OrthoMatcher
public class OrthoMatcher
Nested Class Summary |
---|
Nested classes/interfaces inherited from class gate.creole.AbstractProcessingResource |
---|
AbstractProcessingResource.InternalStatusListener, AbstractProcessingResource.IntervalProgressListener |
Fields inherited from class gate.creole.AbstractLanguageAnalyser |
---|
corpus, document |
Fields inherited from class gate.creole.AbstractProcessingResource |
---|
interrupted |
Fields inherited from class gate.creole.AbstractResource |
---|
name |
Fields inherited from class gate.util.AbstractFeatureBearer |
---|
features |
Constructor Summary | |
---|---|
OrthoMatcher()
|
Method Summary | |
---|---|
protected String |
containTitle(String annotString,
Annotation annot)
return the person name without its title |
protected void |
createAnnotList(String nameFile,
String nameList)
creates the lookup tables |
protected void |
docCleanup()
|
void |
execute()
Run the resource. |
String |
getAnnotationSetName()
get the name of the annotation set |
List |
getAnnotationTypes()
get the types of the annotations |
Boolean |
getCaseSensitive()
Are we running in a case-sensitive mode? |
URL |
getDefinitionFileURL()
|
String |
getEncoding()
|
Boolean |
getExtLists()
|
String |
getOrganizationType()
|
String |
getPersonType()
|
Boolean |
getProcessUnknown()
Return whether or not we're processing the Unknown annotations |
Resource |
init()
Initialise this resource, and return it. |
protected boolean |
isUnknownGender(String gender)
|
protected boolean |
matchAnnotations(Annotation newAnnot,
String annotString,
Annotation prevAnnot)
|
protected boolean |
matchedAlready(Annotation annot1,
Annotation annot2)
|
protected void |
matchNameAnnotations()
|
protected boolean |
matchOtherAnnots(List toMatchList,
Annotation newAnnot,
String annotString)
This method checks whether the new annotation matches all annotations given in the toMatchList (which contains annotation ids). The new annotation needs to match all of them, because assuming transitivity does not always work when two different entities share a common token: e.g., BT Cellnet, BT and British Telecom. |
boolean |
matchRule0(String s1,
String s2)
RULE #0: if the two names are listed in the table of spurious matches, they do NOT match. Condition(s): none. Applied to: all name annotations. |
boolean |
matchRule1(String s1,
String s2,
boolean matchCase)
RULE #1: if the two names are identical, they are the same. No longer used, because identical strings are now detected via the hash table of previous annotations. Condition(s): depends on case. Applied to: all name annotations. |
boolean |
matchRule10(String s1,
String s2)
RULE #10: is one name the reverse of the other, reversing around prepositions only? |
boolean |
matchRule11(String s1,
String s2)
RULE #11: does one name consist of contractions of the first two tokens of the other name? |
boolean |
matchRule12(String s1,
String s2)
RULE #12: do the first and last tokens of one name match the first and last tokens of the other? |
boolean |
matchRule13(String s1,
String s2)
RULE #13: do multi-word names match except for one token e.g. |
boolean |
matchRule14(String s1,
String s2)
RULE #14: if the last token of one name matches the second name e.g. |
boolean |
matchRule15(String s1,
String s2)
RULE #15: does one token from a Person name appear as the other token? Note that this rule was NOT used in the LaSIE 1.5 namematcher; it was added for ACE at Di's request. |
boolean |
matchRule2(String s1,
String s2)
RULE #2: if the two names are listed as equivalent in the lookup table (alias), they match. Condition(s): none. Applied to: all name annotations. |
boolean |
matchRule3(String s1,
String s2)
RULE #3: adding a possessive at the end of one name causes a match e.g. |
boolean |
matchRule4(String s1,
String s2)
RULE #4: do all tokens other than the punctuation marks "," and "." match? |
boolean |
matchRule5(String s1,
String s2)
RULE #5: if the 1st token of one name matches the second name e.g. |
boolean |
matchRule6(String s1,
String s2)
RULE #6: if one name is the acronym of the other e.g. |
boolean |
matchRule7(String s1,
String s2)
RULE #7: if one of the tokens in one of the names is in the list of separators, e.g. "&", then check whether the token before the separator matches the other name, e.g. |
boolean |
matchRule8(String s1,
String s2)
RULE #8: now obsolete, because a leading "The" and the trailing CDG (company designator) are stripped before matching. |
boolean |
matchRule9(String s1,
String s2)
RULE #9: does one of the names match the token just before a trailing company designator in the other name? |
protected void |
matchUnknown()
|
protected void |
matchWithPrevious(Annotation nameAnnot,
String annotString)
|
void |
setAnnotationSetName(String newAnnotationSetName)
set the annotation set name |
void |
setAnnotationTypes(List newType)
set the types of the annotations |
void |
setCaseSensitive(Boolean newCase)
set the caseSensitive flag |
void |
setDefinitionFileURL(URL definitionFileURL)
|
void |
setEncoding(String encoding)
|
void |
setExtLists(Boolean newExtLists)
set the extLists flag |
void |
setOrganizationType(String newOrganizationType)
|
void |
setPersonType(String newPersonType)
|
void |
setProcessUnknown(Boolean processOrNot)
set whether to process the Unknown annotations |
protected String |
stripCDG(String annotString,
Annotation annot)
return the organization name without its company designator and leading "The" |
protected void |
updateMatches(Annotation newAnnot,
Annotation prevAnnot)
|
protected Annotation |
updateMatches(Annotation newAnnot,
String annotString)
|
Methods inherited from class gate.creole.AbstractLanguageAnalyser |
---|
getCorpus, getDocument, setCorpus, setDocument |
Methods inherited from class gate.creole.AbstractProcessingResource |
---|
addProgressListener, addStatusListener, cleanup, fireProcessFinished, fireProgressChanged, fireStatusChanged, interrupt, isInterrupted, reInit, removeProgressListener, removeStatusListener |
Methods inherited from class gate.creole.AbstractResource |
---|
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
Methods inherited from class gate.util.AbstractFeatureBearer |
---|
getFeatures, setFeatures |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface gate.ProcessingResource |
---|
reInit |
Methods inherited from interface gate.Resource |
---|
cleanup, getParameterValue, setParameterValue, setParameterValues |
Methods inherited from interface gate.util.FeatureBearer |
---|
getFeatures, setFeatures |
Methods inherited from interface gate.util.NameBearer |
---|
getName, setName |
Methods inherited from interface gate.Executable |
---|
interrupt, isInterrupted |
Field Detail |
---|
public static final String OM_DOCUMENT_PARAMETER_NAME
public static final String OM_ANN_SET_PARAMETER_NAME
public static final String OM_CASE_SENSITIVE_PARAMETER_NAME
public static final String OM_ANN_TYPES_PARAMETER_NAME
public static final String OM_ORG_TYPE_PARAMETER_NAME
public static final String OM_PERSON_TYPE_PARAMETER_NAME
public static final String OM_EXT_LISTS_PARAMETER_NAME
protected static final String CDGLISTNAME
protected static final String ALIASLISTNAME
protected static final String ARTLISTNAME
protected static final String PREPLISTNAME
protected static final String CONNECTORLISTNAME
protected static final String SPURLISTNAME
protected static final String PUNCTUATION_VALUE
protected static final String THE_VALUE
protected String annotationSetName
protected List annotationTypes
protected String organizationType
protected String personType
protected String unknownType
protected boolean extLists
protected boolean matchingUnknowns
protected boolean caseSensitive
protected FeatureMap queryFM
protected HashMap alias
protected HashSet cdg
protected HashMap spur_match
protected HashMap def_art
protected HashMap connector
protected HashMap prepos
protected AnnotationSet nameAllAnnots
protected HashMap processedAnnots
protected HashMap annots2Remove
protected List matchesDocFeature
protected HashMap tokensMap
protected Annotation shortAnnot
protected Annotation longAnnot
protected ArrayList tokensLongAnnot
protected ArrayList tokensShortAnnot
protected FeatureMap tempMap
Constructor Detail |
---|
public OrthoMatcher()
Method Detail |
---|
public Resource init() throws ResourceInstantiationException
Specified by: init in interface Resource
Overrides: init in class AbstractProcessingResource
Throws: ResourceInstantiationException
public void execute() throws ExecutionException
Specified by: execute in interface Executable
Overrides: execute in class AbstractProcessingResource
Throws: ExecutionException
protected void matchNameAnnotations() throws ExecutionException
Throws: ExecutionException
protected void matchUnknown() throws ExecutionException
Throws: ExecutionException
protected void matchWithPrevious(Annotation nameAnnot, String annotString)
protected boolean matchAnnotations(Annotation newAnnot, String annotString, Annotation prevAnnot)
protected boolean matchOtherAnnots(List toMatchList, Annotation newAnnot, String annotString)
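The all-members check described for matchOtherAnnots (the new annotation must match every id in toMatchList, since transitivity cannot be assumed) can be sketched in plain Java. This is a hypothetical illustration, not OrthoMatcher's code: annotations are reduced to strings, and the class ChainCheck and its BiPredicate parameter stand in for the real rule cascade.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch: a candidate joins an existing match chain only if it
// matches EVERY member, because name matching is not reliably transitive.
class ChainCheck {
    static boolean matchesAll(List<String> chain, String candidate,
                              BiPredicate<String, String> matches) {
        for (String member : chain) {
            if (!matches.test(member, candidate)) {
                return false; // a single failed pairing rejects the candidate
            }
        }
        return true;
    }
}
```

With a toy matcher that compares first tokens, "BT Cellnet" matches "BT" but not "British Telecom", so it is kept out of a chain holding both names even though "BT" would link it to each.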
protected boolean matchedAlready(Annotation annot1, Annotation annot2)
protected Annotation updateMatches(Annotation newAnnot, String annotString)
protected void updateMatches(Annotation newAnnot, Annotation prevAnnot)
protected void docCleanup()
protected String containTitle(String annotString, Annotation annot) throws ExecutionException
Throws: ExecutionException
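containTitle returns a person name without its title. A minimal sketch of that idea, assuming whitespace tokenization and a small illustrative title list (TitleStripper and TITLES are hypothetical; OrthoMatcher's own method also receives the Annotation and may rely on its features instead):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: strip leading title tokens from a person name.
class TitleStripper {
    // Illustrative title list only; not the list OrthoMatcher actually uses.
    static final Set<String> TITLES =
        Set.of("Mr", "Mr.", "Mrs", "Mrs.", "Ms", "Ms.", "Dr", "Dr.");

    static String stripTitle(String name) {
        List<String> tokens = new ArrayList<>(Arrays.asList(name.trim().split("\\s+")));
        // keep at least one token, so a bare title is returned unchanged
        while (tokens.size() > 1 && TITLES.contains(tokens.get(0))) {
            tokens.remove(0);
        }
        return String.join(" ", tokens);
    }
}
```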
protected String stripCDG(String annotString, Annotation annot)
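stripCDG returns an organization name without its company designator and leading "The". The sketch below shows that normalization on plain strings; the class name and the designator set are assumptions for illustration (OrthoMatcher loads its own CDG list, see CDGLISTNAME).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: drop a leading "The" and a trailing company designator.
class StripCdgSketch {
    // Illustrative designators only; the real list is read from a file.
    static final Set<String> CDG = Set.of("Ltd", "Ltd.", "Inc", "Inc.", "plc", "Corp", "Corp.");

    static String strip(String name) {
        List<String> tokens = new ArrayList<>(Arrays.asList(name.trim().split("\\s+")));
        if (tokens.size() > 1 && tokens.get(0).equalsIgnoreCase("The")) {
            tokens.remove(0); // leading article
        }
        if (tokens.size() > 1 && CDG.contains(tokens.get(tokens.size() - 1))) {
            tokens.remove(tokens.size() - 1); // trailing designator
        }
        return String.join(" ", tokens);
    }
}
```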
protected void createAnnotList(String nameFile, String nameList) throws IOException
Throws: IOException
public void setExtLists(Boolean newExtLists)
public void setCaseSensitive(Boolean newCase)
public void setAnnotationSetName(String newAnnotationSetName)
public void setAnnotationTypes(List newType)
public void setProcessUnknown(Boolean processOrNot)
public void setOrganizationType(String newOrganizationType)
public void setPersonType(String newPersonType)
public String getAnnotationSetName()
public List getAnnotationTypes()
public String getOrganizationType()
public String getPersonType()
public Boolean getExtLists()
public Boolean getCaseSensitive()
public Boolean getProcessUnknown()
protected boolean isUnknownGender(String gender)
public boolean matchRule0(String s1, String s2)
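RULE #0 vetoes a match when the pair appears in the table of spurious matches. One way to sketch such a table, assuming pairs are stored symmetrically so argument order does not matter (SpuriousTable and the example pair are hypothetical; the real table comes from the file named by SPURLISTNAME):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a symmetric "spurious match" lookup table.
class SpuriousTable {
    private final Map<String, Set<String>> spur = new HashMap<>();

    // Record the pair in both directions.
    void add(String a, String b) {
        spur.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        spur.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // true when the pair is listed, i.e. the two names must NOT be matched
    boolean isSpurious(String s1, String s2) {
        return spur.getOrDefault(s1, Set.of()).contains(s2);
    }
}
```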
public boolean matchRule1(String s1, String s2, boolean matchCase)
public boolean matchRule2(String s1, String s2)
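RULE #2 treats names listed as equivalent in the alias lookup table as a match. A minimal sketch, assuming each alias maps to one canonical name (AliasTable is hypothetical; the BT / British Telecom pair comes from the matchOtherAnnots description):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: two names match when they share a canonical form.
class AliasTable {
    private final Map<String, String> canonical = new HashMap<>();

    void addAlias(String alias, String name) {
        canonical.put(alias, name);
        canonical.putIfAbsent(name, name); // the full name is its own canonical form
    }

    boolean match(String s1, String s2) {
        String c1 = canonical.get(s1);
        return c1 != null && c1.equals(canonical.get(s2));
    }
}
```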
public boolean matchRule3(String s1, String s2)
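RULE #3 matches a name with the same name plus a possessive ending. This sketch assumes the possessive is attached to the string ("IBM's", or "Reuters'" after a final s); how OrthoMatcher's tokens actually represent the possessive is not stated here, and PossessiveRule is a hypothetical name.

```java
// Hypothetical sketch: does one name equal the other plus a possessive?
class PossessiveRule {
    static boolean match(String s1, String s2) {
        return isPossessiveOf(s1, s2) || isPossessiveOf(s2, s1);
    }

    // longer == shorter + "'s", or shorter ends in s and longer == shorter + "'"
    private static boolean isPossessiveOf(String longer, String shorter) {
        return longer.equals(shorter + "'s")
            || (shorter.endsWith("s") && longer.equals(shorter + "'"));
    }
}
```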
public boolean matchRule4(String s1, String s2)
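RULE #4 compares names while ignoring the punctuation marks "," and ".". A sketch under the assumption that names are tokenized on whitespace; note it also turns periods inside tokens such as "U.S." into separators, which a token-based implementation would likely avoid (PunctuationRule is hypothetical).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compare token sequences after dropping "," and ".".
class PunctuationRule {
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        // treat "," and "." as separators, then split on whitespace
        for (String t : s.replace(",", " ").replace(".", " ").trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    static boolean match(String s1, String s2) {
        return tokens(s1).equals(tokens(s2));
    }
}
```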
public boolean matchRule5(String s1, String s2)
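RULE #5 matches when the first token of one name equals the whole other name. A sketch assuming whitespace tokenization and an exact, case-sensitive comparison (FirstTokenRule and the example names are hypothetical):

```java
// Hypothetical sketch: the 1st token of a multi-token name equals the other name.
class FirstTokenRule {
    static boolean match(String s1, String s2) {
        return firstTokenEquals(s1, s2) || firstTokenEquals(s2, s1);
    }

    private static boolean firstTokenEquals(String multi, String single) {
        String[] toks = multi.trim().split("\\s+");
        return toks.length > 1 && toks[0].equals(single.trim());
    }
}
```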
public boolean matchRule6(String s1, String s2)
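RULE #6 matches a name with its acronym. The sketch builds upper-case initials from every whitespace-separated token and ignores periods in the acronym; it does not skip function words such as "of", which a more robust version might (AcronymRule is hypothetical).

```java
// Hypothetical sketch: is one name the initials of the other?
class AcronymRule {
    static boolean match(String s1, String s2) {
        return isAcronymOf(s1, s2) || isAcronymOf(s2, s1);
    }

    static boolean isAcronymOf(String acro, String full) {
        String[] toks = full.trim().split("\\s+");
        if (toks.length < 2) {
            return false; // a single word has no meaningful acronym here
        }
        StringBuilder initials = new StringBuilder();
        for (String t : toks) {
            initials.append(Character.toUpperCase(t.charAt(0)));
        }
        return acro.replace(".", "").equals(initials.toString());
    }
}
```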
public boolean matchRule7(String s1, String s2)
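RULE #7 looks for a separator token such as "&" and compares the token just before it with the other name. A sketch assuming whitespace tokenization and a small hypothetical separator list (OrthoMatcher's connector list is loaded from the file named by CONNECTORLISTNAME):

```java
import java.util.Set;

// Hypothetical sketch: the token before a separator equals the other name.
class SeparatorRule {
    static final Set<String> SEPARATORS = Set.of("&", "and"); // illustrative only

    static boolean match(String s1, String s2) {
        return beforeSeparatorEquals(s1, s2) || beforeSeparatorEquals(s2, s1);
    }

    private static boolean beforeSeparatorEquals(String withSep, String other) {
        String[] toks = withSep.trim().split("\\s+");
        for (int i = 1; i < toks.length; i++) {
            if (SEPARATORS.contains(toks[i])) {
                return toks[i - 1].equals(other.trim()); // first separator decides
            }
        }
        return false;
    }
}
```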
public boolean matchRule8(String s1, String s2)
public boolean matchRule9(String s1, String s2)
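RULE #9 compares one name with the token just before a trailing company designator in the other. A sketch with a hypothetical designator set and whitespace tokenization (BeforeCdgRule is not an OrthoMatcher class):

```java
import java.util.Set;

// Hypothetical sketch: "Acme Ltd" matches "Acme" via the token before "Ltd".
class BeforeCdgRule {
    static final Set<String> CDG = Set.of("Ltd", "Inc", "Corp", "plc"); // illustrative only

    static boolean match(String s1, String s2) {
        return tokenBeforeCdgEquals(s1, s2) || tokenBeforeCdgEquals(s2, s1);
    }

    private static boolean tokenBeforeCdgEquals(String withCdg, String other) {
        String[] toks = withCdg.trim().split("\\s+");
        int n = toks.length;
        return n >= 2 && CDG.contains(toks[n - 1]) && toks[n - 2].equals(other.trim());
    }
}
```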
public boolean matchRule10(String s1, String s2)
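RULE #10 asks whether one name is the other reversed around a preposition, as in a pattern like "A of B" versus "B A". A sketch with a hypothetical preposition list (the real one comes from the file named by PREPLISTNAME) and whitespace tokenization:

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical sketch: "Department of Defense" vs "Defense Department".
class ReverseRule {
    static final Set<String> PREPOSITIONS = Set.of("of", "for", "in"); // illustrative only

    static boolean match(String s1, String s2) {
        return reversedEquals(s1, s2) || reversedEquals(s2, s1);
    }

    private static boolean reversedEquals(String withPrep, String other) {
        String[] toks = withPrep.trim().split("\\s+");
        for (int i = 1; i < toks.length - 1; i++) {
            if (PREPOSITIONS.contains(toks[i])) {
                String head = String.join(" ", Arrays.copyOfRange(toks, 0, i));
                String tail = String.join(" ", Arrays.copyOfRange(toks, i + 1, toks.length));
                // swap the parts around the preposition and drop the preposition
                return (tail + " " + head).equals(other.trim().replaceAll("\\s+", " "));
            }
        }
        return false;
    }
}
```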
public boolean matchRule11(String s1, String s2)
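RULE #11 asks whether one name consists of contractions of the first two tokens of the other. One reading, assumed here, is that the shorter name (spaces removed, case ignored) splits into a non-empty prefix of the first token followed by a non-empty prefix of the second. ContractionRule and the example pairs are hypothetical.

```java
// Hypothetical sketch: "ComSat" vs "Communications Satellite", "Pan Am" vs "Pan American".
class ContractionRule {
    static boolean match(String s1, String s2) {
        return isContractionOf(s1, s2) || isContractionOf(s2, s1);
    }

    static boolean isContractionOf(String shortName, String longName) {
        String[] toks = longName.trim().split("\\s+");
        if (toks.length < 2) {
            return false;
        }
        String a = toks[0].toLowerCase();
        String b = toks[1].toLowerCase();
        String c = shortName.replaceAll("\\s+", "").toLowerCase();
        // try every split of c into a prefix of a followed by a prefix of b
        for (int i = 1; i < c.length(); i++) {
            if (a.startsWith(c.substring(0, i)) && b.startsWith(c.substring(i))) {
                return true;
            }
        }
        return false;
    }
}
```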
public boolean matchRule12(String s1, String s2)
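RULE #12 matches multi-token names whose first and last tokens agree. A sketch assuming whitespace tokenization and exact comparison (FirstLastRule is hypothetical):

```java
// Hypothetical sketch: "John F. Kennedy" vs "John Kennedy".
class FirstLastRule {
    static boolean match(String s1, String s2) {
        String[] a = s1.trim().split("\\s+");
        String[] b = s2.trim().split("\\s+");
        if (a.length < 2 || b.length < 2) {
            return false; // both names need distinct first and last tokens
        }
        return a[0].equals(b[0]) && a[a.length - 1].equals(b[b.length - 1]);
    }
}
```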
public boolean matchRule13(String s1, String s2)
public boolean matchRule14(String s1, String s2)
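RULE #14 is the mirror image of RULE #5: the last token of one name must equal the whole other name. Same assumptions as before (whitespace tokens, exact match); LastTokenRule is hypothetical.

```java
// Hypothetical sketch: the last token of a multi-token name equals the other name.
class LastTokenRule {
    static boolean match(String s1, String s2) {
        return lastTokenEquals(s1, s2) || lastTokenEquals(s2, s1);
    }

    private static boolean lastTokenEquals(String multi, String single) {
        String[] toks = multi.trim().split("\\s+");
        return toks.length > 1 && toks[toks.length - 1].equals(single.trim());
    }
}
```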
public boolean matchRule15(String s1, String s2)
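RULE #15 asks whether a single-token name appears as any token of a multi-token Person name. A sketch assuming the caller has already established that the longer name is a Person, with whitespace tokenization (PersonTokenRule is hypothetical):

```java
// Hypothetical sketch: "Henry" matches any token of "William Henry Gates".
class PersonTokenRule {
    static boolean match(String s1, String s2) {
        return appearsIn(s1, s2) || appearsIn(s2, s1);
    }

    private static boolean appearsIn(String single, String person) {
        String s = single.trim();
        String[] toks = person.trim().split("\\s+");
        if (s.isEmpty() || s.split("\\s+").length != 1 || toks.length < 2) {
            return false; // need exactly one token versus a multi-token name
        }
        for (String t : toks) {
            if (t.equals(s)) {
                return true;
            }
        }
        return false;
    }
}
```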
public void setDefinitionFileURL(URL definitionFileURL)
public URL getDefinitionFileURL()
public void setEncoding(String encoding)
public String getEncoding()