GATE Version 3.1-2270
java.lang.Object
  hepple.postag.POSTagger
public class POSTagger
A Java POS Tagger.

Author: Mark Hepple (hepple@dcs.shef.ac.uk)

Input: An ASCII text file in "Brill input format", i.e. one sentence per line, tokens separated by spaces.

Output: The same text with each token tagged, i.e. "token" -> "token/tag". Output is streamed to standard output, so it is commonly redirected into a target file.

Revision: 13/9/00. Version 1.0.

Comments: Implements a version of the decision-list-based tagging method described in:

M. Hepple. 2000. Independence and Commitment: Assumptions for Rapid Training and Execution of Rule-based Part-of-Speech Taggers. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000). Hong Kong, October 2000.

Modified by Niraj Aswani/Ian Roberts to allow explicit specification of the character encoding to use when reading rules and lexicon files.

$Id: POSTagger.java,v 1.2 2005/10/18 10:01:26 ian_roberts Exp $
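
The "Brill input format" described above maps naturally onto the List-of-Lists shape that runTagger expects. The following is a minimal sketch of that conversion, not code from the hepple.postag package: the class name BrillInputSketch and the method parseSentences are hypothetical, and skipping blank lines is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of hepple.postag) for reading Brill-style input. */
class BrillInputSketch {

    /**
     * Turns text with one sentence per line and space-separated tokens
     * into a list of sentences, each a list of word strings.
     * Blank lines are skipped (an assumption of this sketch).
     */
    static List<List<String>> parseSentences(String text) {
        List<List<String>> sentences = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            sentences.add(new ArrayList<>(Arrays.asList(trimmed.split("\\s+"))));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // prints: [[The, cat, sat, .], [It, purred, .]]
        System.out.println(parseSentences("The cat sat .\nIt purred ."));
    }
}
```

The resulting list has the shape documented for the sentences parameter of runTagger: each inner list is one sentence.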
| Field Summary | |
|---|---|
| String[][] | lexBuff |
| protected Map | rules |
| String[] | tagBuff |
| String[] | wordBuff |
| Constructor Summary | |
|---|---|
| POSTagger(URL lexiconURL, URL rulesURL) | Construct a POS tagger using the platform's native encoding to read the lexicon and rules files. |
| POSTagger(URL lexiconURL, URL rulesURL, String encoding) | Construct a POS tagger using the specified encoding to read the lexicon and rules files. |
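
The difference between the two constructors is only which character encoding is used to decode the lexicon and rules files. The sketch below illustrates that distinction with standard java.io and java.nio calls. It is not the POSTagger implementation: the class EncodingSketch and the method readLines are hypothetical, and treating a null encoding as "platform default" is an assumption of this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of reading a resource URL with an explicit encoding. */
class EncodingSketch {

    /**
     * Reads all lines from the URL. A null encoding falls back to the
     * platform default charset (assumption of this sketch).
     */
    static List<String> readLines(URL url, String encoding) throws IOException {
        Charset charset = (encoding == null)
                ? Charset.defaultCharset()
                : Charset.forName(encoding);
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file standing in for a lexicon, then read it back.
        Path file = Files.createTempFile("lexicon", ".txt");
        Files.write(file, "caf\u00e9 NN\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(file.toUri().toURL(), "UTF-8"));
        Files.delete(file);
    }
}
```

Reading the same bytes with a mismatched charset yields garbled entries, which is why an encoding has to be chosen when the files are loaded.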
| Method Summary | | |
|---|---|---|
| Rule | createNewRule(String ruleId) | Creates a new rule of the required type according to the provided ID. |
| static void | main(String[] args) | Main method. |
| protected boolean | oneStep(String word, List taggedSentence) | Adds a new word to the window of 7 words (on the last position) and tags the word currently in the middle (i.e. on position 3). |
| void | readRules(URL rulesURL) | Reads the rules from the rules input file |
| List | runTagger(List sentences) | Runs the tagger over a set of sentences. |
| void | setEncoding(String encoding) | Deprecated. The rules and lexicon are read at construction time, so setting the encoding later will have no effect. |
| void | showRules() | |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected Map rules
public String[] wordBuff
public String[] tagBuff
public String[][] lexBuff
| Constructor Detail |
|---|
public POSTagger(URL lexiconURL, URL rulesURL) throws InvalidRuleException, IOException

Throws:
InvalidRuleException
IOException

public POSTagger(URL lexiconURL, URL rulesURL, String encoding) throws InvalidRuleException, IOException

Throws:
InvalidRuleException
IOException

| Method Detail |
|---|
public Rule createNewRule(String ruleId) throws InvalidRuleException

Parameters:
ruleId - the ID for the rule to be created
Throws:
InvalidRuleException

public List runTagger(List sentences)

Parameters:
sentences - a List of Lists of words to be tagged. Each list is a sentence represented as a list of words.
Returns:
a List of Lists of String[]. A list of tagged sentences, each sentence being itself a list having pairs of strings as elements, with the word in the first position and the tag in the second.

public void setEncoding(String encoding)

protected boolean oneStep(String word, List taggedSentence)

Parameters:
word - the new word
taggedSentence - a List of pairs of strings representing the results of tagging the current sentence so far.

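
The summary of oneStep describes a window of 7 words in which each new word enters at the last position and the word at position 3 is the one tagged. The sketch below shows only that buffering pattern. It is not the actual POSTagger code: the class WindowSketch, the methods step, tagSentence and describeContext are hypothetical, the meaning given to the boolean return value is an assumption, and the "tag" is a placeholder that just records the neighbouring words instead of consulting a lexicon or rules.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a 7-slot sliding window; not the real tagger logic. */
class WindowSketch {
    static final int SIZE = 7;
    static final int MIDDLE = 3;

    // Null slots act as padding before the first and after the last word.
    private final String[] window = new String[SIZE];

    /**
     * Shifts the window left by one, stores the new word in the last slot,
     * and records a pair for the word now in the middle slot, if there is one.
     * Returning true when a pair was recorded is an assumption of this sketch.
     */
    boolean step(String word, List<String[]> taggedSentence) {
        System.arraycopy(window, 1, window, 0, SIZE - 1);
        window[SIZE - 1] = word;
        String middle = window[MIDDLE];
        if (middle == null) {
            return false;
        }
        taggedSentence.add(new String[] {middle, describeContext()});
        return true;
    }

    /** Placeholder for a real tag: reports the immediate left and right neighbours. */
    private String describeContext() {
        return "L=" + window[MIDDLE - 1] + ",R=" + window[MIDDLE + 1];
    }

    /** Feeds a whole sentence, then enough padding to push the last word to the middle. */
    static List<String[]> tagSentence(List<String> words) {
        WindowSketch sketch = new WindowSketch();
        List<String[]> tagged = new ArrayList<>();
        for (String word : words) {
            sketch.step(word, tagged);
        }
        for (int i = 0; i < SIZE - 1 - MIDDLE; i++) {
            sketch.step(null, tagged);
        }
        return tagged;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("The");
        words.add("cat");
        words.add("sat");
        for (String[] pair : tagSentence(words)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Because the middle slot has three slots on each side, a decision about a word can take into account up to three preceding and three following words, and three padding steps are needed to flush the end of a sentence.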
public void readRules(URL rulesURL) throws IOException, InvalidRuleException

Throws:
IOException
InvalidRuleException

public void showRules()

public static void main(String[] args)
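
Because runTagger is declared with raw List types, callers have to unpack its result by hand: a List of sentences, each a List of String[] pairs with the word at index 0 and the tag at index 1. The following is a minimal sketch of that unpacking, rendering each sentence in the "token/tag" form described in the class comment. The class TaggedOutputSketch and the method toLines are hypothetical, and the hand-built result in main merely stands in for a value that would come from runTagger.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of hepple.postag) for consuming runTagger-style output. */
class TaggedOutputSketch {

    /**
     * Converts a List of Lists of String[] (word, tag) pairs into one
     * "word/tag word/tag ..." line per sentence.
     */
    static List<String> toLines(List<?> taggedSentences) {
        List<String> lines = new ArrayList<>();
        for (Object sentence : taggedSentences) {
            StringBuilder line = new StringBuilder();
            for (Object element : (List<?>) sentence) {
                String[] pair = (String[]) element; // [0] = word, [1] = tag
                if (line.length() > 0) {
                    line.append(' ');
                }
                line.append(pair[0]).append('/').append(pair[1]);
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for a tagger result; the tags are illustrative only.
        List<String[]> sentence = new ArrayList<>();
        sentence.add(new String[] {"The", "DT"});
        sentence.add(new String[] {"cat", "NN"});
        List<List<String[]>> result = new ArrayList<>();
        result.add(sentence);
        // prints: [The/DT cat/NN]
        System.out.println(toLines(result));
    }
}
```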