GATE Version 3.1-2270
java.lang.Object
  extended by hepple.postag.POSTagger
public class POSTagger
A Java POS Tagger
Author: Mark Hepple (hepple@dcs.shef.ac.uk)
Input: An ASCII text file in "Brill input format", i.e. one sentence per line, with tokens separated by spaces.
Output: The same text with each token tagged, i.e. "token" -> "token/tag". Output is streamed to standard output, so it is commonly redirected into a target file.
Revision: 13/9/00. Version 1.0.
Comments: Implements a version of the decision-list based tagging method described in: M. Hepple. 2000. Independence and Commitment: Assumptions for Rapid Training and Execution of Rule-based Part-of-Speech Taggers. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000). Hong Kong, October 2000.
Modified by Niraj Aswani/Ian Roberts to allow explicit specification of the character encoding to use when reading the rules and lexicon files.
$Id: POSTagger.java,v 1.2 2005/10/18 10:01:26 ian_roberts Exp $
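The input and output conventions above, together with the data shapes that runTagger documents (a List of sentences, each a List of words, in; a List of sentences, each a List of String[] {word, tag} pairs, out), can be illustrated with a small standalone sketch. It does not use GATE: the method names are hypothetical, and the "tagger" in the middle is a stand-in that assigns NN to every word, so only the format handling is meaningful.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch (not part of GATE): converts Brill-format lines into the
 * List-of-Lists shape that runTagger accepts, and renders tagged sentences as
 * "token/tag". The tagging step is a hypothetical stand-in.
 */
public class BrillFormatSketch {

    /** One sentence per line, tokens separated by whitespace; blank lines are skipped. */
    static List<List<String>> parseSentences(List<String> lines) {
        List<List<String>> sentences = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            sentences.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return sentences;
    }

    /** Hypothetical stand-in for a real tagger: pairs every word with the tag "NN". */
    static List<List<String[]>> tagEverythingNN(List<List<String>> sentences) {
        List<List<String[]>> tagged = new ArrayList<>();
        for (List<String> sentence : sentences) {
            List<String[]> pairs = new ArrayList<>();
            for (String word : sentence) {
                pairs.add(new String[] {word, "NN"}); // word first, tag second
            }
            tagged.add(pairs);
        }
        return tagged;
    }

    /** Joins {word, tag} pairs as "word/tag", separated by single spaces. */
    static String format(List<String[]> taggedSentence) {
        StringBuilder sb = new StringBuilder();
        for (String[] pair : taggedSentence) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(pair[0]).append('/').append(pair[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("The cat sat .", "", "Dogs bark");
        for (List<String[]> sentence : tagEverythingNN(parseSentences(input))) {
            System.out.println(format(sentence));
        }
    }
}
```

Swapping tagEverythingNN for a real tagger leaves parseSentences and format unchanged, which is the point of keeping the List-of-Lists shapes explicit.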
Field Summary | |
---|---|
String[][] | lexBuff |
protected Map | rules |
String[] | tagBuff |
String[] | wordBuff |
Constructor Summary | |
---|---|
POSTagger(URL lexiconURL, URL rulesURL) | Construct a POS tagger using the platform's native encoding to read the lexicon and rules files. |
POSTagger(URL lexiconURL, URL rulesURL, String encoding) | Construct a POS tagger using the specified encoding to read the lexicon and rules files. |
Method Summary | | |
---|---|---|
Rule | createNewRule(String ruleId) | Creates a new rule of the required type according to the provided ID. |
static void | main(String[] args) | Main method. |
protected boolean | oneStep(String word, List taggedSentence) | Adds a new word to the window of 7 words (in the last position) and tags the word currently in the middle (i.e. in position 3). |
void | readRules(URL rulesURL) | Reads the rules from the rules input file. |
List | runTagger(List sentences) | Runs the tagger over a set of sentences. |
void | setEncoding(String encoding) | Deprecated. The rules and lexicon are read at construction time, so setting the encoding later will have no effect. |
void | showRules() | |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected Map rules
public String[] wordBuff
public String[] tagBuff
public String[][] lexBuff
Constructor Detail |
---|
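public POSTagger(URL lexiconURL, URL rulesURL) throws InvalidRuleException, IOException
Construct a POS tagger using the platform's native encoding to read the lexicon and rules files.
Throws: InvalidRuleException, IOException
public POSTagger(URL lexiconURL, URL rulesURL, String encoding) throws InvalidRuleException, IOException
Construct a POS tagger using the specified encoding to read the lexicon and rules files.
Throws: InvalidRuleException, IOException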
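The difference between the two constructors comes down to which character encoding is used when the lexicon and rules files are read. The standalone sketch below shows the general Java pattern for reading a URL with an explicitly named charset, falling back to the platform default when none is given. The class and method names are hypothetical, and this is not claimed to be the code POSTagger uses internally.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating encoding-aware reading of a lexicon or rules URL. */
public class EncodedResourceReader {

    /**
     * Reads all lines from the URL. A null encoding means the platform default
     * charset (analogous to the two-argument constructor); otherwise the named
     * charset is used. Charset.forName throws an unchecked exception for unknown names.
     */
    static List<String> readLines(URL url, String encoding) throws IOException {
        Charset charset = (encoding == null) ? Charset.defaultCharset() : Charset.forName(encoding);
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a one-line UTF-8 file containing a non-ASCII word, then read it back two ways.
        Path tmp = Files.createTempFile("lexicon", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "caf\u00e9 NN\n".getBytes(StandardCharsets.UTF_8));
        URL url = tmp.toUri().toURL();
        System.out.println(readLines(url, "UTF-8"));      // decoded as intended
        System.out.println(readLines(url, "ISO-8859-1")); // same bytes, mis-decoded
    }
}
```

The second read shows why the encoding matters: the same bytes decode to a different string, so lexicon entries containing non-ASCII words would no longer match the input tokens.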
Method Detail |
---|
public Rule createNewRule(String ruleId) throws InvalidRuleException
Creates a new rule of the required type according to the provided ID.
Parameters: ruleId - the ID for the rule to be created
Throws: InvalidRuleException
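createNewRule maps a rule ID to a rule object of the matching type and fails with an exception for unknown IDs. A common way to express that kind of factory is a registry from ID strings to constructors, sketched below. Everything here is hypothetical: the Rule interface, the PREVTAG and NEXTTAG IDs (modelled loosely on Brill-style rule template names) and UnknownRuleException, which merely stands in for the documented InvalidRuleException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical registry-based rule factory; not the GATE implementation. */
public class RuleFactorySketch {

    /** Local stand-in for the documented InvalidRuleException. */
    static class UnknownRuleException extends Exception {
        UnknownRuleException(String message) {
            super(message);
        }
    }

    /** Minimal hypothetical rule interface. */
    interface Rule {
        String id();
    }

    static final class PrevTagRule implements Rule {
        public String id() {
            return "PREVTAG";
        }
    }

    static final class NextTagRule implements Rule {
        public String id() {
            return "NEXTTAG";
        }
    }

    // Maps each (hypothetical) rule ID to a constructor for the matching rule type.
    private static final Map<String, Supplier<Rule>> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put("PREVTAG", PrevTagRule::new);
        REGISTRY.put("NEXTTAG", NextTagRule::new);
    }

    /** Returns a fresh rule for the ID, or throws if the ID is not registered. */
    static Rule createNewRule(String ruleId) throws UnknownRuleException {
        Supplier<Rule> maker = REGISTRY.get(ruleId);
        if (maker == null) {
            throw new UnknownRuleException("Unknown rule ID: " + ruleId);
        }
        return maker.get();
    }

    public static void main(String[] args) throws UnknownRuleException {
        System.out.println(createNewRule("PREVTAG").id());
    }
}
```

A registry keeps the set of supported IDs in one place, so adding a rule type means adding one class and one registry entry rather than extending a chain of string comparisons.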
public List runTagger(List sentences)
Runs the tagger over a set of sentences.
Parameters: sentences - a List of Lists of words to be tagged. Each list is a sentence represented as a list of words.
Returns: a List of Lists of String[]. A list of tagged sentences, each sentence being itself a list having pairs of strings as elements, with the word in the first position and the tag in the second.
public void setEncoding(String encoding)
Deprecated. The rules and lexicon are read at construction time, so setting the encoding later will have no effect.
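protected boolean oneStep(String word, List taggedSentence)
Adds a new word to the window of 7 words (in the last position) and tags the word currently in the middle (i.e. in position 3).
Parameters: word - the new word
taggedSentence - a List of pairs of strings representing the results of tagging the current sentence so far.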
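The sliding window that oneStep describes can be pictured with a small standalone sketch. It is not the GATE implementation: the two arrays only mirror the idea behind wordBuff and tagBuff, the initial-tag function is a hypothetical stand-in for a lexicon lookup, no contextual rules are applied, and the boolean result (true when a word was emitted) is an assumption, since the documentation does not say what oneStep returns. Each call shifts the seven slots left, places the new word in the last slot (index 6) and emits the word that has reached the middle (index 3); three trailing null steps flush the end of the sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical 7-slot window tagger illustrating the oneStep description. */
public class WindowSketch {
    static final int WINDOW = 7;
    static final int MIDDLE = 3;

    final String[] words = new String[WINDOW];
    final String[] tags = new String[WINDOW];
    private final Function<String, String> initialTag; // stand-in for a lexicon lookup

    WindowSketch(Function<String, String> initialTag) {
        this.initialTag = initialTag;
    }

    /**
     * Shifts the window left, puts the new word (possibly null) in the last slot,
     * and if the middle slot holds a word, appends {word, tag} to taggedSentence.
     * Returns true if a word was emitted (an assumption of this sketch).
     */
    boolean oneStep(String word, List<String[]> taggedSentence) {
        System.arraycopy(words, 1, words, 0, WINDOW - 1);
        System.arraycopy(tags, 1, tags, 0, WINDOW - 1);
        words[WINDOW - 1] = word;
        tags[WINDOW - 1] = (word == null) ? null : initialTag.apply(word);
        if (words[MIDDLE] == null) {
            return false;
        }
        // A real tagger would consult context rules over slots 0..6 at this point.
        taggedSentence.add(new String[] {words[MIDDLE], tags[MIDDLE]});
        return true;
    }

    /** Clears the window, feeds every word, then flushes with null steps. */
    List<String[]> tagSentence(List<String> sentence) {
        Arrays.fill(words, null);
        Arrays.fill(tags, null);
        List<String[]> out = new ArrayList<>();
        for (String w : sentence) {
            oneStep(w, out);
        }
        for (int i = 0; i < WINDOW - 1 - MIDDLE; i++) {
            oneStep(null, out); // moves the last three words through the middle slot
        }
        return out;
    }

    public static void main(String[] args) {
        WindowSketch tagger = new WindowSketch(w -> w.equals("the") ? "DT" : "NN");
        for (String[] pair : tagger.tagSentence(Arrays.asList("the", "cat", "sat"))) {
            System.out.println(pair[0] + "/" + pair[1]);
        }
    }
}
```

With the middle slot at index 3, each word is tagged while three words of left context and three words of right context are visible, which is why the end of a sentence needs three extra steps.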
public void readRules(URL rulesURL) throws IOException, InvalidRuleException
Reads the rules from the rules input file.
Throws: IOException, InvalidRuleException
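The class comment states that the tagger implements a version of a decision-list based tagging method. As a generic illustration only (the rule types, their ordering and the Independence and Commitment assumptions from Hepple's paper are not modelled), a decision list tries rules in a fixed order and lets the first matching rule decide the outcome. The Context and Rule classes and the two example rules are hypothetical, written in the spirit of Brill-style contextual rules.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Generic first-match decision list over a token's local tag context (hypothetical). */
public class DecisionListSketch {

    /** Hypothetical context: previous tag, current tag, next tag (null at sentence edges). */
    static final class Context {
        final String prevTag;
        final String tag;
        final String nextTag;

        Context(String prevTag, String tag, String nextTag) {
            this.prevTag = prevTag;
            this.tag = tag;
            this.nextTag = nextTag;
        }
    }

    /** One decision-list entry: if the condition holds, assign newTag. */
    static final class Rule {
        final Predicate<Context> condition;
        final String newTag;

        Rule(Predicate<Context> condition, String newTag) {
            this.condition = condition;
            this.newTag = newTag;
        }
    }

    /** Scans the rules in order; the first match decides, otherwise the current tag is kept. */
    static String decide(List<Rule> rules, Context ctx) {
        for (Rule rule : rules) {
            if (rule.condition.test(ctx)) {
                return rule.newTag;
            }
        }
        return ctx.tag;
    }

    // Hypothetical example rules: NN -> VB after TO, VB -> NN after DT.
    static final List<Rule> EXAMPLE_RULES = Arrays.asList(
        new Rule(c -> c.tag.equals("NN") && "TO".equals(c.prevTag), "VB"),
        new Rule(c -> c.tag.equals("VB") && "DT".equals(c.prevTag), "NN"));

    public static void main(String[] args) {
        System.out.println(decide(EXAMPLE_RULES, new Context("TO", "NN", null)));  // VB
        System.out.println(decide(EXAMPLE_RULES, new Context("DT", "VB", "VBD"))); // NN
        System.out.println(decide(EXAMPLE_RULES, new Context("JJ", "NN", null)));  // NN (no rule fires)
    }
}
```

Because evaluation stops at the first match, rule order carries meaning: moving a more specific rule below a more general one can silently prevent it from ever firing.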
public void showRules()
public static void main(String[] args)