GATE
Version 3.1-2270

hepple.postag
Class POSTagger

java.lang.Object
  extended by hepple.postag.POSTagger

public class POSTagger
extends Object

A Java POS tagger.

Author: Mark Hepple (hepple@dcs.shef.ac.uk)

Input: An ASCII text file in "Brill input format", i.e. one sentence per line, with tokens separated by spaces.

Output: The same text with each token tagged, i.e. "token" -> "token/tag". Output is streamed to standard output, so it is commonly redirected into a target file.

Revision: 13/9/00. Version 1.0.

Comments: Implements a version of the decision-list-based tagging method described in: M. Hepple. 2000. Independence and Commitment: Assumptions for Rapid Training and Execution of Rule-based Part-of-Speech Taggers. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000). Hong Kong, October 2000.

Modified by Niraj Aswani/Ian Roberts to allow explicit specification of the character encoding to use when reading rules and lexicon files.

$Id: POSTagger.java,v 1.2 2005/10/18 10:01:26 ian_roberts Exp $
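As an illustration of the "Brill input format" described above, the following minimal sketch splits such text into sentences and tokens, producing the list-of-lists shape that runTagger accepts. The class name BrillInputSketch and its parse method are hypothetical helpers, not part of hepple.postag; splitting on any whitespace run and skipping blank lines are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of hepple.postag.
class BrillInputSketch {

    // Splits text with one sentence per line and space-separated tokens
    // into a list of sentences, each sentence being a list of tokens.
    static List<List<String>> parse(String text) {
        List<List<String>> sentences = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines carry no sentence
            }
            // assumption: any run of whitespace separates two tokens
            sentences.add(new ArrayList<>(Arrays.asList(trimmed.split("\\s+"))));
        }
        return sentences;
    }

    public static void main(String[] args) {
        String input = "The cat sat .\nDogs bark .\n";
        // prints [[The, cat, sat, .], [Dogs, bark, .]]
        System.out.println(parse(input));
    }
}
```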


Field Summary
 String[][] lexBuff
           
protected  Map rules
           
 String[] tagBuff
           
 String[] wordBuff
           
 
Constructor Summary
POSTagger(URL lexiconURL, URL rulesURL)
          Construct a POS tagger using the platform's native encoding to read the lexicon and rules files.
POSTagger(URL lexiconURL, URL rulesURL, String encoding)
          Construct a POS tagger using the specified encoding to read the lexicon and rules files.
 
Method Summary
 Rule createNewRule(String ruleId)
          Creates a new rule of the required type according to the provided ID.
static void main(String[] args)
          Main method.
protected  boolean oneStep(String word, List taggedSentence)
          Adds a new word at the last position of the 7-word window and tags the word currently in the middle (i.e. at position 3).
 void readRules(URL rulesURL)
          Reads the rules from the rules input file.
 List runTagger(List sentences)
          Runs the tagger over a set of sentences.
 void setEncoding(String encoding)
          Deprecated. The rules and lexicon are read at construction time, so setting the encoding later will have no effect.
 void showRules()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

rules

protected Map rules

wordBuff

public String[] wordBuff

tagBuff

public String[] tagBuff

lexBuff

public String[][] lexBuff
Constructor Detail

POSTagger

public POSTagger(URL lexiconURL,
                 URL rulesURL)
          throws InvalidRuleException,
                 IOException
Construct a POS tagger using the platform's native encoding to read the lexicon and rules files.

Throws:
InvalidRuleException
IOException

POSTagger

public POSTagger(URL lexiconURL,
                 URL rulesURL,
                 String encoding)
          throws InvalidRuleException,
                 IOException
Construct a POS tagger using the specified encoding to read the lexicon and rules files.

Throws:
InvalidRuleException
IOException
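
Both constructors read their resources from URLs, and the three-argument form names the character encoding explicitly, e.g. new POSTagger(lexiconURL, rulesURL, "UTF-8"). The sketch below shows the general Java idiom for reading text from a URL with a named encoding. It is not taken from the tagger's source; the class EncodingSketch, the method readLines, and the sample line written to the temporary file are hypothetical and are not claimed to match the real lexicon format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of hepple.postag.
class EncodingSketch {

    // Reads all lines from a URL, decoding the bytes with the named charset
    // rather than the platform default.
    static List<String> readLines(URL url, String encoding) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), encoding))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative content only; not the real lexicon file format.
        Path tmp = Files.createTempFile("lexicon", ".txt");
        Files.write(tmp, "caf\u00e9 NN\n".getBytes(StandardCharsets.UTF_8));
        URL url = tmp.toUri().toURL();
        System.out.println(readLines(url, "UTF-8"));
        Files.delete(tmp);
    }
}
```

Naming the encoding matters whenever the resource files contain non-ASCII characters and the platform default differs from the encoding the files were written in.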
Method Detail

createNewRule

public Rule createNewRule(String ruleId)
                   throws InvalidRuleException
Creates a new rule of the required type according to the provided ID.

Parameters:
ruleId - the ID for the rule to be created
Throws:
InvalidRuleException

runTagger

public List runTagger(List sentences)
Runs the tagger over a set of sentences.

Parameters:
sentences - a List of Lists of words to be tagged. Each inner list represents one sentence as a list of words.
Returns:
a List of Lists of String[] holding the tagged sentences. Each sentence is itself a list of string pairs, with the word in the first position and the tag in the second.
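
The return shape described above can be walked as in the following minimal sketch, which renders each tagged sentence in the "token/tag" form. The class TaggedOutputSketch and its render method are hypothetical, and the hard-coded list stands in for a real runTagger result; the tags shown are illustrative only. Because runTagger returns a raw List, real calling code would need casts to reach the String[] pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of hepple.postag.
class TaggedOutputSketch {

    // Renders one tagged sentence as "token/tag" items joined by spaces.
    // Each element is a pair: word at index 0, tag at index 1.
    static String render(List<String[]> sentence) {
        StringBuilder sb = new StringBuilder();
        for (String[] pair : sentence) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(pair[0]).append('/').append(pair[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by runTagger; tags are made up
        // for illustration and were not produced by the tagger.
        List<List<String[]>> tagged = new ArrayList<>();
        List<String[]> first = new ArrayList<>();
        first.add(new String[] {"The", "DT"});
        first.add(new String[] {"cat", "NN"});
        first.add(new String[] {"sat", "VBD"});
        tagged.add(first);

        for (List<String[]> sentence : tagged) {
            System.out.println(render(sentence)); // prints The/DT cat/NN sat/VBD
        }
    }
}
```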

setEncoding

public void setEncoding(String encoding)
Deprecated. The rules and lexicon are read at construction time, so setting the encoding later will have no effect.

Sets the encoding that the POS tagger uses to read the rules and lexicon files.


oneStep

protected boolean oneStep(String word,
                          List taggedSentence)
Adds a new word at the last position of the 7-word window and tags the word currently in the middle (i.e. at position 3). This method also reads the word at the first position and adds it and its tag to the taggedSentence structure, since that word would be lost at the next advance. Returns true if this word completes a sentence, otherwise false.

Parameters:
word - the new word
taggedSentence - a List of pairs of strings representing the results of tagging the current sentence so far.
Returns:
true if a full sentence is now tagged, otherwise false.
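
The sliding-window behaviour described for oneStep can be sketched as follows. This is a simplified illustration under stated assumptions, not the tagger's actual code: the PAD sentinel, the placeholderTag rule (which ignores context entirely), the end-of-sentence test, and the tag driver method are all hypothetical. Only the 7-slot buffers, the insertion at the last position, the tagging at position 3, and the emission from the first position follow the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the hepple.postag implementation.
class WindowSketch {
    static final String PAD = "<pad>"; // hypothetical padding sentinel
    final String[] wordBuff = new String[7];
    final String[] tagBuff = new String[7];

    WindowSketch() {
        Arrays.fill(wordBuff, PAD);
        Arrays.fill(tagBuff, PAD);
    }

    // Stand-in for rule application; a real tagger would consult its rules
    // and the neighbouring window slots here.
    static String placeholderTag(String word) {
        boolean upper = !word.isEmpty() && Character.isUpperCase(word.charAt(0));
        return upper ? "UP" : "LO";
    }

    boolean oneStep(String word, List<String[]> taggedSentence) {
        // Advance the window by one slot; the old first entry drops out.
        System.arraycopy(wordBuff, 1, wordBuff, 0, 6);
        System.arraycopy(tagBuff, 1, tagBuff, 0, 6);
        wordBuff[6] = word;
        tagBuff[6] = PAD;

        // Tag the word in the middle, which now has 3 neighbours on each side.
        if (!PAD.equals(wordBuff[3])) {
            tagBuff[3] = placeholderTag(wordBuff[3]);
        }

        // Save the word at the first position before the next advance loses it.
        if (!PAD.equals(wordBuff[0])) {
            taggedSentence.add(new String[] {wordBuff[0], tagBuff[0]});
            // Assumption: padding right behind it means the sentence is done.
            return PAD.equals(wordBuff[1]);
        }
        return false;
    }

    // Feeds one sentence through the window, then pads until it is flushed.
    static List<String[]> tag(List<String> words) {
        List<String[]> tagged = new ArrayList<>();
        if (words.isEmpty()) {
            return tagged; // nothing to flush, avoid padding forever
        }
        WindowSketch window = new WindowSketch();
        boolean done = false;
        for (String word : words) {
            done = window.oneStep(word, tagged);
        }
        while (!done) {
            done = window.oneStep(PAD, tagged);
        }
        return tagged;
    }

    public static void main(String[] args) {
        for (String[] pair : tag(Arrays.asList("The", "cat", "sat"))) {
            System.out.println(pair[0] + "/" + pair[1]);
        }
    }
}
```

The point of the window is that the middle slot always has three slots of context on each side at the moment it is tagged, which is why a word is only emitted several steps after it was added.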

readRules

public void readRules(URL rulesURL)
               throws IOException,
                      InvalidRuleException
Reads the rules from the rules input file.

Throws:
IOException
InvalidRuleException

showRules

public void showRules()

main

public static void main(String[] args)
Main method. Runs the tagger using the arguments to find the resources to be used for initialisation and the input file.

