gate.creole.tokeniser.chinesetokeniser
Class Segmenter
java.lang.Object
gate.creole.tokeniser.chinesetokeniser.Segmenter
public class Segmenter
- extends Object
Title: Segmenter.java
Description: This class segments Chinese text by adding extra spaces
Company: University of Sheffield
- Author:
- Erik E. Peterson (modified by Niraj Aswani)
- See Also:
- source
Field Summary
static int BOTH
static int SIMP
static int TRAD
Constructor Summary
Segmenter(int charform, boolean loadwordfile)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TRAD
public static final int TRAD
- See Also:
- Constant Field Values
SIMP
public static final int SIMP
- See Also:
- Constant Field Values
BOTH
public static final int BOTH
- See Also:
- Constant Field Values
Segmenter
public Segmenter(int charform, boolean loadwordfile)
isNumber
public boolean isNumber(String testword)
isAllForeign
public boolean isAllForeign(String testword)
isNotCJK
public boolean isNotCJK(String testword)
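The page gives no description for isNumber, isAllForeign or isNotCJK. As a rough illustration of what a check like isNotCJK might involve, the sketch below tests whether a string contains no Han ideographs, using the JDK's Character.UnicodeScript. The class CjkCheck and the method containsNoHan are hypothetical names, not part of GATE. The real method may apply a different rule, for example one that also covers kana or hangul.

```java
// Hypothetical helper for illustration only; this is not GATE code.
final class CjkCheck {

    // Returns true when no code point in the text belongs to the Han script.
    // Assumption: "not CJK" is approximated here as "contains no Han ideograph".
    static boolean containsNoHan(String text) {
        return text.codePoints()
                   .noneMatch(cp -> Character.UnicodeScript.of(cp)
                                        == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsNoHan("GATE 2007"));     // prints true
        System.out.println(containsNoHan("\u4e2d\u6587"));  // prints false (two Han characters)
    }
}
```

Working on code points rather than char values keeps the check correct for ideographs outside the Basic Multilingual Plane.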
stemWord
public String stemWord(String word)
segmentLine
public String segmentLine(String cline, String separator)
addword
public void addword(String newword)
getMarks
public ArrayList getMarks()
- Returns the marks recording where the segmenter added spaces
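To make the relationship between addword, segmentLine and getMarks easier to picture, the toy class below inserts a separator between words and records where each separator was placed. It is a minimal sketch, not the GATE Segmenter. It makes three assumptions: the lexicon is filled only through addWord, matching is greedy longest-match, and a mark is the offset in the output string at which a separator starts. The names ToySegmenter and addWord are hypothetical. What the real getMarks() list contains is not documented on this page.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only; this is NOT the GATE Segmenter and does not
// reproduce its algorithm, its lexicon, or its character-form handling.
final class ToySegmenter {
    private final Set<String> lexicon = new HashSet<>();
    private final List<Integer> marks = new ArrayList<>();
    private int longest = 1; // length of the longest lexicon entry seen so far

    // Adds a word to the in-memory lexicon.
    void addWord(String word) {
        lexicon.add(word);
        longest = Math.max(longest, word.length());
    }

    // Greedy longest-match segmentation: at each position take the longest
    // lexicon entry that matches, or a single character if none does.
    String segmentLine(String line, String separator) {
        marks.clear();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            int end = Math.min(line.length(), i + longest);
            while (end > i + 1 && !lexicon.contains(line.substring(i, end))) {
                end--;
            }
            if (i > 0) {
                marks.add(out.length()); // output offset where this separator starts
                out.append(separator);
            }
            out.append(line, i, end);
            i = end;
        }
        return out.toString();
    }

    // Offsets recorded by the most recent segmentLine call.
    List<Integer> getMarks() {
        return new ArrayList<>(marks);
    }

    public static void main(String[] args) {
        ToySegmenter s = new ToySegmenter();
        s.addWord("\u4e2d\u6587"); // "Chinese (language)"
        s.addWord("\u5206\u8bcd"); // "word segmentation"
        // Five characters in: two lexicon words followed by one leftover character.
        String out = s.segmentLine("\u4e2d\u6587\u5206\u8bcd\u5668", " ");
        System.out.println(out.length());  // prints 7 (5 characters + 2 separators)
        System.out.println(s.getMarks());  // prints [2, 5]
    }
}
```

The sketch indexes by char, so it assumes input within the Basic Multilingual Plane. The string literals use Unicode escapes so that the file compiles regardless of source encoding.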
segmentData
public String segmentData(String fileContents, String encoding)