gov.llnl.ontology.wordnet
Class WordNetCorpusReader

java.lang.Object
  extended by gov.llnl.ontology.wordnet.WordNetCorpusReader
All Implemented Interfaces:
OntologyReader

public class WordNetCorpusReader
extends Object
implements OntologyReader

This class acts as the central interface for the WordNet dictionary. It begins its initialization by reading all of the dictionary information into RAM and then generates a complete graph, connecting all Synsets via the specified relations. The dictionary graph can be modified at runtime and later saved to disk in the same format as the original WordNet database, allowing other interfaces to access the modified form of WordNet.

This reader is heavily based on the NLTK WordNet corpus reader.

Author:
Keith Stevens

Field Summary
static String[] FILE_EXTENSIONS
          The file extensions for each of the data and index files in the WordNet dictionary.
static Map<String,Synset.PartsOfSpeech> POS_MAP
          A simple mapping from part of speech characters to their respective PartsOfSpeech enumerations.
static String[] POS_TAGS
          The set of part of speech tags.
 
Method Summary
 void addSynset(Synset synset)
          Adds synset to the OntologyReader.
 void addSynset(Synset synset, int index)
          Adds synset to the OntologyReader.
 Set<Synset> allSynsets()
          Returns a Set of all Synsets maintained by this OntologyReader.
 Set<Synset> allSynsets(Synset.PartsOfSpeech pos)
          Returns a Set of all Synsets for the given Synset.PartsOfSpeech maintained by this OntologyReader.
 int getMaxDepth(Synset.PartsOfSpeech pos)
          Returns the maximum depth of any Synset chain in this OntologyReader.
 Synset getSynset(String fullSynsetName)
          Returns the Synset specified by the full synset name.
 Synset getSynset(String lemma, Synset.PartsOfSpeech pos, int senseNum)
          Returns the single Synset specified by the given lemma name, part of speech tag, and sense number.
 Synset[] getSynsets(String lemma)
          Returns all Synsets that match the given lemma name.
 Synset[] getSynsets(String lemma, Synset.PartsOfSpeech pos)
          Returns all Synsets that match the given lemma name and part of speech.
 Synset[] getSynsets(String lemma, Synset.PartsOfSpeech pos, boolean useMorphy)
          Returns all Synsets that match the given lemma name and part of speech.
static WordNetCorpusReader getWordNet()
          Returns the initialized instance of the WordNetCorpusReader.
static WordNetCorpusReader initialize(String dictPath)
          Returns a singleton instance of the WordNetCorpusReader.
static WordNetCorpusReader initialize(String dictPath, boolean readFromJar)
          Returns a singleton instance of the WordNetCorpusReader.
 Iterator<String> morphy(String form)
          Returns an Iterator over the possible morphological variations of the given word form for all Synset.PartsOfSpeech.
 Iterator<String> morphy(String form, Synset.PartsOfSpeech pos)
          Returns an Iterator over the possible morphological variations of the given word form for a given Synset.PartsOfSpeech.
 void removeSynset(Synset synset)
          Removes synset from the OntologyReader.
 void replaceSynset(Synset synset, Synset replacement)
          Removes the Synset from the known hierarchy.
 Set<String> wordnetTerms()
          Returns a Set of lemmas that serve as keys in this OntologyReader.
 Set<String> wordnetTerms(Synset.PartsOfSpeech pos)
          Returns a Set of lemmas that the current WordNet instance is aware of for a particular Synset.PartsOfSpeech.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

POS_TAGS

public static final String[] POS_TAGS
The set of part of speech tags.


POS_MAP

public static final Map<String,Synset.PartsOfSpeech> POS_MAP
A simple mapping from part of speech characters to their respective PartsOfSpeech enumerations.


FILE_EXTENSIONS

public static final String[] FILE_EXTENSIONS
The file extensions for each of the data and index files in the WordNet dictionary.

Method Detail

morphy

public Iterator<String> morphy(String form)
Returns an Iterator over the possible morphological variations of the given word form for all Synset.PartsOfSpeech. For each part of speech, any known exceptions for the form are returned before the results of the part-of-speech-specific replacement rules. For example, if "geese" is given, "goose" will be returned first. Afterwards, no other variations would be returned. If "explodes" is given, the variants would be "explode", "explode", and "explod", based on the rules specified in MORPHOLOGICAL_SUBSTITUTIONS.

Specified by:
morphy in interface OntologyReader

morphy

public Iterator<String> morphy(String form,
                               Synset.PartsOfSpeech pos)
Returns an Iterator over the possible morphological variations of the given word form for a given Synset.PartsOfSpeech. Any known exceptions for the form are returned before the results of the part-of-speech-specific replacement rules. For example, if "geese" is given, "goose" will be returned first. Afterwards, no other variations would be returned. If "explodes" is given, the variants would be "explode", "explode", and "explod", based on the rules specified in MORPHOLOGICAL_SUBSTITUTIONS.

Specified by:
morphy in interface OntologyReader
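
The ordering described above (exceptions first, then suffix rules) can be illustrated with a minimal, self-contained sketch. It is not the reader's implementation: the class name MorphySketch, the one-entry exception table, and the three suffix rules are hypothetical stand-ins chosen only to reproduce the "geese" and "explodes" examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative only: the tables below are assumptions, not the data that
// WordNetCorpusReader actually loads.
class MorphySketch {
    // Irregular forms; these are emitted before any rule-based variant.
    static final Map<String, List<String>> EXCEPTIONS =
        Collections.singletonMap("geese", Arrays.asList("goose"));

    // {suffix, replacement} pairs, applied in order to matching forms.
    static final String[][] SUBSTITUTIONS = {
        {"s", ""}, {"es", "e"}, {"es", ""}
    };

    static Iterator<String> morphy(String form) {
        List<String> variants = new ArrayList<>();
        // Step 1: known exceptions come first.
        List<String> exceptions = EXCEPTIONS.get(form);
        if (exceptions != null) {
            variants.addAll(exceptions);
        }
        // Step 2: every suffix rule that matches contributes one variant,
        // so duplicates are possible.
        for (String[] rule : SUBSTITUTIONS) {
            if (form.endsWith(rule[0])) {
                String stem = form.substring(0, form.length() - rule[0].length());
                variants.add(stem + rule[1]);
            }
        }
        return variants.iterator();
    }

    public static void main(String[] args) {
        morphy("explodes").forEachRemaining(System.out::println);
    }
}
```

In this sketch "geese" matches no suffix rule, so only the exception "goose" is produced, while "explodes" matches all three rules and yields a repeated "explode".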

allSynsets

public Set<Synset> allSynsets()
Returns a Set of all Synsets maintained by this OntologyReader.

Specified by:
allSynsets in interface OntologyReader

allSynsets

public Set<Synset> allSynsets(Synset.PartsOfSpeech pos)
Returns a Set of all Synsets for the given Synset.PartsOfSpeech maintained by this OntologyReader.

Specified by:
allSynsets in interface OntologyReader

addSynset

public void addSynset(Synset synset)
Adds synset to the OntologyReader. A mapping from each Lemma linked to by synset will be made to synset. synset will be set as the last Synset for each Lemma mapping.

Specified by:
addSynset in interface OntologyReader

addSynset

public void addSynset(Synset synset,
                      int index)
Adds synset to the OntologyReader. A mapping from each Lemma linked to by synset will be made to synset. synset will be set at index index for each Lemma mapping, or as the last entry if index is too large for any particular Lemma mapping.

Specified by:
addSynset in interface OntologyReader
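
The index handling described for the two addSynset overloads can be sketched as follows. This is an assumption-laden illustration rather than the library's code: LemmaIndexSketch is a hypothetical class, synsets are represented by plain strings, and index is assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models the lemma-to-synset mapping with strings.
class LemmaIndexSketch {
    private final Map<String, List<String>> lemmaToSynsets = new HashMap<>();

    // Appends the synset as the last entry for each of its lemmas.
    void add(String synset, List<String> lemmas) {
        add(synset, lemmas, Integer.MAX_VALUE);
    }

    // Inserts the synset at the given index for each lemma, falling back
    // to the end of the list when the index is too large for that lemma.
    void add(String synset, List<String> lemmas, int index) {
        for (String lemma : lemmas) {
            List<String> senses =
                lemmaToSynsets.computeIfAbsent(lemma, k -> new ArrayList<>());
            senses.add(Math.min(index, senses.size()), synset);
        }
    }

    List<String> senses(String lemma) {
        return lemmaToSynsets.getOrDefault(lemma, new ArrayList<>());
    }
}
```

Clamping per lemma matters because one synset may be linked to lemmas with different numbers of existing senses, so the same index can be valid for one lemma and too large for another.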

removeSynset

public void removeSynset(Synset synset)
Removes synset from the OntologyReader. The mapping from each Lemma linked to by synset to synset will be removed.

Specified by:
removeSynset in interface OntologyReader

replaceSynset

public void replaceSynset(Synset synset,
                          Synset replacement)
Removes the Synset from the known hierarchy. All mappings from lemmas to this Synset will be removed, along with any stored details about this particular Synset.

Specified by:
replaceSynset in interface OntologyReader

wordnetTerms

public Set<String> wordnetTerms()
Returns a Set of lemmas that serve as keys in this OntologyReader.

Specified by:
wordnetTerms in interface OntologyReader

wordnetTerms

public Set<String> wordnetTerms(Synset.PartsOfSpeech pos)
Returns a Set of lemmas that the current WordNet instance is aware of for a particular Synset.PartsOfSpeech.

Specified by:
wordnetTerms in interface OntologyReader

initialize

public static WordNetCorpusReader initialize(String dictPath)
Returns a singleton instance of the WordNetCorpusReader. If the reader has not already been created, it will be initialized. This method assumes that dictPath does not correspond to a jar internal path.


initialize

public static WordNetCorpusReader initialize(String dictPath,
                                             boolean readFromJar)
Returns a singleton instance of the WordNetCorpusReader. If the reader has not already been created, it will be initialized. If readFromJar is true, the reader will treat dictPath as a path within the jar currently running this code and read the dictionary files from that jar. In these cases, dictPath should start with "/". A common argument for dictPath is "/dict", which assumes that the directory dict contains all the WordNet dictionary files and is at the base directory of the jar.


getWordNet

public static WordNetCorpusReader getWordNet()
Returns the initialized instance of the WordNetCorpusReader.
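
The relationship between the two initialize overloads and getWordNet can be sketched with a generic lazy singleton. ReaderSingletonSketch and its fields are hypothetical, and the use of synchronized is an assumption; this page does not say whether the real reader is thread-safe or what happens when initialize is called again with a different path.

```java
// Illustrative only: a generic lazy singleton, not the reader's source.
class ReaderSingletonSketch {
    private static ReaderSingletonSketch instance;

    final String dictPath;
    final boolean readFromJar;

    private ReaderSingletonSketch(String dictPath, boolean readFromJar) {
        this.dictPath = dictPath;
        this.readFromJar = readFromJar;
    }

    // File-system variant: delegates with readFromJar set to false.
    static synchronized ReaderSingletonSketch initialize(String dictPath) {
        return initialize(dictPath, false);
    }

    // Creates the instance on the first call only; in this sketch, later
    // calls return the existing instance and ignore their arguments.
    static synchronized ReaderSingletonSketch initialize(String dictPath,
                                                         boolean readFromJar) {
        if (instance == null) {
            instance = new ReaderSingletonSketch(dictPath, readFromJar);
        }
        return instance;
    }

    // Returns whatever instance exists; null if initialize was never called.
    static synchronized ReaderSingletonSketch getWordNet() {
        return instance;
    }
}
```

Under this pattern, code that only needs to query the dictionary can call getWordNet, provided some earlier code path has already called initialize.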


getSynsets

public Synset[] getSynsets(String lemma)
Returns all Synsets that match the given lemma name.

Specified by:
getSynsets in interface OntologyReader

getSynsets

public Synset[] getSynsets(String lemma,
                           Synset.PartsOfSpeech pos)
Returns all Synsets that match the given lemma name and part of speech. If there is no known mapping for the given word, the Synsets for all its part-of-speech-specific morphological variations will be returned.

Specified by:
getSynsets in interface OntologyReader

getSynsets

public Synset[] getSynsets(String lemma,
                           Synset.PartsOfSpeech pos,
                           boolean useMorphy)
Returns all Synsets that match the given lemma name and part of speech. If there is no known mapping for the given word and useMorphy is true, the Synsets for all its part-of-speech-specific morphological variations will be returned.

Specified by:
getSynsets in interface OntologyReader
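
The fallback controlled by useMorphy can be sketched as a direct lookup followed by a lookup of each morphological variant. Everything here is hypothetical: FallbackLookupSketch, its in-memory index, the synset names stored in it, and the single "strip a trailing s" rule standing in for morphy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: strings stand in for Synset objects.
class FallbackLookupSketch {
    static final Map<String, List<String>> INDEX = new HashMap<>();
    static {
        INDEX.put("explode", Arrays.asList("explode.v.01", "explode.v.02"));
    }

    // Toy stand-in for morphy: strips a trailing "s" if present.
    static List<String> variants(String form) {
        List<String> out = new ArrayList<>();
        if (form.endsWith("s")) {
            out.add(form.substring(0, form.length() - 1));
        }
        return out;
    }

    static List<String> getSynsets(String lemma, boolean useMorphy) {
        // A direct hit wins; variants are consulted only on a miss.
        List<String> direct = INDEX.get(lemma);
        if (direct != null) {
            return direct;
        }
        List<String> found = new ArrayList<>();
        if (useMorphy) {
            for (String variant : variants(lemma)) {
                List<String> hits = INDEX.get(variant);
                if (hits != null) {
                    found.addAll(hits);
                }
            }
        }
        return found;
    }
}
```

With this index, looking up "explodes" returns the two "explode" entries when useMorphy is true and nothing when it is false.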

getSynset

public Synset getSynset(String fullSynsetName)
Returns the Synset specified by the full synset name. The name should be of the following format: lemma.pos.senseNum

Specified by:
getSynset in interface OntologyReader
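
As a sketch of how a name in the lemma.pos.senseNum format might be taken apart, the hypothetical SynsetNameSketch class below splits on the last two dots. Splitting from the right is an assumption made so that a lemma containing dots still parses; the reader's actual parsing rules are not described on this page.

```java
// Illustrative only: a standalone parser for names such as "cat.n.01".
class SynsetNameSketch {
    final String lemma;
    final String pos;
    final int senseNum;

    SynsetNameSketch(String lemma, String pos, int senseNum) {
        this.lemma = lemma;
        this.pos = pos;
        this.senseNum = senseNum;
    }

    static SynsetNameSketch parse(String fullName) {
        // Locate the last two dots; everything before them is the lemma.
        int senseDot = fullName.lastIndexOf('.');
        int posDot = fullName.lastIndexOf('.', senseDot - 1);
        if (senseDot < 0 || posDot < 0) {
            throw new IllegalArgumentException(
                "Expected lemma.pos.senseNum but got: " + fullName);
        }
        String lemma = fullName.substring(0, posDot);
        String pos = fullName.substring(posDot + 1, senseDot);
        // parseInt accepts leading zeros, so "01" becomes 1.
        int senseNum = Integer.parseInt(fullName.substring(senseDot + 1));
        return new SynsetNameSketch(lemma, pos, senseNum);
    }
}
```

The three parsed parts line up with the arguments of the three-argument getSynset overload, once the part of speech tag has been mapped to a Synset.PartsOfSpeech value.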

getSynset

public Synset getSynset(String lemma,
                        Synset.PartsOfSpeech pos,
                        int senseNum)
Returns the single Synset specified by the given lemma name, part of speech tag, and sense number. Sense numbers start at 1.

Specified by:
getSynset in interface OntologyReader

getMaxDepth

public int getMaxDepth(Synset.PartsOfSpeech pos)
Returns the maximum depth of any Synset chain in this OntologyReader.

Specified by:
getMaxDepth in interface OntologyReader


Copyright © 2010-2011. All Rights Reserved.