public class PartitioningNearestNeighborFinder extends Object implements NearestNeighborFinder, Serializable
This NearestNeighborFinder operates by generating a set of principal vectors
that reflect average words in a SemanticSpace and then mapping each principal
vector to the set of words to which it is closest. Finding the nearest
neighbors then entails finding the k closest principal vectors and comparing
only their words, rather than all the words in the space. This reduces the
search space by partitioning the vectors of the SemanticSpace into smaller
sets, not all of which need to be searched.
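The partition-then-search idea described above can be sketched in plain Java. This is a minimal illustration, not the library's implementation: the class `PartitionedSearchSketch`, its method names, and the use of a `Map` of `double[]` arrays in place of a SemanticSpace are all assumptions, and the reference vectors are supplied by the caller rather than derived from the space.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the documented class.
final class PartitionedSearchSketch {

    /** Cosine similarity of two equal-length vectors (0 if either is all zeros). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Indices of the n reference vectors most similar to the query, best first. */
    static List<Integer> closestReferences(double[][] refs, double[] query, int n) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < refs.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> -cosine(refs[i], query)));
        return order.subList(0, Math.min(n, order.size()));
    }

    /** Maps each reference index to the words whose vectors are closest to it. */
    static Map<Integer, List<String>> partition(Map<String, double[]> space,
                                                double[][] refs) {
        Map<Integer, List<String>> parts = new HashMap<>();
        for (Map.Entry<String, double[]> e : space.entrySet()) {
            int best = closestReferences(refs, e.getValue(), 1).get(0);
            parts.computeIfAbsent(best, i -> new ArrayList<>()).add(e.getKey());
        }
        return parts;
    }

    /** Searches only the partitions of the k closest reference vectors. */
    static List<String> nearest(Map<String, double[]> space, double[][] refs,
                                Map<Integer, List<String>> parts,
                                double[] query, int k) {
        List<String> candidates = new ArrayList<>();
        for (int ref : closestReferences(refs, query, k)) {
            candidates.addAll(parts.getOrDefault(ref, Collections.<String>emptyList()));
        }
        candidates.sort(Comparator.comparingDouble((String w) -> -cosine(space.get(w), query)));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

Because only some partitions are examined, a search of this kind can miss a true neighbor that was assigned to a partition whose reference vector is not among the k closest to the query.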
The number of principal vectors is typically far less than the total number
of vectors in the SemanticSpace, but should be more than the expected number
of neighbors being searched for. This value can be optimized by minimizing
the value of c in the equation c = k * p + (k * (|Sspace| / p)), where p is
the number of principal vectors, k is the number of nearest neighbors to be
found, and |Sspace| is the size of the semantic space.
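To make the cost equation concrete, the following sketch evaluates c for every candidate p and returns the minimizer. The class and method names are hypothetical and are not part of the documented API. Setting the derivative of c with respect to p to zero gives p = sqrt(|Sspace|), so under this equation the optimum does not depend on k; the brute-force search should agree with that closed form up to integer rounding.

```java
// Hypothetical helper for exploring the cost equation; not part of the library.
final class PrincipalVectorCost {

    /** c = k * p + (k * (|Sspace| / p)), evaluated in floating point. */
    static double cost(int k, int p, int spaceSize) {
        return (double) k * p + k * ((double) spaceSize / p);
    }

    /** Tries every p in [1, spaceSize] and returns the one with the lowest cost. */
    static int bestNumVectors(int k, int spaceSize) {
        int best = 1;
        double bestCost = cost(k, 1, spaceSize);
        for (int p = 2; p <= spaceSize; p++) {
            double c = cost(k, p, spaceSize);
            if (c < bestCost) {
                bestCost = c;
                best = p;
            }
        }
        return best;
    }
}
```

For a space of 10,000 words and k = 10, `bestNumVectors(10, 10000)` returns 100, where the cost c is 2000.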
Instances of this class are also serializable. If the backing SemanticSpace
is also serializable, the space will be saved. However, if
the space is not serializable, its contents will be converted to a static
version and saved as a copy.
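One way such a fallback could work is sketched below. This is an assumption about the general pattern, not the class's actual code: the `Space` interface is a hypothetical two-method stand-in for SemanticSpace, and `StaticSpace`, `save`, and `load` are names invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "serialize directly, else save a static copy".
final class SerializationFallbackSketch {

    /** Hypothetical stand-in for SemanticSpace, reduced to two methods. */
    interface Space {
        Set<String> getWords();
        double[] getVector(String word);
    }

    /** Serializable snapshot that copies every word vector out of a source space. */
    static final class StaticSpace implements Space, Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, double[]> vectors = new HashMap<>();

        StaticSpace(Space source) {
            for (String word : source.getWords()) {
                vectors.put(word, source.getVector(word).clone());
            }
        }

        @Override public Set<String> getWords() { return vectors.keySet(); }
        @Override public double[] getVector(String word) { return vectors.get(word); }
    }

    /** Writes the space itself if it is serializable, otherwise a static copy. */
    static byte[] save(Space space) {
        Space toWrite = (space instanceof Serializable) ? space : new StaticSpace(space);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(toWrite);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back whatever save() wrote. */
    static Space load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Space) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the saved copy is a snapshot: later changes to the original space are not reflected in the deserialized one.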
Constructor and Description |
---|
PartitioningNearestNeighborFinder(SemanticSpace sspace) Creates a new NearestNeighborFinder for the SemanticSpace, using log_e(\|words\|) principal vectors to efficiently search for neighbors. |
PartitioningNearestNeighborFinder(SemanticSpace sspace, int numPrincipleVectors) Creates a new NearestNeighborFinder for the SemanticSpace, using the specified number of principal vectors to efficiently search for neighbors. |
Modifier and Type | Method and Description |
---|---|
SortedMultiMap<Double,String> | getMostSimilar(Set<String> terms, int numberOfSimilarWords) Finds the k most similar words in the semantic space according to the cosine similarity, returning a mapping from their similarity to the word itself. |
SortedMultiMap<Double,String> | getMostSimilar(String word, int numberOfSimilarWords) Finds the k most similar words in the semantic space according to the cosine similarity, returning a mapping from their similarity to the word itself. |
SortedMultiMap<Double,String> | getMostSimilar(Vector v, int numberOfSimilarWords) Finds the k most similar words in the semantic space according to the cosine similarity, returning a mapping from their similarity to the word itself. |
public PartitioningNearestNeighborFinder(SemanticSpace sspace)
Creates a new NearestNeighborFinder for the SemanticSpace, using
log_e(|words|) principal vectors to efficiently search for neighbors.
Parameters:
sspace - a semantic space to search

public PartitioningNearestNeighborFinder(SemanticSpace sspace, int numPrincipleVectors)
Creates a new NearestNeighborFinder for the SemanticSpace, using the
specified number of principal vectors to efficiently search for neighbors.
Parameters:
sspace - a semantic space to search
numPrincipleVectors - the number of principal vectors to use in
representing the content of the space

public SortedMultiMap<Double,String> getMostSimilar(String word, int numberOfSimilarWords)
Specified by:
getMostSimilar in interface NearestNeighborFinder
Returns:
null if the provided word was not in the semantic space

public SortedMultiMap<Double,String> getMostSimilar(Set<String> terms, int numberOfSimilarWords)
Specified by:
getMostSimilar in interface NearestNeighborFinder
Returns:
null if none of the provided words were in the semantic space

public SortedMultiMap<Double,String> getMostSimilar(Vector v, int numberOfSimilarWords)
Specified by:
getMostSimilar in interface NearestNeighborFinder
Copyright © 2012. All Rights Reserved.