gov.llnl.ontology.clustering
Class LocalitySensitiveSimilarityListGenerator

java.lang.Object
  extended by gov.llnl.ontology.clustering.LocalitySensitiveSimilarityListGenerator

public class LocalitySensitiveSimilarityListGenerator
extends Object

This class generates a list of similar terms for each word in a word space based on Locality Sensitive Hashing and Hamming distances. This is an implementation of the algorithm specified in the following paper:

  • Deepak Ravichandran, Patrick Pantel, and Eduard Hovy, "Randomized algorithms and NLP: using locality sensitive hash functions for high speed noun clustering," Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 43, pages 622-629, 2005.
In short, this algorithm performs the following steps:

  1. Select d random basis vectors of length k, where k is the number of features in the word space.
  2. Convert each vector, v, in the word space into a d-dimensional bit vector, with dimension i being 1 if cosine_sim(v, basis_vector(d_i)) >= 0, and 0 otherwise.
  3. Generate q permutation functions, each of which shuffles the d dimensions of every reduced semantic vector.
  4. For each permutation function pi:
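
The first two steps, together with the Hamming distance comparison, can be illustrated with a minimal, self-contained sketch. It is not the code of this class: the class name RandomHyperplaneSketch, its helper methods, and the toy vectors are all hypothetical, and Gaussian-distributed basis vectors are an assumption. The permutation step is left out because its description above is cut off.

```java
import java.util.Random;

/** Hypothetical illustration of random-hyperplane bit signatures. */
public class RandomHyperplaneSketch {

    /** Creates d random basis vectors of length k (Gaussian components, an assumption). */
    static double[][] randomBasis(int d, int k, Random rng) {
        double[][] basis = new double[d][k];
        for (double[] row : basis)
            for (int j = 0; j < k; j++)
                row[j] = rng.nextGaussian();
        return basis;
    }

    /**
     * Reduces v to one bit per basis vector. Bit i is true when the dot
     * product with basis vector i is >= 0; for non-zero vectors the dot
     * product has the same sign as the cosine similarity.
     */
    static boolean[] signature(double[] v, double[][] basis) {
        boolean[] bits = new boolean[basis.length];
        for (int i = 0; i < basis.length; i++) {
            double dot = 0;
            for (int j = 0; j < v.length; j++)
                dot += v[j] * basis[i][j];
            bits[i] = dot >= 0;
        }
        return bits;
    }

    /** Counts the positions at which two equal-length signatures differ. */
    static int hammingDistance(boolean[] a, boolean[] b) {
        int dist = 0;
        for (int i = 0; i < a.length; i++)
            if (a[i] != b[i])
                dist++;
        return dist;
    }

    /** Estimates cosine similarity from the fraction of differing bits. */
    static double estimatedCosine(boolean[] a, boolean[] b) {
        return Math.cos(Math.PI * hammingDistance(a, b) / a.length);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[][] basis = randomBasis(256, 3, rng);  // d = 256, k = 3

        // Toy 3-feature vectors standing in for rows of a word space.
        double[] cat = {1.0, 0.9, 0.1};
        double[] dog = {0.9, 1.0, 0.2};
        double[] car = {-0.8, 0.1, 1.0};

        boolean[] catSig = signature(cat, basis);
        System.out.println("cat/dog estimated cosine: "
                + estimatedCosine(catSig, signature(dog, basis)));
        System.out.println("cat/car estimated cosine: "
                + estimatedCosine(catSig, signature(car, basis)));
    }
}
```

In this sketch a larger d gives a tighter cosine estimate at the cost of longer signatures; presumably the numBasisVectors argument of generateSimilarityLists plays the role of d, though this page does not say so explicitly.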

      Author:
      Keith Stevens

      Constructor Summary
      LocalitySensitiveSimilarityListGenerator()
                 
       
      Method Summary
      static Map<String,Set<String>> generateSimilarityLists(edu.ucla.sspace.common.SemanticSpace sspace, int numBasisVectors, int numPermutations, int numNeighbors, double threshold)
                Returns a mapping from each term in the word space to its set of most similar neighbors, based on its locality sensitive hash value.
       
      Methods inherited from class java.lang.Object
      clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
       

      Constructor Detail

      LocalitySensitiveSimilarityListGenerator

      public LocalitySensitiveSimilarityListGenerator()
      Method Detail

      generateSimilarityLists

      public static Map<String,Set<String>> generateSimilarityLists(edu.ucla.sspace.common.SemanticSpace sspace,
                                                                    int numBasisVectors,
                                                                    int numPermutations,
                                                                    int numNeighbors,
                                                                    double threshold)
      Returns a mapping from each term in the word space to its set of most similar neighbors, based on its locality sensitive hash value.

      Parameters:
      sspace - The SemanticSpace from which similarity values are extracted.
      -


      Copyright © 2010-2011. All Rights Reserved.