public class FixedDurationTemporalRandomIndexing extends OrderedTemporalRandomIndexing
A TemporalSemanticSpace class that optimizes a special case of TemporalRandomIndexing
where the documents are in sorted order and the
duration of a semantic partition is fixed. Specifically, the following three
properties are required: TimeSpan
.
In addition to the properties specified in OrderedTemporalRandomIndexing, this class defines the following configurable
properties:
"edu.ucla.sspace.tri.FixedDurationTemporalRandomIndexing.partitionDuration"
A TimeSpan configuration string that determines the duration of all
semantic partitions generated by this instance.
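As a minimal sketch of how this property might be supplied, the following builds a java.util.Properties object carrying the partition duration. The class name PartitionDurationConfig, the helper withDuration, and the value "1m" are illustrative assumptions; the accepted syntax for the configuration string is defined by TimeSpan and should be checked there.

```java
import java.util.Properties;

class PartitionDurationConfig {
    // Property name as documented for this class.
    static final String PARTITION_DURATION_PROPERTY =
        "edu.ucla.sspace.tri.FixedDurationTemporalRandomIndexing.partitionDuration";

    // Hypothetical helper: returns Properties with the partition duration set.
    static Properties withDuration(String timeSpanConfig) {
        Properties props = new Properties();
        props.setProperty(PARTITION_DURATION_PROPERTY, timeSpanConfig);
        return props;
    }

    public static void main(String[] args) {
        // "1m" is a placeholder value (assumption); consult the TimeSpan
        // documentation for the configuration string syntax.
        Properties props = withDuration("1m");
        System.out.println(props.getProperty(PARTITION_DURATION_PROPERTY));
        // With the library on the classpath, the instance would then be
        // created via the documented constructor:
        //   new FixedDurationTemporalRandomIndexing(props);
    }
}
```

Passing the properties explicitly, rather than relying on system properties, keeps the configuration local to the instance being constructed.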
This class does not support arbitrary multithreading of the processDocument
method. However, it
does support concurrent calls provided that all the documents being processed are
within the same semantic partition. That is, multiple threads may be used to process
all of a semantic partition's documents, provided that the documents from the next
partition are not interleaved. If the document ordering and time span are
known ahead of time, multithreading can be coordinated with a CyclicBarrier.
The following is an
example of how to correctly multithread this class.
// Initialize the following variables according to program semantics
final int numThreads;
final TimeSpan partitionDuration;
// NOTE: the iterator is shared by all threads, so it must be thread-safe
final Iterator<TemporalDocument> documents;
final FixedDurationTemporalRandomIndexing fdTRI;

// The starting time for the current semantic partition.  This value is used to
// determine if processing the next document would cause the current partition
// to be finished and a new partition created.
final AtomicLong startTimeOfCurrentPartition = new AtomicLong();

// Before a thread blocks waiting for partition processing, it adds the time
// stamp of its next document (which exceeds the duration) as a key in this
// map.  This allows the processing thread (see partitionHook below) to
// determine the start time of the next partition.
final ConcurrentNavigableMap<Long,Object> futureStartTimes =
    new ConcurrentSkipListMap<Long,Object>();

// Create a custom Runnable that will handle processing the semantic space
// after the partition has been finished.
Runnable partitionHook = new Runnable() {
    public void run() {
        // Process the semantic space as necessary here...

        // Once processing has finished, notify the threads of the next
        // time stamp that will be processed.  In the unlikely event that
        // the number of documents in a partition is less than the number
        // of threads, this ensures that a thread processing the partition
        // after the next correctly waits.
        Long ssStart = futureStartTimes.firstKey();
        futureStartTimes.clear(); // reset for next partition

        // last, update the start time of the current partition
        startTimeOfCurrentPartition.set(ssStart);
    }
};

// Create the barrier that the threads will use to synchronize their
// processDocument() calls.  Note that we use the partition hook here
// instead of attaching it via the addPartitionHook() method
final CyclicBarrier exceededTimeSpanBarrier =
    new CyclicBarrier(numThreads, partitionHook);

// A required flag for the initial case of setting the start time for the
// first partition
final AtomicBoolean startBarrier = new AtomicBoolean(false);

// A counter for which document is being processed
final AtomicInteger docCounter = new AtomicInteger(0);

// Start all the threads
for (int i = 0; i < numThreads; ++i) {
    Thread processingThread = new Thread() {
        public void run() {
            // repeatedly try to process any remaining documents
            while (documents.hasNext()) {
                TemporalDocument doc = documents.next();
                long docTime = doc.timeStamp();
                int docNumber = docCounter.incrementAndGet();

                // special case for first document
                if (docNumber == 1) {
                    startTimeOfCurrentPartition.set(docTime);
                    startBarrier.set(true);
                }

                // Spin until the thread with the first document sets the
                // initial starting document time.  Note that we spin here
                // instead of blocking, because another thread is expected
                // to set this immediately, so this will be a quick no-op
                while (startBarrier.get() == false)
                    ;

                // Check whether the time for this document would exceed the
                // maximum duration of the current partition.  Loop to ensure
                // that if this thread does loop and another thread has an
                // earlier time that exceeds the time period, then this
                // thread will block until the earlier partition has finished
                // processing
                while (!partitionDuration.insideRange(
                           startTimeOfCurrentPartition.get(), docTime)) {
                    try {
                        // Notify the barrier that this thread is now
                        // processing a document in the next time span.  In
                        // addition, record the time for this document so
                        // the partition hook can set the correct start
                        // time for the next partition
                        futureStartTimes.put(docTime, new Object());
                        exceededTimeSpanBarrier.await();
                    } catch (Exception ex) {
                        // Handle exception here
                    }
                }

                try {
                    fdTRI.processDocument(doc.reader());
                } catch (IOException ioe) {
                    throw new IOError(ioe); // rethrow
                }
            }
        }
    };
    processingThread.start();
}
// Wait for processing to finish...
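The barrier mechanism that the example above relies on can be seen in isolation in the following sketch. It is not part of this class; BarrierActionDemo and its run method are hypothetical names, and the barrier action simply counts how often it fires instead of processing a semantic space. It illustrates that a CyclicBarrier runs its action exactly once each time all threads have arrived, before any of them is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierActionDemo {
    // Runs numThreads workers through numPartitions rounds and returns how
    // many times the barrier action (a stand-in for a partition hook) ran.
    static int run(int numThreads, final int numPartitions) {
        final AtomicInteger hookRuns = new AtomicInteger();
        final CyclicBarrier barrier = new CyclicBarrier(numThreads, new Runnable() {
            public void run() {
                // Executed by the last thread to arrive, once per round.
                hookRuns.incrementAndGet();
            }
        });
        List<Thread> threads = new ArrayList<Thread>();
        for (int i = 0; i < numThreads; ++i) {
            Thread t = new Thread() {
                public void run() {
                    try {
                        for (int p = 0; p < numPartitions; ++p) {
                            // ... process this thread's share of round p ...
                            barrier.await(); // wait for the other workers
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            };
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        return hookRuns.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 3)); // prints 3: one hook run per round
    }
}
```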
Note that the requirements of an OrderedTemporalRandomIndexing class
stipulate that the documents be processed in order. For this class, the
documents must be in order according to their semantic partition. In addition,
the first document seen for a semantic partition should be the earliest for that
partition. This behavior is most easily accomplished by sorting the documents
according to time stamp prior to processing the documents.
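A minimal sketch of such a pre-sort is shown here. StampedDoc is a hypothetical stand-in for whatever document type the application uses (it is not a library class); only the time stamp matters for the ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortByTimeStamp {
    // Hypothetical document holder: a time stamp plus the document text.
    static final class StampedDoc {
        final long timeStamp;
        final String text;
        StampedDoc(long timeStamp, String text) {
            this.timeStamp = timeStamp;
            this.text = text;
        }
    }

    // Returns a copy of docs ordered by ascending time stamp, so that the
    // first document seen for any partition is the earliest one in it.
    static List<StampedDoc> sortedByTime(List<StampedDoc> docs) {
        List<StampedDoc> copy = new ArrayList<StampedDoc>(docs);
        Collections.sort(copy, new Comparator<StampedDoc>() {
            public int compare(StampedDoc a, StampedDoc b) {
                return Long.compare(a.timeStamp, b.timeStamp);
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        List<StampedDoc> docs = Arrays.asList(
            new StampedDoc(30L, "third"),
            new StampedDoc(10L, "first"),
            new StampedDoc(20L, "second"));
        for (StampedDoc d : sortedByTime(docs)) {
            System.out.println(d.timeStamp + " " + d.text);
        }
    }
}
```

Sorting the whole collection is stricter than this class requires, since only the order of partitions and the first document of each partition matter, but it is the simplest way to guarantee both.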
Modifier and Type | Field and Description |
---|---|
static TimeSpan | DEFAULT_SEMANTIC_PARTITION_DURATION: The default time span of one month, used if no time span is specified using the "edu.ucla.sspace.tri.FixedDurationTemporalRandomIndexing.partitionDuration" property. |
static String | SEMANTIC_PARTITION_DURATION_PROPERTY: The property to set the duration of a semantic partition using a TimeSpan configuration string. |
Inherited fields: currentSlice, DEFAULT_VECTOR_LENGTH, DEFAULT_WINDOW_SIZE, endTime, partitionHooks, PERMUTATION_FUNCTION_PROPERTY, startTime, USE_PERMUTATIONS_PROPERTY, USE_SPARSE_SEMANTICS_PROPERTY, VECTOR_LENGTH_PROPERTY, WINDOW_SIZE_PROPERTY
Constructor and Description |
---|
FixedDurationTemporalRandomIndexing(): Creates an instance of FixedDurationTemporalRandomIndexing using the system properties to configure the behavior. |
FixedDurationTemporalRandomIndexing(Properties props): Creates an instance of FixedDurationTemporalRandomIndexing using the provided properties to configure the behavior. |
Modifier and Type | Method and Description |
---|---|
String | getSpaceName(): Returns a unique string describing the name and configuration of this algorithm. |
protected boolean | shouldPartitionSpace(long timeStamp): Returns true if the time stamp for the next document would exceed the duration of the current semantic partition. |
Inherited methods: addPartitionHook, clear, endTime, getTimeSteps, getVector, getVectorAfter, getVectorBefore, getVectorBetween, getVectorLength, getWords, getWordToIndexVector, processDocument, processDocument, processSpace, setSemanticFilter, setWordToIndexVector, startTime
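The idea behind the shouldPartitionSpace check can be pictured with the following simplified sketch. It is an assumption-laden illustration, not the class's implementation: it treats the duration as a fixed number of milliseconds, whereas this class measures the duration with a TimeSpan, and the handling of a time stamp that falls exactly on the boundary is a guess. FixedDurationCheck and exceedsDuration are hypothetical names.

```java
class FixedDurationCheck {
    // Returns true if timeStamp lies more than durationMillis after the
    // start of the current partition (simplified, millisecond-based check).
    static boolean exceedsDuration(long partitionStart, long durationMillis,
                                   long timeStamp) {
        return timeStamp - partitionStart > durationMillis;
    }

    public static void main(String[] args) {
        long start = 0L;
        long oneDay = 24L * 60 * 60 * 1000;
        // Half a day after the start: still inside the partition.
        System.out.println(exceedsDuration(start, oneDay, oneDay / 2)); // false
        // Two days after the start: a new partition is needed.
        System.out.println(exceedsDuration(start, oneDay, 2 * oneDay)); // true
    }
}
```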
public static final TimeSpan DEFAULT_SEMANTIC_PARTITION_DURATION

public static final String SEMANTIC_PARTITION_DURATION_PROPERTY
The property to set the duration of a semantic partition using a TimeSpan configuration string.

public FixedDurationTemporalRandomIndexing()
Creates an instance of FixedDurationTemporalRandomIndexing using the system properties to configure the behavior.
Throws: IllegalStateException - if the "edu.ucla.sspace.tri.FixedDurationTemporalRandomIndexing.partitionDuration" property is not set

public FixedDurationTemporalRandomIndexing(Properties props)
Creates an instance of FixedDurationTemporalRandomIndexing using the provided properties to configure the behavior.
Parameters: props - the properties used to configure this instance
Throws: IllegalStateException - if the "edu.ucla.sspace.tri.FixedDurationTemporalRandomIndexing.partitionDuration" property is not set

public String getSpaceName()
getSpaceName in interface SemanticSpace
getSpaceName in class OrderedTemporalRandomIndexing

protected boolean shouldPartitionSpace(long timeStamp)
Returns true if the time stamp for the next document would exceed the duration of the current semantic partition.
shouldPartitionSpace in class OrderedTemporalRandomIndexing
Parameters: timeStamp - the time stamp of the next document that has yet to be processed
Returns: true if the time stamp for the next document would exceed the duration of the current semantic partition

Copyright © 2012. All Rights Reserved.