public class WikipediaCleaner extends Object
Modifier and Type | Class and Description |
---|---|
static class | WikipediaCleaner.CleanerOption |
Constructor and Description |
---|
WikipediaCleaner(String outputFile, Set<WikipediaCleaner.CleanerOption> options, int minTokensPerArticle) Create a new WikipediaCleaner that writes cleaned articles to outputFile, using the given cleaning options and minimum number of tokens per article. |
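The minTokensPerArticle parameter suggests that articles shorter than a given token count are excluded from the output. The following is a minimal sketch of such a check, assuming simple whitespace tokenization. TokenThresholdSketch, countTokens and meetsThreshold are hypothetical names for illustration only; they are not part of WikipediaCleaner, and the real class may tokenize differently.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a minimum-token filter of the kind the
 * minTokensPerArticle constructor parameter suggests. Whitespace
 * tokenization is an assumption, not the WikipediaCleaner tokenizer.
 */
public class TokenThresholdSketch {

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Counts whitespace-separated tokens; blank text has zero tokens. */
    static int countTokens(CharSequence text) {
        String trimmed = text.toString().trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return WHITESPACE.split(trimmed).length;
    }

    /** Returns true if the article has at least minTokensPerArticle tokens. */
    static boolean meetsThreshold(CharSequence article, int minTokensPerArticle) {
        return countTokens(article) >= minTokensPerArticle;
    }

    public static void main(String[] args) {
        String stub = "Paris is the capital of France.";
        System.out.println(countTokens(stub));        // 6
        System.out.println(meetsThreshold(stub, 10)); // false
    }
}
```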
Modifier and Type | Method and Description |
---|---|
static void | main(String[] args) |
void | processDocument(edu.ucla.sspace.tools.WikipediaCleaner.WikiDoc doc) Process the content of the given WikiDoc. |
void | removeExternalLinkMarkup(StringBuilder article) Replace [link] tags with the link name and track which articles this article links to. |
void | removeWikiLinkMarkup(StringBuilder article, String title) Replace [[link]] tags with the link name and track which articles this article links to. |
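The two markup-removal methods above can be illustrated with a self-contained regex sketch. It assumes the standard MediaWiki forms [[Target]], [[Target|label]], [http://url label] and [http://url], ignores nested links and templates, and records URLs (rather than article titles) for external links. LinkMarkupSketch, removeWikiLinks and removeExternalLinks are hypothetical names; this is not the WikipediaCleaner implementation, whose exact rules are not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: strips MediaWiki link markup in place and records
 * link targets. Not the WikipediaCleaner implementation.
 */
public class LinkMarkupSketch {

    // Matches [[Target]] or [[Target|label]]; nested brackets are not handled.
    private static final Pattern WIKI_LINK =
        Pattern.compile("\\[\\[([^\\[\\]|]+)(?:\\|([^\\[\\]]*))?\\]\\]");

    // Matches [http(s)://url] or [http(s)://url label].
    private static final Pattern EXTERNAL_LINK =
        Pattern.compile("\\[(https?://[^\\s\\]]+)(?:\\s+([^\\]]*))?\\]");

    /** Replaces [[link]] markup with its visible text; returns link targets. */
    static List<String> removeWikiLinks(StringBuilder article) {
        List<String> targets = new ArrayList<>();
        Matcher m = WIKI_LINK.matcher(article);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String target = m.group(1).trim();
            // Use the piped label if present, otherwise the target itself.
            String label = m.group(2) != null ? m.group(2) : target;
            targets.add(target);
            m.appendReplacement(out, Matcher.quoteReplacement(label));
        }
        m.appendTail(out);
        // Overwrite the caller's buffer with the cleaned text.
        article.setLength(0);
        article.append(out);
        return targets;
    }

    /** Replaces [url label] markup with the label (or nothing); returns URLs. */
    static List<String> removeExternalLinks(StringBuilder article) {
        List<String> urls = new ArrayList<>();
        Matcher m = EXTERNAL_LINK.matcher(article);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            urls.add(m.group(1));
            String label = m.group(2) != null ? m.group(2) : "";
            m.appendReplacement(out, Matcher.quoteReplacement(label));
        }
        m.appendTail(out);
        article.setLength(0);
        article.append(out);
        return urls;
    }

    public static void main(String[] args) {
        StringBuilder article = new StringBuilder(
            "See [[Paris|the capital]] and [[Lyon]], or [https://example.org the site].");
        List<String> wikiTargets = removeWikiLinks(article);
        List<String> externalUrls = removeExternalLinks(article);
        System.out.println(article);
        System.out.println(wikiTargets);
        System.out.println(externalUrls);
    }
}
```

Running main prints "See the capital and Lyon, or the site." followed by [Paris, Lyon] and [https://example.org]. Both methods take a StringBuilder and rewrite it in place, mirroring the parameter type shown in the method summary.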
public WikipediaCleaner(String outputFile, Set<WikipediaCleaner.CleanerOption> options, int minTokensPerArticle)
Create a new WikipediaCleaner that writes cleaned articles to outputFile, using the given cleaning options and minimum number of tokens per article.

public void processDocument(edu.ucla.sspace.tools.WikipediaCleaner.WikiDoc doc)
Process the content of the given WikiDoc.
Parameters:
doc - The WikiDoc to process.

public void removeWikiLinkMarkup(StringBuilder article, String title)
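Replace [[link]] tags with the link name and track which articles this article links to.
Parameters:
article - The article text whose markup is cleaned and whose link structure is processed.

public void removeExternalLinkMarkup(StringBuilder article)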
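Replace [link] tags with the link name and track which articles this article links to.
Parameters:
article - The article text whose markup is cleaned and whose link structure is processed.

public static void main(String[] args)
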
Copyright © 2012. All Rights Reserved.