public class WikipediaCleaner extends Object
| Modifier and Type | Class and Description |
|---|---|
| static class | WikipediaCleaner.CleanerOption |
| Constructor and Description |
|---|
| WikipediaCleaner(String outputFile, Set<WikipediaCleaner.CleanerOption> options, int minTokensPerArticle) Create a new WikipediaCleaner which will read articles from outputFile, with the given thresholds for link requirements. |
| Modifier and Type | Method and Description |
|---|---|
| static void | main(String[] args) |
| void | processDocument(edu.ucla.sspace.tools.WikipediaCleaner.WikiDoc doc) Process the content of the given WikiDoc. |
| void | removeExternalLinkMarkup(StringBuilder article) Replace [link] tags with link name and track what articles this article links to. |
| void | removeWikiLinkMarkup(StringBuilder article, String title) Replace [[link]] tags with link name and track what articles this article links to. |
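
The two link-markup methods above both rewrite the article text in place. The following is a minimal, self-contained sketch of that idea, not the WikipediaCleaner implementation itself. The class name `LinkMarkupSketch`, the regular expressions, the returned `Set` of link targets, and the choice to fall back to the URL for unlabeled external links are all assumptions made for illustration; the documented methods return `void`, and `removeWikiLinkMarkup` also takes a `title` argument that is omitted here.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of edu.ucla.sspace.tools.
class LinkMarkupSketch {

    // Matches [[Target]] or [[Target|label]]. Nested brackets are not handled.
    private static final Pattern WIKI_LINK =
        Pattern.compile("\\[\\[([^\\[\\]|]+)(?:\\|([^\\[\\]]*))?\\]\\]");

    // Matches [scheme://host/path label] or [scheme://host/path].
    private static final Pattern EXTERNAL_LINK =
        Pattern.compile("\\[(\\w+://[^\\s\\]]+)(?:\\s+([^\\]]*))?\\]");

    /** Replaces [[link]] markup in place and returns the linked article names. */
    static Set<String> removeWikiLinkMarkup(StringBuilder article) {
        return replaceLinks(article, WIKI_LINK);
    }

    /** Replaces [link] markup in place and returns the linked URLs. */
    static Set<String> removeExternalLinkMarkup(StringBuilder article) {
        return replaceLinks(article, EXTERNAL_LINK);
    }

    // Shared rewrite loop: group 1 is the link target, group 2 the optional label.
    private static Set<String> replaceLinks(StringBuilder article, Pattern pattern) {
        Set<String> targets = new LinkedHashSet<>();
        Matcher m = pattern.matcher(article);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = m.group(1).trim();
            String label = m.group(2);
            targets.add(target);
            // Assumption: use the label when present, otherwise the target itself.
            String text = (label == null || label.trim().isEmpty()) ? target : label.trim();
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        // Overwrite the caller's buffer so the change is visible in place.
        article.setLength(0);
        article.append(out);
        return targets;
    }

    public static void main(String[] args) {
        StringBuilder article = new StringBuilder(
            "See [[Java (programming language)|Java]] and [http://example.org Example site].");
        Set<String> wikiTargets = removeWikiLinkMarkup(article);
        Set<String> urls = removeExternalLinkMarkup(article);
        System.out.println(article);
        System.out.println(wikiTargets);
        System.out.println(urls);
    }
}
```

Passing a `StringBuilder` rather than a `String` lets each step modify the same buffer, which matches the `void` signatures listed in the table; the sketch returns the collected targets only to make the "track what articles this article links to" part observable.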
public WikipediaCleaner(String outputFile, Set<WikipediaCleaner.CleanerOption> options, int minTokensPerArticle)
Create a new WikipediaCleaner which will read articles from outputFile, with the given thresholds for link requirements.

public void processDocument(edu.ucla.sspace.tools.WikipediaCleaner.WikiDoc doc)
Process the content of the given WikiDoc.
doc - The WikiDoc to process.

public void removeWikiLinkMarkup(StringBuilder article, String title)
Replace [[link]] tags with link name and track what articles this article links to.
article - The article text to clean and process link structure of.

public void removeExternalLinkMarkup(StringBuilder article)
Replace [link] tags with link name and track what articles this article links to.
article - The article text to clean and process link structure of.

public static void main(String[] args)
Copyright © 2012. All Rights Reserved.