An informal tool that cleans the 20 Newsgroups corpus. This cleaner removes
all of the metadata for each newsgroup posting. Email characters, such as "<",
are separated from normal
text. This cleaner is expected to be run with the "20_newsgroups" directory
that is provided in the standard tarball distributed by UCI. Output is
written to a specified file in which each line contains the entire contents
of a single posting.
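The following Python code roughly illustrates the steps described above. It removes each posting's header block, pads "<" and ">" with spaces, and writes one posting per line. It is a sketch and not the tool's actual implementation. It makes several assumptions: headers end at the first blank line, ">" is handled the same way as "<" (the description only names "<"), and the corpus is laid out as 20_newsgroups/<group>/<message>. The names clean_posting and clean_corpus are hypothetical.

```python
import os
import re
import sys

# Assumption: the set of "email characters" to separate. The description
# above only names "<"; ">" is added here as a guess.
EMAIL_CHARS = "<>"
_EMAIL_RE = re.compile("([" + re.escape(EMAIL_CHARS) + "])")


def clean_posting(raw: str) -> str:
    """Drop the header block and return the body collapsed onto one line."""
    # Assumption: headers end at the first blank line (RFC 822/5322 style).
    parts = re.split(r"\r?\n\s*\r?\n", raw, maxsplit=1)
    body = parts[1] if len(parts) == 2 else ""
    # Surround each email character with spaces so it is not glued to a word.
    body = _EMAIL_RE.sub(r" \1 ", body)
    # Collapse all whitespace, including newlines, into single spaces.
    return " ".join(body.split())


def clean_corpus(root: str, out_path: str) -> int:
    """Write one cleaned posting per line to out_path; return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit newsgroup directories in a stable order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                # Latin-1 decodes any byte sequence, so stray bytes never raise.
                with open(path, encoding="latin-1") as f:
                    out.write(clean_posting(f.read()) + "\n")
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical usage: python clean_20ng.py 20_newsgroups cleaned.txt
    n = clean_corpus(sys.argv[1], sys.argv[2])
    print(f"wrote {n} postings", file=sys.stderr)
```

Files are read as Latin-1 on purpose. Some postings in the corpus are not valid UTF-8, and Latin-1 can decode every byte. Output is still written as UTF-8.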