This is the procedure we use to generate new scores. It takes quite a while and is labour-intensive, so we do it infrequently. We generate new scores by analyzing a massive collection of mail (a "corpus") and running software that searches for the best possible set of scores, so that the maximum possible number of mails in that corpus are correctly classified (i.e. so that SA thinks the ham messages are nearly all ham, and the spam messages are nearly all spam).

The corpus consists of many (approximately 1 million) pieces of real-world, hand-sorted mail. A smallish number of people (about 15), including the developers themselves, work as volunteer "corpus submitters". They hand-classify their mail and then run mass-check over it. They submit the output logs mass-check generates. Occasionally people review the submitted logs for obvious mistakes, but it is largely a trust system.
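To make the idea of "finding the scores that classify the most mail correctly" concrete, here is a minimal toy sketch. It is not the software actually used for score generation: the rule names, the tiny corpus, the 5.0 threshold, and the simple hill-climbing search are all assumptions chosen for illustration, and the corpus layout below is not the real mass-check log format.

```python
import random

# Assumption: a message whose rule scores sum to this value or more is called spam.
THRESHOLD = 5.0

def classify(hits, scores):
    """Return True (spam) if the scores of the rules that hit reach the threshold."""
    return sum(scores[rule] for rule in hits) >= THRESHOLD

def accuracy(corpus, scores):
    """Fraction of hand-classified messages that the score-set classifies correctly."""
    correct = sum(1 for hits, is_spam in corpus if classify(hits, scores) == is_spam)
    return correct / len(corpus)

def fit_scores(corpus, rules, rounds=2000, step=0.5, seed=0):
    """Hill-climb: nudge one rule's score at a time, keep the change if accuracy does not drop."""
    rng = random.Random(seed)
    scores = {rule: 1.0 for rule in rules}
    best = accuracy(corpus, scores)
    for _ in range(rounds):
        rule = rng.choice(rules)
        old = scores[rule]
        scores[rule] = old + rng.choice((-step, step))
        trial = accuracy(corpus, scores)
        if trial >= best:
            best = trial          # keep the new score
        else:
            scores[rule] = old    # revert: the change made things worse
    return scores, best

# Hypothetical rule names and a hypothetical hand-sorted corpus:
# each entry is (set of rules that hit, True if a human marked it spam).
RULES = ["RULE_A", "RULE_B", "RULE_C"]
CORPUS = [
    ({"RULE_A", "RULE_B"}, True),
    ({"RULE_A"}, True),
    ({"RULE_A", "RULE_C"}, True),
    ({"RULE_B", "RULE_C"}, False),
    ({"RULE_C"}, False),
    (set(), False),
]

if __name__ == "__main__":
    scores, acc = fit_scores(CORPUS, RULES)
    print("scores:", scores)
    print("accuracy on corpus:", acc)
```

The point of the sketch is the shape of the problem: the hand classification is the ground truth, the per-rule hits are the input, and the scores are the only thing being tuned. A real run over roughly a million messages needs a far more efficient search than this.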