Complement set email filtering

Complement Set Filtering (CSF) is a method for filtering unsolicited bulk email (UBE or spam). The technique uses at least two email accounts: a primary account, which receives both spam and non-spam, and one or more secondary accounts, which receive only spam. CSF identifies the email messages contained in both the primary and secondary sets (email accounts) and removes them from the primary set, leaving the set-theoretic difference between the two.
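In set terms, the filtered inbox is the primary set minus the messages it shares with the secondary set. The following is a minimal Python sketch of that idea, not a reference implementation. It assumes each message can be reduced to a fingerprint by hashing its lowercased, whitespace-normalized body; the `fingerprint` helper and the sample messages are illustrative assumptions.

```python
import hashlib


def fingerprint(body: str) -> str:
    """Hash a body after lowercasing and collapsing whitespace (assumed normalization)."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical sample data: the primary account receives spam and non-spam,
# the secondary (UBE) account receives only spam.
primary_bodies = [
    "Meeting moved to 3pm",
    "Cheap pills, buy now!",
    "Quarterly report attached",
]
secondary_bodies = [
    "Cheap  pills, BUY now!",
    "You won a prize",
]

primary = {fingerprint(b): b for b in primary_bodies}
secondary = {fingerprint(b) for b in secondary_bodies}

# Intersection: messages present in both accounts, treated as spam.
spam_ids = primary.keys() & secondary
# Set-theoretic difference: what remains in the primary account.
kept_ids = primary.keys() - secondary

spam = sorted(primary[i] for i in spam_ids)
kept = sorted(primary[i] for i in kept_ids)
```

Exact hashing only catches copies that are identical after normalization. Matching messages that are merely similar needs a looser comparison.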

Implementation

CSF is implemented by comparing the message content in a UBE account (a separate mailbox or alias) with the message content in a primary account. By definition, messages contained in the UBE account are spam, so messages in the primary account that are substantially similar to messages in the UBE account are also spam. When the same message is found in both the primary account and the UBE account, it is deleted from the primary account.
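The comparison step might be sketched as follows. The article does not say how "substantially similar" is measured, so this example assumes a character-level similarity ratio from Python's standard `difflib` module and an arbitrary threshold of 0.9. The function names and sample messages are hypothetical.

```python
from difflib import SequenceMatcher


def is_similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """Return True when two bodies are substantially similar (threshold is an assumed value)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def filter_primary(primary: list[str], ube: list[str], threshold: float = 0.9) -> list[str]:
    """Keep only the primary messages that resemble no message in the UBE account."""
    return [
        msg for msg in primary
        if not any(is_similar(msg, spam, threshold) for spam in ube)
    ]


# Hypothetical mailboxes: the spam differs only in the recipient's name.
ube_account = ["Dear john, refinance your mortgage today at low rates!"]
primary_account = [
    "Dear johnm, refinance your mortgage today at low rates!",
    "Lunch on Friday?",
]

cleaned = filter_primary(primary_account, ube_account)
```

Here `cleaned` contains only the lunch message. Comparing every primary message against every UBE message costs time proportional to the product of the two mailbox sizes, so a large deployment would likely need an index or fingerprinting scheme instead.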

The UBE account is established by creating a mailbox (or alias) that combines a common first name (to help spammers guess the address) with the domain of the primary account, then exposing the UBE account to the Internet. For example, if the primary mailbox is johnm@domain.com, the UBE account might be john@domain.com (see diagram below). After the UBE mailbox is set up, its email address is given to spammers by posting it to message boards, portal groups, WHOIS listings, e-commerce sites and Usenet.

[Diagram: Complement Set Email Filtering]

CSF works especially well in corporate environments, where the domain is targeted by spammers and UBE tends to be very similar from mailbox to mailbox. Also, because CSF does not depend on the characteristics of past UBE to identify current UBE, it is particularly well suited to identifying UBE with new subject matter.

Advantages of CSF

Many spam-filtering techniques search for patterns and known spam subject matter in the headers and bodies of messages. Others use probabilities (Bayesian statistical methods, for example) to identify unwanted messages. CSF is effective as a standalone filter and can also be combined with other techniques.

CSF has at least three advantages over Bayesian and pattern analysis algorithms. First, CSF does not depend on content analysis other than what is required to find similarities between messages in the primary and UBE accounts. Second, CSF does not use scoring (word ranking), which can be circumvented with message obfuscation ("V!agra" instead of "Viagra", for example). Third, CSF takes advantage of the fact that most UBE contains identical message content, particularly messages targeted at specific corporate domains.
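The second advantage can be illustrated with a small Python sketch that contrasts a keyword blocklist with a CSF-style comparison. The blocklist, the sample message, and both helper functions are assumptions made for illustration; real content filters are considerably more sophisticated than this one.

```python
BLOCKLIST = {"viagra"}  # hypothetical keyword list for a content-based filter


def keyword_filter_flags(body: str) -> bool:
    """Flag a message if any blocklisted word appears as a token."""
    tokens = {word.strip(".,!?:;").lower() for word in body.split()}
    return not BLOCKLIST.isdisjoint(tokens)


def csf_flags(body: str, ube_bodies: set[str]) -> bool:
    """Flag a message if the same body (ignoring case) is in the UBE account."""
    return body.lower() in ube_bodies


obfuscated = "Buy V!agra online today"
# Assumption: the same mailing also reached the UBE account.
ube_bodies = {obfuscated.lower()}

keyword_hit = keyword_filter_flags(obfuscated)  # False: "v!agra" is not "viagra"
csf_hit = csf_flags(obfuscated, ube_bodies)     # True: the copy exists in the UBE account
```

The obfuscated spelling slips past the keyword check, but it makes no difference to the CSF comparison, which depends only on the same message appearing in both accounts.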


Wikimedia Foundation. 2010.
