Martin Heller
Contributing Writer

Battling Spam

analysis
Sep 9, 2007
4 mins

I get a lot of email that matters to me: often a hundred messages a day. I get even more spam sent to me: thousands of messages a day. I manage this successfully with multiple filters, multiple servers, and multiple email addresses. You might not be able to apply all of my techniques, but some of them might help you.

Bayesian Filters at the Email Client

I used to funnel all my email to a single POP3 client, and filter that on my computer. I quickly gave up on the spam filters built into Outlook and the other email clients I tried, but I found a free Bayesian filter that worked well for me: K9. Once I trained K9, it reached better than 99% accuracy, and I only needed to correct the classification of emails a few times a week.
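To illustrate what "training" a Bayesian filter means, here is a minimal sketch of a naive Bayes word-frequency classifier. It is not K9's implementation, which is not described here; the class name `NaiveBayesFilter`, the tokenizer, the smoothing scheme, and the sample messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Toy two-class (spam/ham) word-frequency classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercased word tokens only; real filters also inspect headers, URLs, etc.
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, text, label):
        """Record one message the user has classified as 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return P(spam | words) under the naive independence assumption."""
        total_messages = sum(self.message_counts.values())
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: share of training messages carrying this label.
            log_score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Laplace smoothing so an unseen word does not zero out the product.
                count = self.word_counts[label][word] + 1
                log_score += math.log(count / (total_words + len(vocabulary) + 1))
            log_scores[label] = log_score
        # Normalize the two log scores into a probability.
        peak = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - peak)
        ham = math.exp(log_scores["ham"] - peak)
        return spam / (spam + ham)


# Hypothetical training data standing in for hand-classified mail.
spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now limited offer", "spam")
spam_filter.train("win money now click here", "spam")
spam_filter.train("meeting notes for the review on monday", "ham")
spam_filter.train("please send the draft of the column", "ham")

print(spam_filter.spam_probability("buy cheap pills now") > 0.5)      # True
print(spam_filter.spam_probability("draft notes for monday") > 0.5)   # False
```

Each correction of a misclassified message amounts to another `train` call, which is why accuracy tends to improve as the filter sees more of one person's mail.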

Unfortunately, the sheer volume of spam kept increasing, and it took longer and longer to download the first batch of emails in the morning, especially on Monday. That led me to do more filtering at the server.

Server Filters

My email servers each have their own filters. At mheller.com, my ISP (Verio) runs SpamAssassin and ClamAV. I have SpamAssassin set to high sensitivity, and only rarely find false positives in the Junk and Quarantine folders on the server. At this point, I wish that I had the option to automatically delete the filtered emails; as it is, I have to log on to the server periodically and clear out the folders.

I do find that a small percentage of the spam makes it into my inbox folder. During the week, this is handled easily by my client filters. Over the weekend, however, it builds up, so I usually log on to the server on Sunday night and clear out the obvious spam from my inbox. It’s often a matter of flagging everything as spam, and then unchecking the dozen messages that look legitimate.

At pcpitstop.com, I have a rule set up on the server to delete the email that is flagged by IMail’s spam filters. I still have to clean out the obvious spam from my inbox on Sunday night to keep the Monday morning download manageable.
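A delete-on-flag rule of this kind can be sketched roughly as follows. How IMail marks the messages it flags is not described here, so this sketch assumes the upstream filter adds an `X-Spam-Flag: YES` header, as SpamAssassin does by default; the function names and sample messages are hypothetical.

```python
from email import message_from_string


def is_flagged_spam(raw_message):
    """True when an upstream filter marked the message with X-Spam-Flag: YES."""
    message = message_from_string(raw_message)
    # Assumption: a SpamAssassin-style header; other filters may mark mail differently.
    return message.get("X-Spam-Flag", "").strip().upper() == "YES"


def apply_delete_rule(raw_messages):
    """Split a mailbox into (kept, deleted) according to the spam flag."""
    kept, deleted = [], []
    for raw in raw_messages:
        (deleted if is_flagged_spam(raw) else kept).append(raw)
    return kept, deleted


# Hypothetical two-message mailbox.
mailbox = [
    "From: pr@example.com\nSubject: Product briefing\n\nCan we schedule a call?\n",
    "From: offers@example.net\nSubject: You won\nX-Spam-Flag: YES\n\nClick here.\n",
]
kept, deleted = apply_delete_rule(mailbox)
print(len(kept), len(deleted))
```

A rule like this only removes what the filter has already caught; unflagged spam still lands in the inbox, which is why the manual Sunday night cleanup remains necessary.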

At infoworld.com, we use a subscription email filter from Postini. I get a daily summary of the filtered email, and every few days I log in and delete the questionable emails that have been retained. Once in a great while I find a legitimate email from a PR agency caught with the spam, and I usually release it and white-list the PR agency.

Multiple Email Addresses

Every Web site that asks me for a registration wants an email address. I have learned through bitter experience that many of them will send me unwanted information, and that some of them will even sell my email address. (I wanted to say “sell me down the river,” but it isn’t quite that bad.)

Since I control my domain, I give each of these sites a unique email address. As long as they use it responsibly, their mail will go through my normal filtering. If I notice a high percentage of spam to an address I’ve assigned to a site, I forward that address to /dev/null and never let them darken my doorstep again. One of the sites that has required this treatment is Oracle: the size of a company is no guarantee that it will honor your privacy rights.
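The per-site address scheme can be sketched as a small bookkeeping class. Everything specific in it is an assumption for illustration: the `AliasRegistry` name, the alias naming pattern, the placeholder domain `example.com`, and the numeric thresholds standing in for "a high percentage of spam."

```python
from collections import defaultdict

DOMAIN = "example.com"  # placeholder; substitute a domain you control


class AliasRegistry:
    """Tracks one alias per site and retires aliases that mostly receive spam."""

    def __init__(self, spam_threshold=0.5, min_messages=20):
        self.spam_threshold = spam_threshold  # hypothetical cutoff ratio
        self.min_messages = min_messages      # avoid judging on a tiny sample
        self.site_for_alias = {}
        self.totals = defaultdict(int)
        self.spam = defaultdict(int)
        self.blocked = set()

    def alias_for(self, site):
        """Return the unique address to hand to a given site."""
        alias = f"{site.lower().replace('.', '-')}@{DOMAIN}"
        self.site_for_alias[alias] = site
        return alias

    def record(self, alias, is_spam):
        """Count a delivered message and block the alias if spam dominates."""
        self.totals[alias] += 1
        if is_spam:
            self.spam[alias] += 1
        enough = self.totals[alias] >= self.min_messages
        ratio = self.spam[alias] / self.totals[alias]
        if enough and ratio > self.spam_threshold:
            self.blocked.add(alias)

    def route(self, alias):
        """Decide where mail for an alias goes."""
        return "discard" if alias in self.blocked else "inbox"


registry = AliasRegistry(spam_threshold=0.5, min_messages=4)
good = registry.alias_for("news.example.org")
bad = registry.alias_for("shop.example.net")
for _ in range(4):
    registry.record(good, is_spam=False)
    registry.record(bad, is_spam=True)
print(good, registry.route(good))
print(bad, registry.route(bad))
```

Because each alias maps back to exactly one site, the `site_for_alias` table also shows which site leaked or sold the address.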

GMail, Yahoo!, and Hotmail

A final technique is to take advantage of free email accounts. I have free accounts at GMail, Yahoo!, and Hotmail, and I use them all. I rely on their spam filters to handle email addresses that have been public long enough to attract large amounts of spam but that still get legitimate mail.

For example, the email address I used for my Byte column mail now redirects to GMail, where it is both flagged as Byte and filtered. Every day or two I get a legitimate email from a PR agency that wants coverage for a client but still has my Byte address.

Every minute or two, and sometimes even more often than that, a spam message arrives at my GMail account. Once in a while a string of spam messages gets through GMail’s filters, and I have to flag them by hand, but overall GMail does a good job of filtering them automatically.

The Sad Statistics

How bad is it? The Spam folder of my GMail account, which automatically deletes messages after 30 days, currently holds 46,000 messages. Add in the spam filtered at my other servers and at my POP3 client, and I’m filtering out over 100,000 spam messages a month.

Ouch.

