Odds are that I could do some sort of filtering based on the source IPs (I do capture the domain name or IP address when possible), but it wouldn't be as accurate as doing it by hand. And I suspect that it's going to be faster to edit the file by hand than to build an automated solution -- the spam and the real entries each have a remarkably distinctive look and feel, so it is proving a surprisingly quick job to fit in between edits...