February 02, 2005

two of each

[He] held one finger up directly in front of Yossarian and demanded, "How many fingers do you see?"
"Two", said Yossarian.
"How many fingers do you see now?" asked the doctor, holding up two.
"Two", said Yossarian.
"And how many now?" asked the doctor, holding up none.
"Two", said Yossarian.
The doctor's face wreathed with a smile. "By Jove, he's right," he declared jubilantly. "He does see everything twice."

In Catch-22, by Joseph Heller.



For some mysterious reason, Thunderbird occasionally downloads my emails twice from the POP server. I think it may be related to the fact that I use two email clients (home and office) to suck emails from the server. But I also remember seeing this happen in Outlook in a previous life. So maybe it's the server's fault rather than Thunderbird's.

Aaanyway, I decided to create an extension for Thunderbird that would delete duplicate emails automatically. Something that could be triggered from the context menu of a folder or account (and why not, as soon as an email arrives).

So the approach goes like this. The program iterates over every email in a folder or account and computes a hash for each email, using some function such as MD5. It then stores the hash in a map, associating it with the email (or rather, with a reference to the email, since we don't want to store the contents of every email in memory). But if the map already contains an entry for that hash, it means we're probably facing a duplicate email. So the program compares the contents of both emails to rule out any chance of them having the same hash by coincidence (very unlikely, but possible).
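
Here is a rough sketch of that approach. It is not the extension's code: it assumes it runs under Node.js (hence the crypto module) rather than inside Thunderbird, and the message shape { id, body } and the names md5 and findDuplicates are made up for illustration.

```javascript
// Sketch only: runs under Node.js, not inside Thunderbird.
const crypto = require('crypto');

// MD5 hex digest of a string (Node's crypto module is an assumption here).
function md5(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// messages: array of { id, body } objects (hypothetical shape).
// In a real extension the map would hold references to messages and the
// bodies would be fetched again for the final comparison.
// Returns an array of [duplicateId, originalId] pairs.
function findDuplicates(messages) {
  const seen = new Map(); // hash -> messages already seen with that hash
  const duplicates = [];
  for (const msg of messages) {
    const hash = md5(msg.body);
    const candidates = seen.get(hash) || [];
    // Compare the contents too, to rule out a hash collision.
    const original = candidates.find((c) => c.body === msg.body);
    if (original) {
      duplicates.push([msg.id, original.id]);
    } else {
      candidates.push(msg);
      seen.set(hash, candidates);
    }
  }
  return duplicates;
}
```

Each hash maps to a list of messages, so two different emails that happen to share a hash are both kept and neither is reported as a duplicate.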

So far so good. After a few cycles of writing and testing the JavaScript code (writing Mozilla extensions is messy, I tell you) I got to a point where it could detect duplicate emails (no deletion yet).

To my surprise, when I ran it the browser locked up for a few seconds. I eventually discovered what the problem was and I'll get to it after I describe how the program is structured.

There's a main function, deleteDuplicateMessages(), that is invoked when the menu item is clicked. This function iterates over the set of message headers (Thunderbird objects that contain metadata about each email) and invokes an asynchronous method to stream the contents of the email. That method takes a callback object as an argument. I believe they had to implement it like that to support IMAP, where getting the contents of an email could take a while.

The callback object gathers all the pieces of the email as it is streamed, strips out the headers and computes the MD5 hash. It then checks whether an entry for that hash already exists in the hash map and exits.
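
To give an idea of the shape of such a callback object, here is a minimal sketch. The method names onData and onDone and the factory makeBodyCollector are hypothetical stand-ins, not Thunderbird's actual listener interface, and the header stripping simply assumes that the headers end at the first blank line.

```javascript
// Hypothetical listener: the method names are illustrative only and do
// not match Thunderbird's real streaming interface.
function makeBodyCollector(onBody) {
  const chunks = [];
  return {
    // Called once for every piece of the message as it is streamed.
    onData(chunk) {
      chunks.push(chunk);
    },
    // Called when the whole message has arrived.
    onDone() {
      const raw = chunks.join('');
      // Assumption: the headers end at the first blank line.
      const match = /\r?\n\r?\n/.exec(raw);
      const body = match ? raw.slice(match.index + match[0].length) : '';
      onBody(body); // the caller would hash the body and check the map here
    },
  };
}
```

The hashing and the map lookup are left to the onBody function so that the collector only deals with reassembling the message.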

The reason for the lockup is two-fold. First, there's the issue with the MD5 library I'm using, which takes a few seconds to compute the hash for really long strings.

The second issue seems to be that JavaScript code in Mozilla executes in the UI thread, regardless of whether it's a callback function being invoked by a native component. So what's happening here is that the computation of the MD5 hash executes in the UI thread and prevents Thunderbird from responding to user events.

I've been trying to find how to make that callback run in a separate thread, without much luck.
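
For what it's worth, a common workaround in single-threaded JavaScript is to split the work into small slices and hand control back to the event loop between them. The sketch below is not something I have tried in the extension: processInSlices and its parameters are hypothetical, and it assumes a setTimeout-style timer is available. It would also not help with the first issue, since one slow MD5 call on a very long string still blocks for as long as it runs.

```javascript
// Process items a few at a time, yielding between slices.
// schedule defaults to a zero-delay setTimeout and can be replaced
// (for example with a synchronous function) when testing.
function processInSlices(items, handle, sliceSize, done,
                         schedule = (fn) => setTimeout(fn, 0)) {
  let index = 0;
  function runSlice() {
    const end = Math.min(index + sliceSize, items.length);
    for (; index < end; index++) {
      handle(items[index]);
    }
    if (index < items.length) {
      schedule(runSlice); // give pending UI events a chance to run first
    } else {
      done();
    }
  }
  runSlice();
}
```

A smaller sliceSize keeps the interface more responsive at the cost of a longer total run time.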

Another alternative would be to write the callback object in C++ as an XPCOM component, but I wanted to make the extension as portable as possible. Creating a native XPCOM component would mean that I'd have to compile it for every platform I wish my extension to support. Too bad.

The quest goes on.

Comments:

Anonymous said...

Or just keep a hash of the Message-ID field, which is unique for each message. In your case, especially, where you are downloading the same exact message multiple times, this should be enough.

It should also work even in cases where you're on the recipient list multiple times, like say a reply to a mailing list message, where you were cc'd personally on the reply. The advantage of using the Message-ID field is that it will be the same for those two messages, even though headers ("Received", etc.) may be different (and thus your MD5s would be different).

Should also be a lot faster!

February 03, 2005 6:56 AM  
Anonymous said...

I tried this on Netscape 7.2 and it works good.

sources of duplicate emails:
1) I have two email addresses: my company email address and a contractor email address for a different company. That company has software that is used for bug/question submission, and unfortunately I get double emails every time, because when it collects info for the creation of the form it also collects all email addresses and then sends that form to all of them.

2) I belong to several user group email aliases. When a person does a reply-all to me, I often get an additional copy, since I am already part of the email alias.


I hate outlook

I could not find a program that could delete duplicate email messages in Netscape/Mozilla..until someone gave me your link

I was able to delete over 25,000 duplicate emails. The largest folder had 32000+ emails and it found just under 16,000 duplicate emails.

thank you!

May 31, 2005 9:58 PM  
