Jan 14 2009

Easy match-merge

Published by at 6:00 pm under Books,resources,Technology

My work as a tech steward and community leader involves dealing with a lot of little sets of data that comes from different sources.  As our communities live on more and more different platforms, for example, it becomes a messier and more complicated job to keep track of who’s on which platform, and we often need to put it together to get an overview.  In a formal environment all of the complexity would be handled by SQL queries or match-merge operations with tools like SAS (which I grew up on).  In an informal environment, we end up using use tools like spread-sheets (like Excel or Google’s) that allow us to do most of the work until we need to do a match merge.  That means we need to combine data from two sources, matching (joining records) where possible and interleaving where a match doesn’t occur.  Very basic, very boring and error-prone to do by hand.

Until now, using an idea from Phillipp Lenssen, Google Apps Hacks (Sebastopol, CA: OReilly, 2008) http://isbn.nu/9780596515881.  Here’s how you do it, following the idea on page 202.

Open a new google doc:

blankdoc

Insert the unique data (e.g., “the key”) from the one source (preferably sorted):

doc-one

The typical case is a list of email addresses.  Note that you’d only put the email addresses themselves, not all the other information that you have associated with the email address.

Save it.  Then overwrite it by “selecting all” and inserting the corresponding data from the other source (also sorted) and then save again.

doc-two

Now, under the “Tools” drop down menu, select “Revision history”,

doc-compare

Check the boxes and press “Compare versions”.  You get this very nice little listing:

results

The lines that are in one source but not the other are colored and you can easily tell which source they come from.  The lines that match (are in both sources) are black. Now you can go back to your Excel spread-sheet or wherever and do the rest of the process by hand.  It’s much easier to do because you have an easy-to-use listing showing where matches (and mis-matches) occur.

No responses yet

Leave a Reply