Aug. 30th, 2012

jducoeur: (Default)
So the Great OP Project continues to trundle along, and is going fairly well: I'm now parsing about ten years' worth of court reports, and all of "A" from the alphabetical list. I've added concepts of where awards come from, and lots more.

Today's project was The Name Synonym system. I've gotten tired of staring at all the duplications and mis-aligned entries in the data due to misspellings, typos and whatnot -- a fair number of cross-references are given in the original files, but a *vast* number aren't.
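For the curious, the basic idea can be sketched in a few lines. This is only an illustration, not the project's actual code: the function names, the shape of the synonym data, and the sample names and awards are all made up for the example, and it assumes that each group of spellings has one canonical form chosen by hand.

```python
from collections import defaultdict


def normalize(name):
    """Fold case and collapse runs of whitespace so trivial variants compare equal."""
    return " ".join(name.casefold().split())


def build_synonym_index(groups):
    """Map every known spelling to its canonical name.

    `groups` is a hypothetical format: {canonical_name: [variant spellings]}.
    """
    index = {}
    for canonical, variants in groups.items():
        for spelling in (canonical, *variants):
            index[normalize(spelling)] = canonical
    return index


def canonical_name(index, name):
    """Return the canonical spelling, or the name unchanged if it is unknown."""
    return index.get(normalize(name), name)


def merge_awards(index, records):
    """Group (name, award) pairs under the canonical name of each recipient."""
    merged = defaultdict(list)
    for name, award in records:
        merged[canonical_name(index, name)].append(award)
    return dict(merged)


# Invented sample data, purely for illustration.
synonyms = build_synonym_index({"Alys Example": ["Alis Example", "Alys Exampel"]})
print(merge_awards(synonyms, [("Alis Example", "Award A"), ("Alys Example", "Award B")]))
```

The lookup itself is trivial; the real work is in building the synonym list, since a typo that nobody has flagged simply falls through as a separate person.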

This project is driving home just how much raw *gruntwork* is involved in cleaning up this mess. I've just spent the past three hours going through the A's in detail, and have found at least most of the obvious errors. It's instructive in showing how big the job is, so I've included the current state of the synonym file below the cut. (Because, really, it goes on for pages.)
