More fun with OP synonyms
Mar. 21st, 2013 10:19 am
I'm getting to the point of diminishing returns, so it's about time for me to give up on trying to polish the data; please forgive the duplications that make their way into the final online Order of Precedence, which will have to be merged by hand after it goes live. I've eliminated many thousands of duplicate records, but I'd be surprised if fewer than a few thousand make it in. (There are still about 9000 incomplete records -- more than the 6000 I was targeting, but I think we'll have to live with that.)
But the system continues to be disconcertingly smart. Today, it complained to me that we had duplicate alpha entries for "Elizabeth Vynehorn" and "Muirne ni Cormaic", which led me on a merry chase: I couldn't figure out *why* it had decided that they were synonyms (I have begun to regret not building a system that records the reasoning, which gets pretty subtle and obscure from run to run), but fortunately found her LJ -- I hadn't realized that Muirne had changed her name. So I've updated my copy of the old OP accordingly.
Oh, in case anybody is interested -- one artifact of this project will be my final master copy of the old HTML files. These are massively cleaned-up HTML, and have many errors and duplications of this sort fixed. Folks are welcome and encouraged to refer back to these files after the new system goes live, since they are the data that the new OP will be bootstrapped from. The "alpha", "awards" and "chrono" directories roughly correspond to the files on op.eastkingdom.org, but with a great deal of massaging.
And the record for longest "alternate names" field goes (no surprise) to Mistress Nataliia Anastasiia Evgenova Sviatoslavina vnuchka, whose name is so long, and *never* spelled quite correctly in the Court Reports, that she winds up with 515 characters of alternate name field so far. (Far more than the 255 allowed -- I had to introduce some trimming code to keep her entry from breaking the database. I think she'll survive without every single misspelling recorded for posterity in her record.)
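For the curious, here is a minimal sketch of what that kind of trimming might look like. This is not the actual code from the project: the function name, the "; " separator and the idea of dropping whole names (rather than cutting one off mid-spelling) are all assumptions made for illustration. Only the 255-character limit comes from the description above.

```python
MAX_ALT_NAMES_LEN = 255  # the field limit mentioned above


def trim_alternate_names(names, limit=MAX_ALT_NAMES_LEN, sep="; "):
    """Join alternate names with `sep`, keeping whole names in order
    and stopping before the joined string would exceed `limit`.

    Hypothetical helper: the separator and the keep-the-earliest policy
    are illustrative assumptions, not the real system's behavior.
    """
    kept = []
    length = 0
    for name in names:
        # A separator is only needed once at least one name is kept.
        extra = len(name) + (len(sep) if kept else 0)
        if length + extra > limit:
            break
        kept.append(name)
        length += extra
    return sep.join(kept)
```

Stopping at a name boundary means the stored field never ends in half a misspelling, at the cost of silently dropping whichever variants come last.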
Anyway, I'm continuing to plow through and finish the current round of synonyms. When it is asking me whether Nathaniel Wyatt and Karrah the Mischevious are the same person, we're definitely running out of good guesses. (Yes, there was a reason -- they apparently were inducted into the White Oak the same day. Still, not exactly a high-quality guess...)
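To illustrate that kind of guess, and what recording the reasoning alongside it could look like, here is a toy sketch. It is emphatically not the real matching code: the record shape `(name, award, date)`, the function name and the reason strings are all made up for the example, and the real heuristics are subtler than this single rule.

```python
from collections import defaultdict
from itertools import combinations


def candidate_synonyms(records):
    """Given (name, award, date) tuples, propose pairs of names that
    received the same award on the same date, with a reason attached.

    Hypothetical sketch: returns {(name_a, name_b): [reason, ...]},
    where each pair is in sorted order.
    """
    # Group the names by the (award, date) event they appear in.
    by_event = defaultdict(set)
    for name, award, date in records:
        by_event[(award, date)].add(name)

    # Every pair of distinct names sharing an event is a (weak) candidate.
    reasons = defaultdict(list)
    for (award, date), names in by_event.items():
        for a, b in combinations(sorted(names), 2):
            reasons[(a, b)].append(f"both received {award} on {date}")
    return dict(reasons)
```

Keeping the reason string next to each candidate pair is the cheap part; the payoff is being able to see at a glance that a guess rests on nothing more than a shared court date.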