(no subject)

Date: 2012-08-24 02:48 pm (UTC)
From: [personal profile] jducoeur
No clue, although it seems like a bad idea. (You can, of course, access groups by index as well -- the names are mainly a convenience, far as I can tell.)
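For anyone unfamiliar with the name-versus-index point, here is a minimal sketch. It uses Python's `re` module as a stand-in, because the post doesn't say which regex library the project uses. The `AWARD_LINE` pattern and the "award name, then date" line format are hypothetical. The sketch shows only that a named group is still numbered by its position, so `group("award")` and `group(1)` return the same text.

```python
import re

# Hypothetical line format, for illustration only: "<award name> <YYYY-MM-DD>".
AWARD_LINE = re.compile(r"(?P<award>[A-Za-z ]+?)\s+(?P<date>\d{4}-\d{2}-\d{2})$")


def parse_award(line: str) -> dict:
    """Return the captured fields, fetched both by name and by index."""
    m = AWARD_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized award line: {line!r}")
    return {
        "by_name": (m.group("award"), m.group("date")),
        # Groups are numbered left to right whether or not they are named,
        # so index 1 is "award" and index 2 is "date".
        "by_index": (m.group(1), m.group(2)),
    }


print(parse_award("Award of Arms 1995-03-12"))
```

In this sketch the names are a readability convenience layered over the positional numbering, which `AWARD_LINE.groupindex` exposes directly.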

As for the sed script -- possibly, but it's not nearly as easy as it sounds. Mind, most of the code I've written so far *is* normalizing the data. That's a complex process, because there are so many irregularities at so many levels, ranging from badly formed XHTML tags all the way up to the numerous ways that various award names get written. So I do a little pre-processing (TagSoup, to turn the messy HTML into at least *legal* XHTML), but much of the normalization needs a lot of semantic knowledge, so that the various syntactic structures can be handled appropriately depending on where they appear.
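The award-name end of that normalization problem can be sketched as "canonicalize the spelling, then look it up." This is a hypothetical illustration, not the project's code. The `CANONICAL` alias table, its entries, and the specific cleanup rules (drop periods, collapse whitespace, lowercase) are all assumptions. The post doesn't list the actual variants that appear in the data.

```python
import re

# Hypothetical alias table: maps a cleaned-up key to the canonical award name.
# The real variants in the source data are not given in the post.
CANONICAL = {
    "award of arms": "Award of Arms",
    "aoa": "Award of Arms",
    "grant of arms": "Grant of Arms",
    "goa": "Grant of Arms",
}


def normalize_award(raw: str) -> str:
    """Map a messy award-name spelling to its canonical form if one is known."""
    key = raw.replace(".", "")                   # "A.o.A." -> "AoA"
    key = re.sub(r"\s+", " ", key).strip()       # collapse runs of whitespace
    key = key.lower()                            # case-insensitive lookup
    # Unknown names pass through (trimmed) rather than being guessed at.
    return CANONICAL.get(key, raw.strip())


for variant in ["Award of Arms", "  award   of ARMS ", "A.o.A.", "Order of the Laurel"]:
    print(repr(variant), "->", normalize_award(variant))
```

A lookup table like this only handles the purely lexical variation. As the post notes, many other cases depend on where in the listing the text appears, which a context-free substitution (or a sed script) can't see.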

(For example, parens in the name at the top of an alpha listing are very semantically different from ones in an award listed inside it. I haven't even talked about the complex code to deal with all the different ways in which cross-references are described, which involves *optional* parentheses...)
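*Optional* parentheses are awkward because the closing paren must be present exactly when the opening one is. Python's `re` can express this with a conditional group, `(?(1)\))`, which matches `)` only if group 1 (the opening paren) matched. This also ties back to referring to groups by index. The sketch below is hypothetical. The `XREF` pattern and the "see" / "see also" wording it accepts are assumptions for illustration, not the actual cross-reference formats in the data.

```python
import re
from typing import Optional

# Hypothetical cross-reference forms: "see X", "see also X", "(see X)", "(see also X)".
# (?(1)\)) demands a closing paren only when group 1 captured an opening one.
XREF = re.compile(r"^(\()?see(?: also)?\s+(?P<target>[^()]+?)(?(1)\))$")


def parse_xref(text: str) -> Optional[str]:
    """Return the cross-reference target, or None if the parens don't balance."""
    m = XREF.match(text.strip())
    return m.group("target") if m else None


for s in ["see Foo", "(see also Foo)", "(see Foo", "see Foo)"]:
    print(repr(s), "->", parse_xref(s))
```

The last two inputs are rejected because their parentheses are unbalanced. A plain `\(?` ... `\)?` pattern would have accepted them.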