jducoeur: (Default)
[personal profile] jducoeur
[This one's just for the programmers in the audience; the rest of you should skip ahead.]

Perhaps the nicest thing about the OP Compiler project is that it's giving me the chance to really get sharp on Scala -- to get a sense of how to program the language idiomatically, the way it's supposed to be used, instead of just writing transliterated Java. Occasionally, I write a few lines and marvel at how right and tight they are. Here's an example, which illustrates several of those idioms. (None of which will be surprising to experienced functional programmers, but this stuff is new to me.)

Let's deal with the following problem. The OP Alpha listing consists of tables of a persona name, followed by date/award pairs. The problem at hand is that the "award" part often contains an attribution in parentheses, which is essentially noise -- a comment from my POV. For example:
Queen's Honor of Distinction (Jana IV)
The bit in the parens is messing up parsing the award name, so I need to separate it out.

In many languages, that would take a fair number of lines, but in Scala it turns out to be essentially four (could be more concise, but this seems clearest):
import scala.util.matching.Regex
val awardCommentRegex = new Regex("""^(.*?) \((.*)\)$""", "name", "comment")
val commentMatch = awardCommentRegex.findFirstMatchIn(awardName)
val comment = commentMatch map (_.group("comment"))
val parsedAwardName = (commentMatch map (_.group("name"))).getOrElse(awardName)
Breaking that down:
  • The input is "awardName" -- that's the field I'm trying to parse.

  • The Regex is a fairly conventional regular expression; if it matches, it breaks the discovered groups into "name" and "comment".

  • The assignment to commentMatch does the actual regular-expression matching. That returns Option[Match] -- that is to say, the result contains either "Some(m)", where m is the found match information, or "None". In general, idiomatic Scala uses Option frequently for cases like this, where a function might return a value and might not; it is much safer than returning nulls, and avoids the usual mess of inventing return codes.

  • The assignment to comment does a "map", which basically keeps the exterior structure of a collection but changes the interior. In this case, it is transforming the Option[Match] to an Option[String], by extracting the matched comment if there was one. Again, if nothing was matched, it returns None.

  • The assignment to parsedAwardName is similar, but this time I want to get a definite String out the end, not an Option[String]. So first I fetch an Option[String]. Then the getOrElse() method either fetches the guts of that Option -- the String itself -- or, if the value is None, returns the original awardName that I started with.
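Pulled together, the steps above can be sketched as one small function. This is only an illustration: the AwardParsing object, the splitAward name, and the second sample input are made up for the example and are not part of the actual OP Compiler code.

```scala
import scala.util.matching.Regex

object AwardParsing {
  // Same pattern as above: a lazily matched name, a space,
  // then a parenthesized comment that runs to the end of the string.
  private val awardCommentRegex = new Regex("""^(.*?) \((.*)\)$""", "name", "comment")

  // Hypothetical helper: returns the cleaned award name plus the comment, if there was one.
  def splitAward(awardName: String): (String, Option[String]) = {
    val commentMatch = awardCommentRegex.findFirstMatchIn(awardName)
    val comment = commentMatch map (_.group("comment"))
    val parsedAwardName = (commentMatch map (_.group("name"))).getOrElse(awardName)
    (parsedAwardName, comment)
  }

  def main(args: Array[String]): Unit = {
    // With an attribution in parens: the name and the comment are separated.
    println(splitAward("Queen's Honor of Distinction (Jana IV)"))
    // prints (Queen's Honor of Distinction,Some(Jana IV))

    // Made-up input with no parens: the name comes back unchanged, and the comment is None.
    println(splitAward("Award of Arms"))
    // prints (Award of Arms,None)
  }
}
```

Note that the no-match case needs no special handling: both map calls simply pass the None through, and getOrElse supplies the fallback.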
Mind, everything here is statically typed -- Scala insists on strong, compile-time typing throughout, so everything is very safe and errors get caught early. (Indeed, despite being newish to the language, I'm making very few runtime errors.) The code is almost as concise as possible due to Scala's type inference -- while I'm not *declaring* object types above, that's because they are redundant, and Scala will simply figure them out for me. (The Eclipse plugin shows the inferred types when you hover over a name, so you can quickly check your work.)
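For comparison, here is roughly what the same lines look like with every type written out by hand. This is a sketch for illustration only: the ExplicitTypes object and the hard-coded awardName are assumptions added to make it self-contained. The annotations are the types Scala infers when they are left off.

```scala
import scala.util.matching.Regex
import scala.util.matching.Regex.Match

object ExplicitTypes {
  // Sample input, hard-coded here only so the example stands alone.
  val awardName: String = "Queen's Honor of Distinction (Jana IV)"

  // Each annotation below is optional; the compiler works out the same type without it.
  val awardCommentRegex: Regex = new Regex("""^(.*?) \((.*)\)$""", "name", "comment")
  val commentMatch: Option[Match] = awardCommentRegex.findFirstMatchIn(awardName)
  val comment: Option[String] = commentMatch map (_.group("comment"))
  val parsedAwardName: String = (commentMatch map (_.group("name"))).getOrElse(awardName)

  def main(args: Array[String]): Unit = {
    println(parsedAwardName) // prints Queen's Honor of Distinction
    println(comment)         // prints Some(Jana IV)
  }
}
```

If one of these annotations were wrong (say, declaring comment as a plain String), the compiler would reject it, which is the "errors get caught early" part.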

No deep message here -- I'm just enjoying the elegance of it. I've always thought that I would really like working in pure Scala, and so far that's proving correct. And bit by bit, I'm absorbing the functional-programming models, and coming to appreciate that they have evolved far enough to often be much more concise than the comparable imperative code...

(no subject)

Date: 2012-08-24 02:17 pm (UTC)
From: [identity profile] marphod.livejournal.com
I guess my background is showing when I'd want to run a sed script to normalize the data before parsing it. Strip parens, add field demarcation between the Title and Comment, or after the Title on lines without a comment, etc. =)

(Hunh. I wonder what happens if you give different groups in the Regex the same string title? Does the Regex constructor throw, does the parsing fail if more than one of those fields is non-None, or does some data get hidden?)
