Extensible enumerations
Apr. 6th, 2011 01:18 pm
(This one's just for the programmers, but it's not as opaque a topic as usual.)
So in the course of thinking about my last post (about Querki), I happened to look at my desired feature list. And I was reminded of one of those bugbears of most programming languages: enumerations.
Here's the thing: what I want, much of the time, is the ability for subclasses to *extend* enumerations. The most common use case is return codes. I might have a base class that implements the concept of return codes, and includes an enumeration of the basic codes for success, warning, failure -- generic stuff. As I add subclasses with more precise functionality, I want those subclasses to be able to add more precise return codes that describe the results in more detail.
But I can't do that, at least not safely, in any language that I'm thinking of offhand. The thing is, I wind up with two possibilities. Either the return code is something like a raw integer under the hood, so it's totally unsafe and not really type-checked -- this allows for risky code. Or the enumeration *is* strongly typed -- in which case, I can't add values. Why not? Because adding values is effectively *extending* the type, rather than *specializing* it, and that's a no-no in any decent OO system. It implies that a subclass could add values that don't exist in the parent, so code written for the parent doesn't necessarily work with the subclass -- a violation of the basic principles of OO.
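To make the two possibilities concrete, here is a minimal sketch in Python, chosen purely for illustration. The names `describe`, `ReturnCode`, `DetailedReturnCode`, and `try_to_extend` are hypothetical. It shows raw integers accepting a meaningless value, and Python's standard `enum.Enum` refusing to let a subclass add members.

```python
from enum import Enum

# Possibility 1: raw integers under the hood. Nothing is type-checked,
# so any integer at all can be passed off as a "return code".
SUCCESS, WARNING, FAILURE = 0, 1, 2

def describe(code: int) -> str:
    names = {SUCCESS: "success", WARNING: "warning", FAILURE: "failure"}
    return names.get(code, "???")

# Possibility 2: a strongly typed enumeration.
class ReturnCode(Enum):
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2

def try_to_extend() -> bool:
    """Return True if a subclass was allowed to add a new value."""
    try:
        # Python rejects subclassing an Enum that already has members.
        class DetailedReturnCode(ReturnCode):
            NOT_AUTHORIZED = 3
    except TypeError:
        return False
    return True
```

Here `describe(42)` is accepted without complaint, while `try_to_extend()` reports that the typed enumeration cannot grow new values.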
When I think about it, though, it occurs to me that it's easy to fix this: just redefine my terms. What I really want is the idea of *hierarchical* enumerations. That is, the base class defines "success" and "failure", but no details. But this is precisely what OO is good at: what I want is to think of those success and failure values as *classes*, which I can then subclass. So I want to be able to define a "subclass" of the enumeration that adds specializations to the existing values. So instead of tacking "Null pointer" or "Not authorized" onto the end of my enumeration, I should be thinking of them as specialized values of "Failure". That follows good OO principles, and seems like it would work well.
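A rough sketch of the hierarchical idea, again in Python for illustration only. The class names and the `handle` function are hypothetical and follow the examples in the paragraph above. The point is that code written against the base values keeps working when handed a specialized one.

```python
class ReturnCode:
    """Root of the hierarchy; plays the role of the enumeration type."""

    def __repr__(self) -> str:
        return type(self).__name__

# The base class defines only the generic values.
class Success(ReturnCode):
    pass

class Failure(ReturnCode):
    pass

# A subclass "extends" the enumeration by specializing existing values
# rather than appending brand-new ones.
class NullPointer(Failure):
    pass

class NotAuthorized(Failure):
    pass

def handle(code: ReturnCode) -> str:
    # Written only in terms of the base values, yet still correct for
    # NullPointer and NotAuthorized, because each of them *is* a Failure.
    if isinstance(code, Success):
        return "ok"
    if isinstance(code, Failure):
        return "failed"
    return "unknown"
```

With this shape, `handle(NotAuthorized())` falls into the `Failure` branch without `handle` knowing that `NotAuthorized` exists.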
Does anyone know any languages that formally do this? I *can* do it in Scala decently well using case classes, but that's not quite optimal: I'd prefer a syntactic construct that is more explicit about this notion of a hierarchical, extensible enumeration. But I don't think I've ever come across one.
Anyway, I need to chew on this further. It's directly relevant to Querki, because I want to add a formal concept of enumerations there, and it needs to be exceptionally rich in order to cope with the use cases. (Since Querki is going to be used for world-building, I need a lot of schema flexibility.) I may well apply this concept there...
(no subject)
Date: 2011-04-06 06:33 pm (UTC)
(enum Fred Success Failure)
(enum (Barney Fred) (Created Success) (Fetched Success) (DoesNotExist Failure) (NotPermitted Failure))
(no subject)
Date: 2011-04-06 06:36 pm (UTC)
Is it because you want returned values to be normal objects as part of a normal control flow?
In which case, I still think you can model AFTER something like the Java Exception class structure, can you not? Create a base return-code class, and have each specific need generate a sub-class.
(no subject)
Date: 2011-04-06 06:40 pm (UTC)
(enum Fred Success Failure)
(enum (Barney Fred) (Created Success) (Fetched Success) (DoesNotExist Failure) (NotPermitted Failure))
...which would create classes named Fred, Barney, Success, Failure, Created, Fetched, DoesNotExist, and NotPermitted.
The interesting bit is that the enum values can actually be the classes, rather than instances. That comports with the intuition that enums should be cheap, and should be comparable for identity; and it allows a simple isinstance check to determine whether a value is a Success or a Failure.
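For readers who don't know Adder, here is a rough plain-Python sketch of "the values are the classes". It is not what the macro generates: the thread doesn't show how `Fred` and `Barney` are wired up, so only the value hierarchy is modeled, and `fetch` is a hypothetical helper. With plain classes the membership test is `issubclass`; an `isinstance` check would presumably rely on machinery inside the macro.

```python
class Fred:
    """Root enumeration; never instantiated in this sketch."""

class Success(Fred):
    pass

class Failure(Fred):
    pass

# The specialized values from the second (enum ...) form.
class Created(Success):
    pass

class Fetched(Success):
    pass

class DoesNotExist(Failure):
    pass

class NotPermitted(Failure):
    pass

def fetch(key: str, store: dict) -> type:
    # Return the class itself as the enum value; no instance is allocated.
    return Fetched if key in store else DoesNotExist

result = fetch("missing", {"present": 1})
is_same_value = result is DoesNotExist    # cheap identity comparison
is_failure = issubclass(result, Failure)  # membership in the broader value
```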
(no subject)
Date: 2011-04-06 06:42 pm (UTC)
Me like. :-)
(no subject)
Date: 2011-04-06 06:54 pm (UTC)
Of course, in C++, there wouldn't be the need, because objects aren't always heap-allocated, it's easy to define operator==(), and it'd be possible to write something at least as fast as isinstance (the Python reason for using isinstance is that it should be faster, since it's implemented in C).
(no subject)
Date: 2011-04-06 08:40 pm (UTC)
(I do sometimes worry about that decision, but prototype-OO tends to be optimal for representing extremely complex semantics, so it suits worldbuilding really, really well.)
(no subject)
Date: 2011-04-06 09:50 pm (UTC)
In fact, the class approach takes advantage of the fact that there's a prototype mechanism lying around; it's just that the objects are called classes.
(no subject)
Date: 2011-04-06 08:36 pm (UTC)
But yes: that sort of sub-classing is exactly the case class model I was describing above. (Scala does this much better than Java does.) My issue is simply that the hierarchy is rather indirect and opaque -- it lacks the concise clarity of an enumeration. So what I'm pondering here is the idea of a middle ground: something that has roughly the semantics of a class tree, but is deliberately simpler. Scala case classes *may* actually be the best you can get, but I'm not assuming that...
(no subject)
Date: 2011-04-07 12:49 am (UTC)
(no subject)
Date: 2011-04-07 01:44 am (UTC)
Done.
I like this. This sort of thing is really why I created Adder: to be able to use macros to add completely new language features. This took fewer than 100 lines of code, and all of it in the prelude, not the core compiler.
(no subject)
Date: 2011-04-07 03:33 am (UTC)