Querki: what I'm trying to accomplish
Jun. 14th, 2012 11:12 am
As described in my last post, I'm thinking about seriously diving into the Querki project, probably starting part-time after Pennsic, then maybe ramping up to full-time in October if it looks like it's a business idea worth pursuing. And as I do, I'm likely to be looking for interest and assistance of many kinds.
The project is, frankly, scary as hell. In part, that's because the idea isn't as unique as it was when I came up with ProWiki ten years ago -- both XWiki and TWiki have gone down somewhat similar paths, and have a serious foothold in the enterprise market.
But the thing is, I'm not *going* for the enterprise market. There's a huge market out there that is currently poorly-served: people who just want to keep track of *stuff*.
This shows up in a thousand places -- the infinite little websites that get built for special purposes, each its own little special snowflake. Hell, just within Carolingia in the past year we've built at least two of these: the new Carolingian Site DB, and the Cooks Guild Recipe DB. Both are functional, but both took more work to assemble than they should have, and both are kind of limited. And I find myself going, "We could do *so* much better than this".
So the notion is to focus on that market: the many people who just want an easy way to build little specialty sites for simple small databases. Whereas XWiki focuses on power, Querki focuses on ease of use. It's not about building huge enterprise databases, it's about making it Really Really Easy to build little databases of hundreds or thousands of Things. It's an online database for the consumer market, for the people who wouldn't normally even think about building a "database". (Indeed, I may deliberately avoid that word in public.)
And yes, I know that there are lots of cloud-based DB systems out there. Suffice it to say, I'm trying something quite different in some critical details: a prototype-styled OO DB instead of a conventional relational one. All my experience says that that is *way* easier for many real-world problems, so long as you don't care about scalability, and it fits nicely in a loosely-structured wiki environment.
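To make "prototype-styled" a little more concrete, here is a minimal sketch in Python of the general idea: each Thing points at a model, and any property it doesn't define itself is looked up on that model. The `Thing` class, its `get` method, and the sample recipe data are all made up for illustration; this is an assumption about the shape of such a system, not Querki's actual design.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Thing:
    """Hypothetical record type: inherits missing properties from its model."""
    name: str
    model: Optional["Thing"] = None
    props: Dict[str, Any] = field(default_factory=dict)

    def get(self, prop: str) -> Any:
        # Walk up the prototype chain until some ancestor defines the property.
        thing = self
        while thing is not None:
            if prop in thing.props:
                return thing.props[prop]
            thing = thing.model
        raise KeyError(prop)


# Illustrative data only: a model, and one Thing that overrides a single property.
recipe = Thing("Recipe", props={"servings": 4, "source": "unknown"})
tart = Thing("Apple Tart", model=recipe, props={"source": "Cooks Guild"})

servings = tart.get("servings")  # 4, inherited from the Recipe model
source = tart.get("source")      # "Cooks Guild", overridden on the Thing itself
```

There is no schema to declare up front here: a new kind of Thing is just another Thing with a different model, which is roughly why this style tends to suit small, loosely-structured data.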
So please forgive the burblage to come. This could prove to be a brief phase -- a week's enthusiasm that then burns out -- but it doesn't feel like it. I think I'm onto something here, and step one is going to be proving it to my friends.
Specifically: once it is at least basically up and running, I'm going to be looking to put projects onto it. I'm going to ask y'all to think about projects -- those things that you've built little sites for, or hacked in a third-party tool, and would like to do better. In some cases, I may ask if I can try replicating an existing site, and I won't kid you: my agenda is going to be to demonstrate that I can build something that is both *better* and *easier* than what you already have. I'm going to ask for honest criticism about any shortcomings you find, especially about anything existing systems can do that Querki can't. My hope is that I can prove that Querki is just plain better for 80% of the online-data problems you need to solve.
I'm also going to be looking for technical input. This time around, I'm going to try to avoid the go-it-myself of CommYou (one of the dumber mistakes I've ever made), and instead go for radical transparency, with a fully open-source project. That's a tad scary, but enough systems have demonstrated that you can build a good cloud-based solution that is completely open source that I'm inclined to give it a try. So if you're interested in participating in a really deep technical project (all the way down to language design), comment here and we can all talk about how we'd like to set it up. (In the long run, of course, I want to run the project via Querki itself, but for the first few months we're going to need some third-party project tools to communicate.) I would dearly love to get a couple dozen technically-inclined friends involved in the discussion. Those who want to actually get their hands dirty in the code would be more than welcome, but I'd also like folks who just want to muse on the architecture, the use cases, the usability and so on.
I think it's time to change the world, just a little bit. With some help, I think we just might be able to do so...
(no subject)
Date: 2012-06-14 03:15 pm (UTC)
(no subject)
Date: 2012-06-14 04:43 pm (UTC)
I should note, BTW, that I'm actively using the OP as a Use Case. I don't currently intend to really build it out fully (since we already have a system that looks pretty good), but it makes a *great* example of what a really complicated data model looks like. It's more complex than what I'm primarily targeting here, but I want to make sure the system is *able* to deal with something like that: not very many records, but massively normalized and complex.
So I'll likely try to mock it up at some point, as a sanity-check. If Querki can do at least a passable job with the OP, then it's probably ready for prime time in terms of data model.
(no subject)
Date: 2012-06-14 03:17 pm (UTC)
I invite you to look at "the barbershop tool", groupanizer.com
I'd be glad to give you my account name and password, if you really want to fool around, as long as you don't make serious changes.
(no subject)
Date: 2012-06-14 03:19 pm (UTC)
(no subject)
Date: 2012-06-14 04:39 pm (UTC)
(no subject)
Date: 2012-06-14 03:24 pm (UTC)
Oh, yes, please. I know what I want vs. what we currently have. Let me know how I can help.
(no subject)
Date: 2012-06-14 04:51 pm (UTC)
(no subject)
Date: 2012-06-14 03:54 pm (UTC)
(no subject)
Date: 2012-06-14 04:54 pm (UTC)
I have some vague hand-wavy notions about how this would work, and my own LARP stuff will be an early use case for that, but we should chat about your wiki and how we might migrate it -- it'll provide me with one useful example.
(And in the medium term, we might want to explore lifting out a general Recipes Schema, given that both you and the Cooks Guild will probably be early users. I am a *great* believer in factoring out the common bits.)
(no subject)
Date: 2012-06-14 04:32 pm (UTC)
(no subject)
Date: 2012-06-14 04:54 pm (UTC)
(no subject)
Date: 2012-06-14 06:02 pm (UTC)
(no subject)
Date: 2012-06-14 08:52 pm (UTC)
(no subject)
Date: 2012-06-14 07:39 pm (UTC)
My most recent interesting use-case was trying to figure out the publication order of the B.P.R.D. comics. Looking for lists of "Written by Mike Mignola" doesn't cover all the cases, nor is there any single character who appears in all issues. And what gets printed in which collections, and where/when did those stories originally appear?
(no subject)
Date: 2012-06-14 09:07 pm (UTC)
Yeah, I've been pondering that as well. I have my own system at home, based on Rails and MySQL, and I'm slightly loath to transition because I have *so* damned much data in it. But I suspect that, if Querki gets anywhere near where I want it, it's just going to be much easier to use. So I've just added that to my Use Cases: it's an interesting medium-complexity example with some intriguing data-entry problems. (Eg, I want to be able to say "issues 34-92" as a single input, and have it generate all the skeleton records automatically, for me to annotate later or not.)
And yeah, the BPRD and Hellboy are an interesting one. I'd have to think about what properties I'd need to even figure that one out. But it does nicely illustrate the benefits of the flexible prototype-based model: I could subclass "Issue" with an extra "StoryOrder" field for these books, so the system could at least have some hope of tracking it. (I don't know of any existing system, including my own, capable of tracking both the "BPRD: Trail of the FooBar #3" and the little "Issue #46 of BPRD" in the indicia.)
And that, in turn, points up a feature I need to think about: reparenting. I'll occasionally want to take an existing set of Foos and turn them into SubFoos quickly and easily, as I muck with my data. Fun...
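As a rough illustration of the three ideas in this comment (bulk skeleton records from "issues 34-92", a sub-model of "Issue" that adds a "StoryOrder" field, and reparenting), here is a small Python sketch. Everything in it is hypothetical: `Thing`, `expand_issues`, `reparent`, and the `StoryOrderedIssue` name are invented for the example, and the input grammar is assumed to be just the word "issues" followed by a numeric range.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Thing:
    """Hypothetical record that inherits missing properties from its model."""
    name: str
    model: Optional["Thing"] = None
    props: Dict[str, Any] = field(default_factory=dict)

    def get(self, prop: str, default: Any = None) -> Any:
        thing = self
        while thing is not None:
            if prop in thing.props:
                return thing.props[prop]
            thing = thing.model
        return default


def expand_issues(spec: str, model: Thing) -> List[Thing]:
    """Turn input such as 'issues 34-92' into one skeleton record per issue."""
    match = re.fullmatch(r"\s*issues?\s+(\d+)\s*-\s*(\d+)\s*", spec, re.IGNORECASE)
    if match is None:
        raise ValueError(f"cannot parse issue range: {spec!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if first > last:
        raise ValueError(f"range runs backwards: {spec!r}")
    return [Thing(f"Issue #{n}", model=model, props={"number": n})
            for n in range(first, last + 1)]


def reparent(things: List[Thing], new_model: Thing) -> None:
    """Point existing records at a different model; their own props are kept."""
    for thing in things:
        thing.model = new_model


issue = Thing("Issue", props={"publisher": "unknown"})
# A sub-model that adds the extra field, for books where story order matters.
story_issue = Thing("StoryOrderedIssue", model=issue, props={"StoryOrder": None})

records = expand_issues("issues 34-92", issue)  # 59 skeleton records
reparent(records[:3], story_issue)              # turn the first three Foos into SubFoos
records[0].props["StoryOrder"] = 1              # annotate later, or not
```

In this sketch reparenting is just a pointer swap, so the reparented records keep their own data and still see everything defined on "Issue"; a real system would presumably also need to decide what happens to properties the new model doesn't know about.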
(no subject)
Date: 2012-06-14 09:11 pm (UTC)
So that's an interesting general use-case for me: auto-generation of a site based on existing data, which I then push and pull into working the way I want it. I'd bet that that would be really useful in the general case...
(no subject)
Date: 2012-06-14 09:24 pm (UTC)
I've been seeing stuff lately about MongoDB (http://www.mongodb.org/display/DOCS/Introduction), but I haven't had a chance to look at it in depth. From a cursory look, it seems to have some overlap in the "unstructured, non-relational" aspect, but then emphasizing performance and scalability rather than ease of use. Might be worth a look, though.
(no subject)
Date: 2012-06-14 09:32 pm (UTC)
(no subject)
Date: 2012-06-15 12:30 am (UTC)
(no subject)
Date: 2012-06-15 01:03 pm (UTC)
It sounds like this might drive some interesting features, along the "people management" lines that Tibor and I have been discussing elsethread -- it's a good collaborative example, coordinating a bunch of people working together, and I'm always fond of problems like that.
So let's talk about this further over the next couple of months, and figure out the design and requirements. It sounds like it probably won't be one of the *first* projects, but it might make a good semi-advanced one once the underpinnings are all in place. (Ie, sometime in the fall/winter.)
(no subject)
Date: 2012-06-15 02:55 am (UTC)
(no subject)
Date: 2012-06-15 01:54 pm (UTC)
(no subject)
Date: 2012-06-15 03:43 am (UTC)
The group doesn't exist any more, but if that use case is interesting I'd be happy to talk with you more about it. I imagine it could apply to other "team + configurable tasks" cases, like, say, household chore management.
In a different vein, perhaps there is some way I can help you think through your configuration language. I have a bit of a track record beating up on API designs, for what that's worth.
(no subject)
Date: 2012-06-15 01:56 pm (UTC)
(no subject)
Date: 2012-06-15 09:08 am (UTC)
(no subject)
Date: 2012-06-15 01:57 pm (UTC)
(no subject)
Date: 2012-06-15 06:50 pm (UTC)
An attacker sees this and looks under the hood. He notices the article number gets sent to the server. Is that used directly in an SQL query, he wonders, and goes and tries various things, such as an unlisted article number, or a backtick, or 1=1', or what have you.
A smart programmer has instead coded the page to return 'selected item = 3' and the application to both know that only three choices were presented, and what the associated article id is for 3, and to reject any input except plain digits. And a few other things.
Now, if the data persistence framework had a call to say 'give me a list of available choices' and 'act thusly on choice 3', we'd live in a world with far fewer SQL injection attacks.
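A minimal sketch of that "list of choices" idea in Python, for illustration only. The `ChoiceList` class, its method names, and the sample article ids are all hypothetical; a real application would keep one such list per user session rather than in a bare variable.

```python
from typing import List


class ChoiceList:
    """Hypothetical server-side record of the choices actually offered."""

    def __init__(self, article_ids: List[int]) -> None:
        # The real ids stay on the server; the browser only ever sees positions.
        self._article_ids = list(article_ids)

    def render(self) -> List[str]:
        # What the page exposes: 'selected item = 1', 'selected item = 2', ...
        return [f"selected item = {i}"
                for i in range(1, len(self._article_ids) + 1)]

    def resolve(self, raw: str) -> int:
        # Reject any input except plain digits.
        if not (raw.isascii() and raw.isdigit()):
            raise ValueError("choice must be plain digits")
        index = int(raw)
        # Reject anything that was not among the choices presented.
        if not 1 <= index <= len(self._article_ids):
            raise ValueError("choice was not offered")
        return self._article_ids[index - 1]


choices = ChoiceList([1041, 2207, 3550])  # made-up article ids
article_id = choices.resolve("3")         # 3550
```

Input such as `1=1'` or an unlisted number never gets as far as a query, because the only thing that can come out of `resolve` is an id the server itself put into the list.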
(no subject)
Date: 2012-06-15 07:21 pm (UTC)
Sure -- I generally assume that the user is malicious, and is expert in not only the web traffic but the code. Remember that my job for the past three years has involved sensitive bank data, at a company that built its reputation on detecting employee fraud. I assume that even *trusted* users -- anybody below the superuser level -- are trying to attack. Certainly the poorly-paid analysts who are using my systems are.
And since I'm expecting Querki to be open-source, I have to assume that an attacker is working with full knowledge about how the system works. So I'm not concerned about him inspecting the traffic -- I'm concerned about him reading the source code and finding a hole.
Anyway -- SQL injection is a relatively minor concern here, since the system mostly operates in-memory. I'm actually much less concerned about people trying to attack that way from queries, and more from the actual submitted page data. At this point, I'm not anticipating any user-level queries that translate directly into SQL statements. It's worth paying attention to, of course, but at this point I don't think it's likely to be a front-and-center worry.
(OTOH, this system is going to be ripe for XSS attacks, so *those* I have to be really careful about.)
In practice, for the case you're describing, I believe that it usually isn't going to be a query operation so much as a RESTful fetch. Of course, we need to make sure that we deal properly with badly-formed URLs, but that's a necessity for many reasons...
(no subject)
Date: 2012-06-15 08:32 pm (UTC)
(no subject)
Date: 2012-06-15 08:42 pm (UTC)
But in general, they won't be coding anywhere near the SQL level. Persistence is currently designed as primarily a backing store, with operations happening in-memory in the Actor architecture. So operations basically are method invocations on specific objects; persistence is hidden behind that OO layer.
Hence, this is why I'm more worried about data than commands. The only *likely* avenue for a SQL attack is in the data you are passing in update parameters, which will usually be updating a page property. So long as we get that properly locked down, there may not be any other ways into the DB. (In general, I want all DB access to be *very* bottlenecked in the code, for this among other reasons.)
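Here is a rough Python sketch of what "very bottlenecked" DB access could look like, using the standard-library `sqlite3` module as a stand-in backing store. The `PropertyStore` and `Page` classes and the table layout are assumptions made up for the example, and it does not attempt to model the Actor architecture: it only shows a single class that owns all SQL and passes page data as bound parameters.

```python
import sqlite3
from typing import Dict, Optional


class PropertyStore:
    """Hypothetical bottleneck: the only class in this sketch that writes SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS props ("
            "page TEXT, name TEXT, value TEXT, PRIMARY KEY (page, name))"
        )

    def save(self, page: str, name: str, value: str) -> None:
        # '?' placeholders bind the submitted data as values,
        # so it is never spliced into the SQL text itself.
        self._conn.execute(
            "INSERT OR REPLACE INTO props (page, name, value) VALUES (?, ?, ?)",
            (page, name, value),
        )
        self._conn.commit()

    def load(self, page: str, name: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT value FROM props WHERE page = ? AND name = ?",
            (page, name),
        ).fetchone()
        return None if row is None else row[0]


class Page:
    """In-memory object; callers invoke methods and never see the database."""

    def __init__(self, name: str, store: PropertyStore) -> None:
        self.name = name
        self._store = store
        self._props: Dict[str, str] = {}

    def set_property(self, prop: str, value: str) -> None:
        self._props[prop] = value                 # the operation happens in memory
        self._store.save(self.name, prop, value)  # persistence is just a backing store


store = PropertyStore(sqlite3.connect(":memory:"))
page = Page("Recipes", store)
hostile = "x'); DROP TABLE props; --"
page.set_property("title", hostile)
stored = store.load("Recipes", "title")  # comes back verbatim, stored as plain data
```

With all statements in one place, locking down update parameters means auditing one small class rather than hunting for string-built queries across the codebase.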
(no subject)
Date: 2012-06-16 03:29 pm (UTC)
A specific use case would be to be able to purchase and rate a DVD from Amazon, and have it be added to my wiki's 'DVDs I Own' and 'Movie Ratings' wiki pages, with an auto-generated stub page with plot scraped from IMDb, some basic rating information, etc., and keeping the Amazon rating in sync with Netflix, Hulu, etc.; I'd like to do the same thing with books (Goodreads), board/card/euro games (boardgamegeek.com), or pretty much anything else.
Realistically, even just having the wiki gather the data, rather than push updates, would go a long way to my wants and needs.
I know I can hack something together to run on top of any wiki, with enough time and effort, but I'd love a wiki designed around the idea that inter-site communication is part of the data back-end.
(no subject)
Date: 2012-06-16 03:30 pm (UTC)