jducoeur: (Default)
For those who are interested in it: I'm experimenting with posting some of my technically-focused articles on Medium. The first one is up: Don't hand out masks of your own face.

Don't worry, I'm not abandoning Dreamwidth -- most stuff will remain here. But I'm going to play with using Medium as a professional/technical blog, for articles where I am spouting off on techie subjects, and don't care quite as much about promoting followup conversation (which works better here).

So if you're interested in the programmer-y stuff, I encourage you to follow me on Medium, and we'll see where that goes...
jducoeur: (Default)

Since my Google-fu is failing me, I'm curious whether any of my friends might know:

I have one high-number port open on our home network, gatewayed to HTTP on my development machine, which is sometimes running an in-development HTTP server. (Sometimes Querki, sometimes other things.) Unsurprisingly, this leads to port scanners trying to break in; if I happen to be running the application at the time, I see fun errors in the log.

(No, there's nothing secret or interesting in the exposed web server -- it's just test data, and the open port is so that I can show folks outside the firewall what I'm currently up to. And if somebody actually can break into it through that, I want to know about that now, on my Linux dev box, rather than in production.)

This morning's errors are a mystery to me, though -- it looks like somebody is attempting to issue a REMOTE command. It's splashing with a "501 Not Implemented", of course, but I have no clue what it is. I had originally been entirely puzzled, since I'm not aware of a REMOTE method in HTTP, but then it occurred to me that, since this isn't port 80 or 443, there's no reason to believe they're trying to attack me with HTTP.

Any ideas what protocol they're sniffing for? This is just idle curiosity, but I like to have some idea how someone is trying to attack me, and there seems to be an automated probe trying this one about once an hour...

jducoeur: (querki)

(Only interesting to programmers, and this time really only interesting to folks who actually build front end pages. But really interesting for those of us who do that.)

Okay, I'm probably late to the party here, and the serious front-end people already knew about it, but last week Otto Chrons (one of my fellow Scala.js geeks) happened to point to an article that mentioned Chrome's current Performance tab. So I took a look, and found it downright revelatory. That turned into the focus of this weekend's Querki release.

If you have Chrome, go check it out: Inspect a page, go to the Performance tab, and reload the page. Poof, you are presented with a wonderland of data. Here is a representative image from Querki, behind the cut tag. (Ignore the nonsensical function names -- that's because the code has all been optimized.)


At the top, it shows a summary of what the CPU was doing while you reloaded. This makes it starkly obvious when the page is just sitting there, waiting for stuff from the network. Below that is a sort of icicle view of exactly what happened when, with each task broken into sub-tasks, and sub-sub-tasks, and sub-sub-sub... you get the idea.

Honestly, it smacked me across the face: it turned out that one reason Querki's page load was so slow nowadays was that it was taking over two seconds to parse a bloody text file, because I hadn't optimized the parser properly. An hour or two of hacking on that, and I'd reduced it by over 80%. Check.

Then there is all that downtime: basically, it would completely lay out the basic page, then go fetch the big scripts, and only once those fed in did it start to process. This led me to read up on the new Preload feature (written in the page as <link rel="preload">). This is very new (as in, it's been around for about a year), and only supported on Chrome and Opera, so it's not a panacea -- but it does help a lot of the market. Basically, it lets you say "I am going to need this resource soon, so start loading it Right Now". If it's supported by the browser, and you have enough network connections, it starts fetching it in parallel, so the scripts can start to execute as soon as things are ready for them. That seems to shave another second or so off of load time.

Overall, it's a huge win, and the result is that Querki's initial page load is now down from averaging about 6-7 seconds on the desktop to sometimes getting as low as 2 seconds under optimal circumstances. (Among other things, this means that navigating to your index and then over to another Space is much faster than it had been.)

I haven't managed to fix everything yet: there turns out to be another fetch that is sometimes causing delays, which Preload doesn't seem to work on. (Basically, because it's an AJAX request.) That's going to need a serious rewrite, I think. But I hadn't even realized it was a concern until getting whapped upside the head by the Performance tab.

So to summarize: if you're building webpages, Chrome's Performance tab is your friend. It's dense, but chock-full of useful information to help you understand exactly what's taking how long at load time...

jducoeur: (Default)

A quick question for the web developers out there. I'm currently doing some consulting work (dayjob to bring in a proper income while Querki keeps improving in the background), building a new website from scratch. As always, it needs to be modern and responsive. I'm used to using Bootstrap for this -- it's what I used for Querki -- but it's been a few years since I last took a serious look at the landscape.

So: any opinions between the available frameworks? I know Bootstrap pretty well, but I also hear about Foundation fairly often, and I'm finding another one called Skeleton that I know nothing about. Anybody have any pros/cons to express between these? Do you know any others that are particularly excellent? I'm looking less for just-the-facts (there are comparisons available online), and more for war stories, subjective opinions, and stuff that doesn't show up in the bullet lists...

jducoeur: (Default)

Fact: I slept poorly last night. (No particular reason, just restless.) Hence, I am very tired today.

Fact: I am being considerably more productive today than usual.

Theory: this seems to be mostly because I just plain don't have the energy to overthink and doubt my previous decisions, so I'm just building the system as designed.

There is a lesson in here, somewhere...

jducoeur: (Default)

A few days ago, I posted about Rust, having just watched a wonderful talk about it at Scaladays. That presentation is now online. It's highly recommended for all programmers who are interested in language design -- it's a lucid talk about the language, focused on the rationale behind it and how they achieved those goals. Exciting stuff: Rust is probably the first language since Scala that I've found really compelling, the C++ replacement to Scala's Java.

And from the same series of videos comes this talk from the creator of Jepsen -- again, nothing to do with Scala, but a great technical talk. I tweeted that this one was "Funny, educational and terrifying". (The laughter isn't much picked up by the microphone, but was pretty loud at times.) Jepsen is a toolkit for testing distributed databases, and this talk (illustrated entirely with hand-drawn slides) goes into fairly deep detail about why it's so hard to build them. The upshot is that nearly every new-fangled DB turns out to be seriously broken in at least one or two respects. A great talk for anybody who is interested in distributed systems architecture. (And anybody who is using any of these databases.)

(And yes, there was one keynote that was actually about Scala -- Martin Odersky talking about "What to Leave Implicit". Also a good talk, but mainly interesting if you already know Scala; the other two don't require as much background...)

Rust

Apr. 20th, 2017 11:30 am
jducoeur: (Default)

This week's adventure in conferencing is my first trip to ScalaDays, which is in Chicago this year. This morning's keynote was a bit surprising, because it was about the language Rust, rather than Scala. But it was a great talk, and very educational -- I've known vaguely of Rust for a while, but really hadn't known the details. Here's a summary of what I learned, but I recommend checking out the video of the talk once it comes out.

I've been a serious evangelist for Scala for a number of years -- my usual take is that it is currently the best language for general, high-level application programming. You can argue the point, but I'm confident about this one: it's a lovely mix of pragmatism, power and principle, and makes programming more efficient and safe.

But -- not all programming is high-level. Some code needs to be closer to the bare metal, for efficiency, access to the hardware, or other reasons -- it needs to be specifically low level. Scala is only now beginning to be able to do this (with the relatively new Scala-Native compiler), and it's yet to be proven in that environment. Rust, on the other hand, is designed for that world from the get-go.

Or to put it another way, Rust is to C++ as Scala is to Java: a much newer, rethought, more powerful and safe language for playing in that domain.

The core problem with low-level systems programming is that it is scary -- it is very easy to commit any of several major mistakes, each of which leads to crashes or, worse, security leaks. This is true even if you're good at this stuff: programs are complex, and the interactions between the parts are where the bugs tend to arise. Rust is all about reducing that fear, and letting you code with confidence.

The beauty of Rust is that they've taken a very principled look at where those problems tend to come from, and found a few key areas to improve. In particular, the observation is that many bugs arise from uncontrolled access to memory. Plain and simply, pointers are a problem.

So Rust's biggest innovation is removing that word "uncontrolled". It introduces a compile-time notion of "ownership", and distinct notions of mutable references (which give a code block the right to alter that memory block) vs "shared" references (which allow you to inspect the memory). While they don't use the same terminology, the concepts appear to be quite similar to write vs read locks in database programming.

They've built a lot of infrastructure on top of that, with some really remarkable results. Perhaps most impressive, they've built a concurrency framework that manages to be both flexible and safe. Most of the standard patterns for concurrent programming exist, but they're all adjusted to this ownership-centric world, such that many of the common race-condition problems just can't arise unless you explicitly say "yes, this code is cheating -- I know what I'm doing".

It doesn't solve every problem -- I checked after the talk, and confirmed that it's totally easy to cause deadlocks (unsurprising, given how much this looks like database programming) -- but it's still beautiful and powerful. In the area of concurrent programming, Rust is arguably better than most high-level languages.

Overall, I'm impressed, and I'm pleased to see Rust being presented at a Scala conference -- it looks to me like the languages are nicely complementary. Rust isn't really in competition with Scala: it is optimized for different kinds of problems. But it is principled, and well-designed, in a way that is very reminiscent of Scala. The combination of Scala for application-level programming with Rust for systems and components provides a solid replacement for the older Java/C++ stack.

Not that I've done much systems programming in the past 15 years, so I don't know if I'm likely to use Rust any time soon. But it's good to see the rise of a language that doesn't suck for that domain. God knows, my life 20 years ago would have been much happier with it...

jducoeur: (Default)

For the relatively serious programmers, I commend the article Asynchronous Programming and Scala. It's somewhat dense stuff, and as written is entirely in Scala, but the principles are pretty generic. It's all about how to think about asynchronous programming, and hits some important high points:

  • Asynchrony is not the same thing as Parallelism, although they are closely related.
  • Callbacks are a wretched way to deal with async, since they don't really compose. (I have learned this one through much pain.)
  • Futures and Promises are less wretched, but still problematic.
  • If you really want to do this stuff right, proper functional-programming techniques rock.
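
To make the composition point concrete, here is a minimal sketch in Python's asyncio rather than the article's Scala. The function names and the data are invented for illustration; nothing here comes from the article or from Monix. The callback version has to nest each step inside the previous one, while the coroutine version returns values that combine like ordinary calls.

```python
import asyncio

# Hypothetical "remote" calls; names and data are made up for this sketch.

# Callback style: each step is nested inside the previous one.
def fetch_user_cb(user_id, on_done):
    on_done({"id": user_id, "name": "alice"})

def fetch_orders_cb(user, on_done):
    on_done([f"{user['name']}-order-{n}" for n in (1, 2)])

def order_count_cb(user_id, on_done):
    fetch_user_cb(
        user_id,
        lambda user: fetch_orders_cb(user, lambda orders: on_done(len(orders))),
    )

# Coroutine style: each step returns a value, so steps compose like plain calls.
async def fetch_user(user_id):
    await asyncio.sleep(0)  # stand-in for waiting on the network
    return {"id": user_id, "name": "alice"}

async def fetch_orders(user):
    await asyncio.sleep(0)
    return [f"{user['name']}-order-{n}" for n in (1, 2)]

async def order_count(user_id):
    user = await fetch_user(user_id)
    return len(await fetch_orders(user))

async def total_orders(user_ids):
    # Asynchronous but not parallel: a single thread interleaves the waits.
    counts = await asyncio.gather(*(order_count(uid) for uid in user_ids))
    return sum(counts)
```

Calling `asyncio.run(total_orders([1, 2, 3]))` runs the three lookups concurrently on one thread, which is the asynchrony-without-parallelism distinction from the first bullet.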

Of course, this is largely a rationale and advertisement for the Monix Library, which is a more or less state of the art library for "doing it right" -- but it's a pretty compelling rationale.

None of this is easy: he's summarizing stuff that's taken me four years to really internalize. (One of my medium-term but relatively challenging goals is to rewrite the pipeline for the QL language inside Querki from being Future-centric to Monix-centric: the result would be vastly more efficient and reliable.)

But it's important material, especially if you're designing systems. I encourage you to read and absorb it. Feel free to ask me "what the heck is that bit talking about?" questions, or even questions about the syntax and functions in the examples -- I always enjoy burbling about programming in general and Scala in particular...

jducoeur: (Default)

This one's just for the programmers/architects, and mainly for the experienced ones: Things I Wish I Knew When I Started Building Reactive Systems.

The more you're used to building traditional Tomcat-plus-RDBMS applications, the weirder you're going to find this, but it's well worth reading and absorbing. It describes a few of the assumptions underlying modern, scalable, so-called "reactive" architectures, each of which gores one of the traditional sacred cows you're probably used to. What it all boils down to is that it's entirely possible to build seriously efficient, seriously scalable online services -- you just have to change a lot of well-worn habits.

(Querki is built around all of this stuff, except that I still have some blocking I/O in the MySQL code; replacing that with a better approach such as Slick is becoming an increasingly high priority.)

And this reminds me: among other things, it links to the paper Life Beyond Distributed Transactions. If you're playing at the Senior Software Engineer or above level, this is one of the most important papers of recent years, and you should read it if you haven't already done so. It was the paper that finally demonstrated that the emperor has no clothes: that the traditional transaction-oriented model of data processing doesn't scale well, and that you need better approaches if you're going to compete in the modern world.

For all that it calls itself "An Apostate's Opinion", it has become something like the new gospel. It has inspired enormous ferment and evolution over the past decade, and led to radically new architectures (such as the event-sourced approach that Querki is now mostly built on). If you are doing architecture for systems that are intended to scale, you need to understand this stuff in order to understand how the industry is evolving...

jducoeur: (device)
[Mainly for the programmers, and this time mainly for folks who have to touch web browsers.]

I'm currently catching up on old articles I've bookmarked to read later (more links may come), and I just read through this marvelous discussion of Scala.js, the Scala-to-Javascript compiler. In it, Li Haoyi (one of the first serious users of Scala.js, and one of the most important ecosystem developers) explains why Scala.js is not only one of the best ways to develop for the Web, but why he decided from very early on that it was likely to *become* one of the best.

It's a compelling argument, and after 2+ years of heavy Scala.js use, I totally agree: it's the first environment for developing this stuff that I've actually *liked*. The article is long but recommended, and I'm happy to answer any questions...
jducoeur: (Default)
It's been a long time coming, but I'm finally beginning to grok pure FP. I'm in the process of rewriting SpaceCore (one of the most dead-central Actors in modern Querki) to make the guts of all the functions pure, pulling all the side-effects out to the edges. (Not out of any sense of righteous purity, but because I need these bloody things to be composable, and it's the best way to do it.)

And I just caught myself saying, "Ah, that class is a Semigroup; I should probably instantiate that typeclass, so that I can combine the instances".
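
For anyone wondering what that sentence means in practice, here is a rough sketch in Python rather than Scala. A semigroup is just a type with an associative "combine" operation; the `Stats` class and `combine_all` helper are hypothetical examples, not anything from SpaceCore.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Stats:
    """Hypothetical example type: a running count and total."""
    count: int
    total: int

    def combine(self, other: "Stats") -> "Stats":
        # Associative: (a + b) + c == a + (b + c), which is the one semigroup law.
        return Stats(self.count + other.count, self.total + other.total)

def combine_all(first, *rest):
    """Fold any number of values together. At least one is required,
    because a semigroup does not promise an "empty" element."""
    return reduce(lambda a, b: a.combine(b), rest, first)
```

Because the operation is associative, the grouping of the combines does not matter, which is what makes instances safe to merge in whatever order they happen to arrive.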

No doubt you'll find me in some alley sometime soon, mumbling about Applicatives, Free Monads and other such Cthulhoid horrors. Have pity on me...
jducoeur: (Default)
One of the interesting lessons of working on the Querki project has been to be suspicious of the phrase "user error". It's a commonly-enough heard term in programming -- dismissing a bug report on the grounds that the user just didn't read the documentation closely enough. It's often accompanied by just the *tiniest* bit of sneer (or sometimes, not so tiny), that the user is obviously a bit dim for not *getting* it.

That's *never* a good response, but in a consumer-facing application it's downright capital-B Bad. I tend to think of this as the heart of UX, or at least a major artery: if the user isn't understanding your product, your first response should be to look for problems in the product, not problems with the user.

I've been slapped upside the head with this a bunch this week: alexx_kay has been doing some building, and logging a pile of bugs. Some of them are simple, straightforward, ordinary bugs, but several of them have been provoking an internal monologue along the lines of, "Well, that doesn't *work* that way... but of course you *expected* it to work that way... annnnnnd your expectation is perfectly fair and consistent with the rest of the system, a clear improvement... so I guess I need to tweak things to make it work that way".

There's a lesson here, for both programmers and users. I've mentioned it before, but it always bears repeating: users who send you bug reports like this are worth their weight in gold, and the correct response to bugs like this is "thank you". Down in the trenches, it is *terribly* easy to develop tunnel-vision, and your users, especially the serious ones, often spot inconsistencies that you overlook. (Folks building stuff in Querki, *please* don't be shy about sending 'em in.)

Or, to put it more simply, people sending in bug reports are usually *doing you a favor* by doing so. Treat them accordingly, and value their input...
jducoeur: (querki)
See the latest release notes on the Querki Development Journal for details, but suffice it to say, Querki now has its own Cassandra cluster, and we're going to be transitioning most of the application data over to that (from the existing MySQL database) in the next month or two.

It feels almost trendy, but we're finally joining the NoSQL Age...
jducoeur: (Default)
Yesterday, I hit a gigantic tarball in my current Querki project. (I'm currently rewriting the underpinnings to be journal-based instead of conventionally relational, which should make the system *vastly* more reliable, and permit us to build powerful version control into it. Huge project, but the end result is going to be *sweet*.)
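
As a rough illustration of what "journal-based" means, here is a minimal sketch in Python. It is not Querki's actual design: the event shapes and function names are assumptions made up for the example. The idea is that the journal of changes is the source of truth, and current state is derived by replaying it.

```python
from functools import reduce

# Hypothetical event shapes, invented for this sketch:
#   ("set", key, value) and ("delete", key)
def apply_event(state, event):
    """Pure function: return a new state with one journal entry applied."""
    kind, key, *rest = event
    new_state = dict(state)
    if kind == "set":
        new_state[key] = rest[0]
    elif kind == "delete":
        new_state.pop(key, None)
    else:
        raise ValueError(f"unknown event kind: {kind}")
    return new_state

def replay(journal, upto=None):
    """Rebuild state from the first `upto` events (all of them if None)."""
    return reduce(apply_event, journal[:upto], {})
```

Replaying only a prefix of the journal gives the state as of that point, which is why this style of storage lends itself to version history.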

The details of the problem aren't too important: suffice it to say, I realized that one of the key libraries I was relying upon (Kryo) was missing a critical concept, and this was leading to my code that depended upon it getting more and more baroque, with some possible problems coming up that might be entirely unsolvable.

In traditional enterprise software, I might have "support" available to me -- that is, I could report the issue. If that got a friendly ear, my request for an enhancement might get into their issue-management system. And in a while -- a few months if I'm lucky, a couple of years if not -- I might get a good fix. That doesn't exactly work for me.

Fortunately, the library in question (like most of Querki's underpinning) was open source. Which meant that, in a few hours, I could figure out *precisely* what was going on, come up with a fix (enhancing the KryoSerializer shell library around Kryo with a new concept), fork the library, and get the solution up and running.

No, it's not "supported" by anybody, and I might wind up having to maintain my own fork if the owner of the shell library doesn't like my approach. But I managed to clear a major blocker in Querki by adding a major new feature to the underlying tool, in hours instead of months.

Really, having gotten used to a nearly-pure open-source environment for Querki, my patience for traditional closed-source software has gone *way* down. I've never been anywhere near so productive in the traditional world...
jducoeur: (Default)
[For the programmers]

Just came across this lovely little article, from a former Java programmer reflecting on having spent 5 years in Scala instead. Highly recommended to any programmers who are curious about Scala but intimidated (especially Java programmers) -- this outlines some of the key advantages of the system, and debunks a bunch of the common misconceptions about it...
jducoeur: (Default)
[For the programmers, who might want to follow these links]

God bless the Cats project -- thanks to their documentation of it, I finally grok the State monad.

On the downside, this means that I'm quickly realizing all the places that I should have been using it all along. (Indeed, it's probably the right solution to one of Querki's nastier internal flaws, a pattern that up until now has required a lot of careful manual checking.)
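
For the curious, the core idea can be sketched in a few lines of Python (this is not the Cats API, and the ID-counter example is invented). A "stateful step" is a function from a state to a pair of result and new state, and `bind` does the threading that would otherwise have to be checked by hand.

```python
def unit(value):
    """Wrap a plain value as a step that leaves the state untouched."""
    return lambda state: (value, state)

def bind(step, make_next):
    """Run `step`, then feed its result to `make_next`, threading the state through."""
    def combined(state):
        value, state_after = step(state)
        return make_next(value)(state_after)
    return combined

# Hypothetical example: handing out unique IDs without a mutable counter.
def next_id(counter):
    return (counter, counter + 1)

def label(name):
    return bind(next_id, lambda n: unit(f"{name}-{n}"))

two_labels = bind(
    label("page"),
    lambda a: bind(label("tag"), lambda b: unit((a, b))),
)
```

Nothing runs until a starting state is supplied: `two_labels(0)` returns both labels along with the final counter, and no step can forget to pass the counter along.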

Seriously: Cats is neat -- a new category-theory library that doesn't assume you already know all this stuff, so they are putting at least as much emphasis on documentation as completeness. Any Scala programmer should check it out, and serious non-Scala ones might want to give it a look...
jducoeur: (Default)
I'm continuing to think about ways I could contribute to the local tech-entrepreneur scene (which led to yesterday's question about basic programming principles). It occurs to me that it might be quite useful to give a talk on "The CEO and the Successful Software Project": what the CEO (especially for a startup) needs to know about managing an Agile software project, what they need to provide to the team, and what they can and can't reasonably expect from it. A large fraction of the entrepreneurs I'm meeting don't have any formal tech background, and probably mostly don't know this stuff.

(Note that this is specifically Agile from the upper-management viewpoint, so it's all about the "API" of Agile. I love pair-programming and automated testing and all that, but they're mostly irrelevant; OTOH, the rationale for the story stack, sprints, and the customer representative are vitally important.)

So I started thinking about what I might say, and this was one of the first things that came to mind:
The Uncertainty Principle: you can fully understand your feature set or your schedule, but never both. The more precisely you try to understand one, the less confidence you can have in the other.
I believe that's a straightforward lesson from the history of software development.

That quickly led to:
The First Law of Project Motion: the more precisely you attempt to understand the full scope of the project, the more inertia you add to it, and the more slowly it will move.
I'm liking this general approach -- it makes for good, pithy slides that I can then dig into and explain *why* these are generally true.

Do folks have other suggestions along these lines? I'm curious how far we can carry this metaphor before it breaks, while helping to illuminate the realities of software projects from the management level. And more generally, what would *you* like the CEO of a small software-focused company to understand?
jducoeur: (Default)
Another day, another networking event -- I'm slowly getting used to going to all these Boston Tech Meetups and such, to meet people, talk up Querki and start to understand how one gets an investment.

Along the way, I'm chatting with lots of folks, and a remarkably large fraction lead off with, "Well, I've always been doing X, but I want to learn to code". (Last night's was a fellow who does financial compliance work for one of the large funds.) These folks are usually self-taught, and tend to be very self-deprecating about the fact that they didn't go to school so they don't *really* understand programming. A couple of the programmers I was with and I got chatting about that, and the fact that, yes, the best way to learn to program is by doing. A degree in CS is helpful, but mostly in that it teaches you some of the underlying theory for programming *well*; the nuts and bolts change so often that the details you learn in school will only be useful for a limited time anyway. Somewhere in there, I asserted that you could probably list all of the most-useful bits of theory and practice in one brief talk anyway.

So, here's a challenge: help me figure out what those are. What are the key engineering principles that *every* programmer should know, that probably aren't obvious to a newbie and which aren't necessarily going to be taught in an online "How to Java" class?


I'll start out with a few offhand:

Refactoring: great code doesn't usually come from a Beautiful Crystalline Vision that some programmer dreams up -- it comes from writing some code, getting it working, and then rearranging it to make the code *better* while it's still working. That's "refactoring": the art of making the code cleaner without changing what it's doing. It's a good habit to get into, especially because it takes practice. (Granted, listing all the major refactoring techniques is a good-sized talk itself; I highly recommend Fowler's book on the subject.)

The DRY (Don't Repeat Yourself) Principle: which I usually describe as "Duplication is the source of all evil". Any time you are duplicating code, you're making it much more likely that you'll get bugs when things change. Much of refactoring is about merging things to eliminate duplication. Similarly, duplicate data is prone to getting out of sync and causing problems, so you should usually try to point to the same data when it's convenient to do so.
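
A tiny before-and-after sketch of that merge, in Python, with an invented validation rule and invented function names:

```python
# Before: the same validation rule is written out twice.
def create_user_before(name):
    if not name or len(name) > 20:
        raise ValueError("invalid name")
    return {"name": name.strip()}

def rename_user_before(user, name):
    if not name or len(name) > 20:
        raise ValueError("invalid name")
    return {**user, "name": name.strip()}

# After: one place to change when the rule changes.
def clean_name(name):
    if not name or len(name) > 20:
        raise ValueError("invalid name")
    return name.strip()

def create_user(name):
    return {"name": clean_name(name)}

def rename_user(user, name):
    return {**user, "name": clean_name(name)}
```

The behavior is identical; the difference shows up the day the length limit changes and only one of the two copies gets updated.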

Efficiency is good, but algorithmic complexity is what matters: this is what's often called "Big-O" notation in computer science. How fast things run *does* matter, but only in the grand scheme of things. Whether this approach takes twice as long as that one probably doesn't matter unless you're doing it a bazillion times per second. What *does* tend to matter, given a list of size n, is whether you're going through it just once -- O(n) in the notation -- or whether each time through you're going through the whole list again -- O(n^2) in the notation, that is, "n-squared". (You'd be surprised how easy it is to wind up with algorithms that are n^2 or even n^3 -- that can actually get slow.) Or, if you have two lists of sizes m and n, does your approach take O(n+m) time, or O(n*m)? It's worth practicing thinking through these order-of-magnitude evaluations and getting an intuition for it. That said...

Big stuff swamps small stuff: in one community the other day, I pointed out an approach to solving a problem that involved creating an extra object for each HTTP call. One of the folks in the discussion asked whether that inefficiency would matter, and I had to point out that you're already handling an HTTP call -- at *best*, the overhead of that handler is at least 1000 times that extra object creation, quite likely 10000 times more, so this is a drop in the bucket. So keep scale in mind, and don't sweat the small stuff. If you know your list is never going to have more than ten entries, even O(n^3) probably doesn't matter much.
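
To put the O(n+m) versus O(n*m) distinction from above into code, here is a small Python sketch (the function names are made up, and the set-based version assumes the items are hashable). Both functions return the items of one list that also appear in the other.

```python
def common_items_quadratic(xs, ys):
    """Up to O(n*m): for every item in xs, scan through ys."""
    return [x for x in xs if any(x == y for y in ys)]

def common_items_linear(xs, ys):
    """Roughly O(n+m): one pass to build a set, one pass to check against it."""
    seen = set(ys)
    return [x for x in xs if x in seen]
```

With ten items in each list nobody will notice the difference; with a hundred thousand in each, the first version does on the order of ten billion comparisons and the second a couple of hundred thousand set operations.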


What else? Can we craft a reasonably brief Rosetta Stone that summarizes the *common* stuff that every programmer should know, so they know what to look for? What are the principles that are true regardless of programming language, which aren't necessarily taught by the average JavaScript bootcamp? DRY is the heart and soul of good programming IMO -- are there other principles of similar importance?
jducoeur: (querki)
For those who are paying a little attention to Querki (but not following querki_project): the first major phase of the QL language is finally finished. A little while ago, I introduced bound values, and last week I was finally driven to add local function definitions. That brings us to the point where I've implemented all of the read-time functions that are in the plans. There will likely be some enhancements as we go, to support transforming and storing data as it gets saved to the database, but the main spine of the language is in place.

It's been an interesting experience, deriving the simplest language I could come up with that suffices for this purpose. It's certainly more complex than it once was, but still -- the reasonably full language definition fits in a handful of screens. It's not *quite* as simple as Scheme, but it's well up there.

The end result is a bit surprising, with a couple of aspects that emerged organically as I developed. One is the fact that it's a very pure functional language: that wasn't an original design goal, but after a while, it became clear that there was no good reason *not* to go for pure-functional, and all the usual arguments in favor apply here. The other is the incredibly strange way Querki handles function parameters, behaving more like macros than conventional functions. I keep feeling like this *surely* must be wrong, since no other language I know works this way, but it's clearly optimal for the way Querki thinks about data.

(An open question is whether QL should be considered a DSL. It kind of is, in somewhat the same way that SQL is: the domain is "data transformation". I have trouble considering that a "domain", but there you go.)

Anyway, I've written up a first-draft guide to the language, which can be found here. I'm unlikely to change any major aspects of the language at this point -- this syntax was evolved through a lot of careful thought about how one works in Querki -- but questions and comments would be quite welcome...
jducoeur: (device)
... when your internal monologue goes something like this:

"A-ha!  Yes, that looks like the right solution to the problem."

(Smug.)  "Oh, I like that -- it's pretty innovative, and I think it's even a good user workflow."

(Dismay.)  "Oh, crap -- that means I probably have to write an effing patent..."
ETA: folks, I appreciate that you're trying to help with the comments, but you're not -- you're making an extremely difficult and painful decision much worse. I've been studying this question at *least* as long as any of you, I understand it quite deeply from all sides, and quite frankly, you're not in my shoes and don't understand the sheer number of issues I'm juggling here. Please stop.
