Mar. 28th, 2009

jducoeur: (Default)
Got a startling email from my ISP this morning, saying that jducoeur.org was running close to its bandwidth limits for the month, and did we want to upgrade? Now granted, it's a very old plan, and the bandwidth limits are modest (10 GB/month), but still -- the site's not exactly the center of the universe. What's going on here?

So I downloaded a log, and I'm doing some spot-checking. Far as I can tell, the problem is simply that one of my intuitive expectations about the Web is now wrong.

The thing is, our site is crammed full of old stuff. Why not? Most of it is things that were interesting briefly, but have long since turned into simple historical curiosities: old versions of the Carolingian website, wikis and websites for old LARPs, stuff like that. Since most of it is of only slight current interest, it should be taking up (cheap) disk space, but no significant bandwidth.

The problem, though, is all those searchbots. The site is *old*, reasonably well-linked, and therefore known to all the bots. So the result is that *every* searchbot appears to be hitting *every* page on the site, pretty frequently. This utterly swamps all other traffic. In the log I'm looking at, I'm finding that "Yahoo Slurp!" is the single worst offender, clicking on everything in every wiki page. But lots of engines are showing up -- some from MSN (although they're actually more measured than most) and Ask Jeeves, lots from Yanga and Baidu and searchme and other lesser engines, even a few I've never heard of like Kosmix. And of course, the expected Googlebot.
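
A rough sketch of that kind of spot-check, for anyone who wants to repeat it on their own log. It assumes the Apache/NCSA "combined" log format (the post doesn't say what format the ISP's log is in), and the `BOT_HINTS` substrings are an illustrative guess, not a complete list of crawlers:

```python
import re
from collections import Counter

# ASSUMPTION: Apache/NCSA "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# ASSUMPTION: crude substrings that tend to show up in crawler user-agent strings.
BOT_HINTS = ("bot", "slurp", "spider", "crawl")


def bytes_by_agent(lines):
    """Sum the response bytes served to each distinct user-agent string."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that don't look like combined-format entries
        size = match.group("size")
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


def bot_share(totals):
    """Fraction of all bytes that went to agents matching BOT_HINTS."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    bot_bytes = sum(
        count for agent, count in totals.items()
        if any(hint in agent.lower() for hint in BOT_HINTS)
    )
    return bot_bytes / total
```

Calling `bytes_by_agent(open("access.log"))` (hypothetical filename) and then `totals.most_common(10)` lists the heaviest agents by bytes served.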

The result, of course, is that the total bandwidth for the site is an m*n*o operation, where m is the total size of the site, n is the number of bots out there, and o is the frequency with which they each re-search. Of those terms, m is the only one I control directly. The multiplication of search engines out there -- the n term -- is really the heart of the problem. When it was just Google hitting us every now and then, it was a non-issue, but with robots swarming all over now, they are dragging us down.
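
To make the m*n*o arithmetic concrete, here is a tiny sketch. The function name and the numbers in the usage note are made up for illustration; the post gives no actual figures for site size, bot count, or crawl frequency:

```python
def monthly_bot_bandwidth_gb(site_size_mb, num_bots, crawls_per_month):
    """Estimate m * n * o in GB: every bot fetches the whole site on every crawl.

    Uses 1 GB = 1000 MB; all inputs are hypothetical, not measurements.
    """
    return site_size_mb * num_bots * crawls_per_month / 1000.0
```

With invented inputs of a 200 MB site, 10 bots, and 4 full crawls each per month, `monthly_bot_bandwidth_gb(200, 10, 4)` comes to 8.0 GB. Halving any one term halves the total, which is why the growing n hurts so much.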

(I do find myself curious what fraction of the Web's total traffic is now bots. I suspect it's a small fraction of the big sites, but an enormous fraction of smaller sites like ours, which make up most of the Web.)

Not quite sure offhand what I'm going to do about this. It's not quite a crisis yet, but it's not far off: the robots are driving us to about 80% of our limits. I do want to show up on the search sites, so I'm leery of using robots.txt in too blunt a way, but I'd really like to throttle these damned things down a bit. What I don't know yet is whether there is an easy way to do so...
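
One candidate for a less blunt robots.txt, offered as a sketch and not as what the post settles on: keep every bot allowed, but give the worst offender a `Crawl-delay` and fence it out of the archival areas. `Crawl-delay` is a nonstandard directive that not every crawler honors, and the `/old-wikis/` path and the 30-second value are hypothetical. The snippet uses Python's standard `urllib.robotparser` only to check how such a file would be read:

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: hypothetical rules; "/old-wikis/" and the 30-second delay are invented.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 30
Disallow: /old-wikis/

User-agent: *
Disallow:
"""


def parse_robots(text):
    """Parse robots.txt text and return a parser that can be queried per agent."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser
```

Here `parse_robots(ROBOTS_TXT).crawl_delay("Slurp")` reports the requested delay, while `can_fetch("Googlebot", "/old-wikis/page")` stays true, so the site remains visible to the other engines.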
jducoeur: (Default)
Just for giggles, since I'm looking at logs anyway, let's see what is being Googled on our page. This is essentially what people found on our site via Google (and other search engines that follow the "?q=" syntax) yesterday:


There's the person looking for "making kvass", who gets Orlando's old page on the subject (under the old Carolingia site).

There's "historic games" and "period games" and "medieval games" (and "Midevil games" and "midival games") and "Elizabethan games" and "games/Top Games/Card Games", which all hit the Period Games Homepage as they should. (I'm always happy to see that page being used; I really need to keep working on maintenance.)

More specifically, there's "prime jeu de cartes", which got to the transcription Thierry Depaulis gave me about the game. (It's a French document about near-period Primero in Lyon; one of these days, I need to get that properly translated.) And a search for "Thierry Depaulis" turned up my summary of the Tablero debacle, wherein he pointed out that one of the SCA's favorite games was historical nonsense.

"Rithmomachia" (asking from Italy) got to Peter Mebben's writeup of Rhythmomachy, which is probably one of the best answers to that particular query.

"Modar tarot" I recognize -- it is spinning off of a discussion on the Historical Games mailing list, where we are comparing reconstructions of Tarocchi. (Modar is another well-known games guy, and most of his work is reasonably sensible, but his reconstruction of Tarok is just *strange*; I'm really not sure where it came from.)

The search for "Dolce Amoroso Fuoco" gets the appropriate page in ladysprite's translation. (Which still needs to get redirected over to her own site.)

"SCA province" goes (for reasons I'm not too sure about) to my transcription of the original SCA Corpora -- presumably because it's the earliest official reference to "province" as a concept.

There's "how long can you store homemade beef jerky", which got followed to Mara's recipe for the same. So did "how much salt cure beef jerky".

"Resume of a librarian" goes, appropriately, to msmemory's resume.

Amusingly, two different people appear to have queried "recipe template" -- which does get our homebrew recipe template file, which I don't even really think of as a page, just a file sitting in that directory for our own use.

In perhaps the best example of "Yes, we have exactly what you're looking for", someone looked for "flourless chocolate cake bon appetit" and got msmemory's writeup of their recipe. (Which we'd stored away so as not to lose it.)

"Ladies red hankerchiefs" (sic) went to the AS XVI entry of the History of Felding. No idea why -- I may have to look that one up out of curiosity.

The search for "spicy cabage salad" went to our recipe for Cabbage Salad, which I didn't even remember we had. Yay (once again) for search engines that correct spelling. And "fish marsala" went to the recipe that I invented one day on the "this oughta work" principle, which I really must make again one of these days.


Overall, a pretty good day's searches. It's good to know that, if all those search engines are going to keep slamming us, at least they're doing their jobs in helping people find reasonably useful stuff on our site...
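
For the curious, a list of queries like the one above can be pulled out of the referrer field of each log entry. This is a minimal sketch, assuming the referrer is a full URL carrying the search phrase in a `q` parameter (the "?q=" syntax mentioned above); the function name is hypothetical:

```python
from urllib.parse import urlparse, parse_qs


def search_query(referrer):
    """Return the phrase in a referrer URL's q= parameter, or None if absent."""
    values = parse_qs(urlparse(referrer).query).get("q")
    return values[0] if values else None
```

For example, `search_query("http://www.google.com/search?q=making+kvass&hl=en")` gives back `making kvass`, and a referrer with no `q` parameter (or the bare `-` that logs use for "none") gives `None`.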
jducoeur: (Default)
Just got a fascinating email -- a come-on for a job, which isn't that unusual, but clearly bogus. They claim to have gotten my resume from "the job site"; they are only willing to communicate via email; the email address is gmail; they never actually quite say what the company *does* in any detail ("the selling and purchasing of certificates of metal", which *could* be commodity trading but is a bit ambiguous); etc.

But it's moderately well-executed, with fewer English blunders than usual and without the usual obvious hard-sell. It doesn't even quite trip the "too good to be true" meters -- the salary they offer is just about right for me, which I suspect is pure coincidence, but they're not offering a million bucks. It's still clearly a scam, but not nearly as clumsy as most of them.

I do find myself curious what they're up to. Maybe they need your bank information to set up the direct deposit? Or some kind of criminal scam, and they're looking for some poor schmuck to be left holding the bag? Ah -- here we go: a thread about various permutations of this, under various names, including a couple of messages from some poor guy who did sign up. Sounds like they are using people for money laundering of fraudulent money transfers...
