Caterpillar 3.0 is a proof-of-concept app to demonstrate how incredibly useful Bayesian filtering can be when tasked with finding “interesting” articles instead of, or in addition to, spam. Download it here (10.7 MB). It works for me. I enjoy it, and it makes my life easier. Maybe it will make yours easier too.

How does it work?

Well, the simple explanation is that Caterpillar watches what you read and what you don’t read. If you click on a title, Caterpillar figures it must have been at least somewhat “interesting” to you. If you click on a link in a post, it figures the post’s contents were probably “interesting” too. If something sits around in your list for so long that it eventually gets purged, Caterpillar figures that its title probably wasn’t very “interesting” to you. In a surprisingly short amount of time Caterpillar starts picking out articles that you’ll probably find interesting, and it highlights them in green instead of the standard blue. That’s it.

The only time you should ever have to specifically train Caterpillar is when it gets something wrong, which isn’t that often. So, if Caterpillar thinks something would be interesting to you but you really disagree, just select it and choose “Downvote” from the “Entries” menu. That’s it. You can, of course, Upvote things too, but you really shouldn’t need to.

Who should use it?

Caterpillar is good for two primary groups of people.
And, of course, randomly curious geeks. Regardless of why you’ve chosen to try Caterpillar, it’s important to remember that it is just a proof of concept and as such may have some rough edges.

What about screenshots?

Visually it hasn’t changed much since the 2.0 release, and there are plenty of screenshots of that on the Caterpillar 2.0 site. What’s new is the highlighting of “interesting” items in green, a few more menu options, and a couple of extra useful links and info when reading an article. See “Why release it now?” for why I don’t have updated screenshots.

Requirements & Use

It requires Java 5 or higher, although if you have some pressing need, and you’re a geek, you could compile it under Java 1.4.x and it should work. To run it, just double-click the Caterpillar.jar file in your GUI and all should work. OS X users should be able to just double-click the pretty icon. You can also use the command line: change to the Caterpillar directory and type:

    java -jar Caterpillar.jar

Give it a week. If you read a crazy number of feeds like I do, you’ll probably see Caterpillar start picking out new entries for you by the end of your first day. If you’re like most people it may take a bit longer. The more you use it, the faster it learns. It’s that simple. Just don’t click on random entries in hopes that it will help. It won’t. Just read the entries you would normally read and let it learn what’s really interesting to you.

How do I import feeds from my current aggregator?

Most feed aggregators will allow you to export your list of feeds as an OPML file. Rename that file exportedFeeds.opml and place it in the same directory as the Caterpillar.jar file. Then choose “Import Feeds” from the File menu and give it a while to download them. If you started it from the command line, you’ll see it call out the names of the feeds it’s importing as it goes.

What about a manual?
The Caterpillar 2.0 site has the Caterpillar 2.0 docs, which cover basically everything except the Bayesian learning stuff, which I just covered in the “How does it work?” section above. See “Why release it now?” for why I don’t have an updated manual.

So what do I mean by “proof of concept”?

Well, Caterpillar’s a good app. I use it every day. But it’s got some limitations, and right now I just don’t have the time to fix them. So here they are in no particular order:
Why release it now?

Or, more to the point, why release it in an unfinished state? Well, I had intended to finish polishing it up and release it as a commercial product. But that was over two years ago, and there have just been too many other projects on my plate that are more important to me. I’d rather see people get some use out of it than have it continue to sit on my computer benefiting no-one but me. I’m also hoping that some smart programmer at Google will see the value of positive Bayesian filtering and apply it to Google Reader and Gmail. Just imagine how awesome it would be if all those mailing lists you subscribe to had a filter looking for “interesting” posts for you, so that you didn’t have to read everything or feel so overloaded that you end up reading nothing.

Wanna help?

If you’re a Java geek, feel free to download the source with Darcs from http://caterpillar.masukomi.org/code/caterpillar3 and tweak it however you want. Add whatever feature you want, use the send feature of Darcs to send me a patch file (masukomi at masukomi dot org), and I’ll probably add it in. I figure any forward motion in Caterpillar is good at this point. The only restriction is that it needs to come with a unit test. Yes, I know, that seems hypocritical in light of the utter lack of tests in Caterpillar’s source, but since I wrote it I’ve gotten the full-on testing religion, so… deal :P. To build Caterpillar, just switch into the build directory and run ant.

Bugs & Feature requests

Report a Caterpillar bug
Request a Caterpillar feature

License & Copyright

The Caterpillar feed aggregator version 3.0 is copyright 2007 Kate Rhodes (masukomi at masukomi dot org) and is released under the GPL v2.0. Have fun. Don’t blow anything up. Convince your rich company that they should buy the source from me so that they can sell it under any license they want. Or hire me and pay me a decent salary. Either / or…
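Caterpillar’s Java source isn’t reproduced here, so what follows is only a minimal sketch of the “positive” filtering idea from “How does it work?”, written in Python and assuming a two-class naive Bayes model over title words with add-one smoothing. The class name InterestFilter and the labels “interesting” and “boring” are hypothetical: an entry you click is trained as interesting, and an entry purged unread is trained as boring, so you never have to tell it anything explicitly.

```python
import math
import re
from collections import Counter


class InterestFilter:
    """Two-class naive Bayes: "interesting" vs. "boring" (hypothetical sketch)."""

    LABELS = ("interesting", "boring")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        # Lowercased alphanumeric words; real filters tokenize far more cleverly.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # Implicit feedback: "interesting" when the user clicks the entry,
        # "boring" when it is purged without ever being read.
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def prob_interesting(self, text):
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.LABELS:
            # Add-one smoothed prior P(label).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothed P(word | label); the extra +1 in the
                # denominator reserves probability mass for unseen words.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into P(interesting | text) without overflow.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["interesting"] / sum(weights.values())
```

Scoring in log space keeps long titles from underflowing to zero. A reader built on this would highlight entries whose probability clears some threshold; the post doesn’t say what cutoff Caterpillar uses, so any value you pick is a guess, and a “Downvote” would simply be another call to train with the “boring” label.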
In this industry stagnation == death. If you’re not staying on top of new technology you may as well quit, because your skills will become obsolete and your job with them. Feed aggregators are a godsend for people like us. There’s no way I could keep up with my 310 (and counting) subscriptions if I had to go to their sites manually. Now, to be honest, a fair number of them are purely entertainment, and some just help me keep up with my friends’ lives. But the vast majority are programming related. I even went so far as to write my own feed reader with Bayesian filtering to highlight the “interesting” posts for me, something I’d LOVE to see in a commercial product. It makes SUCH a difference, and unlike with spam, there are ways of doing positive filtering that never involve having to specifically tell the application what you do and don’t like. All of which is a really long-winded way of putting up a link to what I’m currently reading for those who actually care. Although at 310 subscriptions I think the list has probably outgrown anything resembling a size that’s useful to others. You can’t really import it, because you’ll flood your feed reader with crap you don’t care about, and 770 lines of XML is a bit much to try and read…
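If you’d rather skim a subscription list like that than import it, the feed titles are easy to pull out of the OPML. This is a minimal Python sketch using the standard library’s xml.etree.ElementTree; it assumes the common OPML export layout (each feed is an outline element with an xmlUrl attribute, folders are outline elements without one), the function name feeds_from_opml is hypothetical, and it is not how Caterpillar’s own “Import Feeds” command works internally.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_data):
    """Return (title, feed URL) pairs from an OPML subscription list.

    opml_data may be str or bytes. Feeds are <outline> elements that carry
    an xmlUrl attribute; folders are <outline> elements without one.
    Element.iter() walks nested folders in document order, so feeds filed
    inside folders are found too.
    """
    root = ET.fromstring(opml_data)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            # Prefer the title attribute, then text, then fall back to the URL.
            title = outline.get("title") or outline.get("text") or url
            feeds.append((title, url))
    return feeds
```

To use it on a real export, read the file as bytes (for example open("exportedFeeds.opml", "rb").read()) so the parser honors the encoding declared in the XML header, then print or sort the pairs however you like.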
Update: I just came across a similar post by raganwald wherein he discusses the need for advanced programming skills…which you don’t get without math. ;)
When I was in high school no-one ever convinced me of why math was important, and that is my biggest educational regret. Children, and adults for that matter, will neither seek out, nor retain, knowledge they don’t value. It’s all well and good to tell them algebra is important, but unless you show them WHY algebra is important they will have no reason to retain it. I was one of those kids who grasped geometry without problem, because the practical application of it was inherent in its teaching. Algebra, on the other hand, was a series of essentially random numbers written on a board. They taught you how but never really dwelled on why. I left high school honestly believing that I’d never need it, because no-one ever gave me reason to believe otherwise. Sixteen years later I know why. I’ve known why for a while now, but because I never really grasped, or retained, the how, I’ve been stymied in my forward progress. You see, when it comes to programming? All the cool shit requires math. Oh yeah, you can build nifty, and useful, apps without any notable math skills. You’ll be able to rework old ideas in new ways. It’s a lot like being a carpenter without any electrical engineering skills. Sure, you can use the belt sander, and you may have some good ideas on how to make a better belt sander, but without a good understanding of electrical engineering you’ll never be able to implement those ideas and actually make that better belt sander. You may be happy being a simple carpenter, but I’ve been nailing boards together long enough to have some ideas on how to do it better, and I’ve never been a complacent person. I’m hunting down math skills and eating them whole. Tonight’s lesson? Untyped lambda calculus. You need to learn it too? Start with the Alligator Eggs game. It’s a sneaky way to introduce it to young children, although, as it mentions near the bottom, it could use some expanding upon for further clarity. Then read this introduction to lambda calculus (pdf).
If some of the things in the paper are missing the why aspect, try and match what they’re saying with the rules you learned in Alligator Eggs. I still don’t know why the alligators die, but at least now I know how, when, and where to kill them. And, to any youngsters who may be reading this and wondering why I actually need lambda calculus, the answer is “functional” programming languages like Haskell and Erlang. You may be able to go through the motions and follow the rules of functional programming, but you’ll never really understand the why of it if you don’t understand lambda calculus.
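To see the untyped lambda calculus doing real work, here is a small Python sketch of Church numerals and Church booleans, the standard encodings of numbers and truth values as nothing but one-argument functions. Each Python lambda stands in for a λ-abstraction, and the λ-term is given in a comment beside it. The names (ZERO, SUCC, to_int and so on) are my own, and Python evaluates arguments eagerly, so this illustrates the encodings rather than normal-order reduction itself.

```python
# Church encodings: in the untyped lambda calculus every value is a
# one-argument function. Each Python lambda below stands in for a λ-abstraction.

ZERO = lambda f: lambda x: x                                 # λf.λx.x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))              # λn.λf.λx.f (n f x)
ADD = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))  # λm.λn.λf.λx.m f (n f x)
MUL = lambda m: lambda n: lambda f: m(n(f))                  # λm.λn.λf.m (n f)

TRUE = lambda a: lambda b: a                                 # λa.λb.a
FALSE = lambda a: lambda b: b                                # λa.λb.b
IS_ZERO = lambda n: n(lambda _: FALSE)(TRUE)                 # λn.n (λx.FALSE) TRUE


def to_int(n):
    """Decode a Church numeral by counting how many times it applies f."""
    return n(lambda k: k + 1)(0)


def to_bool(b):
    """Decode a Church boolean by letting it choose between True and False."""
    return b(True)(False)


ONE = SUCC(ZERO)
TWO = SUCC(ONE)
THREE = ADD(ONE)(TWO)
```

A good exercise is to reduce ADD ONE TWO by hand with the paper’s rules (or by feeding alligators): it should land on λf.λx.f (f (f x)), which is exactly the term to_int counts as 3.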
I think I understand why people tend not to write tests: they believe that tests aren’t something that’s either needed or important. “Duh,” I hear you say, but bear with me. Why don’t people believe that tests are something that’s either needed or important? Well, I think one of the biggest contributing factors to WHY is that essentially zero of the learn-to-program-in-language-FOO books ever mention unit testing. Unit testing has been around in a formal sense since the creation of SUnit back in 1994! 1994, I say! That’s thirteen years now. Thirteen years, and I could probably count on one hand the number of introductory language, or language reference, books that not only mention unit tests but actually explain why they’re important and how to use them. Even worse, most languages don’t have unit test tools built into their core libraries. All the modern languages have fairly comprehensive test suites for their own implementations, but they have to use external tools to write those tests. How crazy is that? We have this common programming task that we all agree is critical to releasing a stable version of the language, but it’s not important enough to build into the language. Wha?!?! The end result is that since we don’t teach tests as being even noteworthy when teaching a language, no-one learns that they are important. For the most part people just don’t seem to understand the value of tests until they’ve been in the industry so long that their feet are riddled with holes.[1] Mike Clark, and others, suggest writing “Learning Tests” not only as a way to learn a new language, but also as a way to accrete a repository of what you’ve learned about it. I think this is a GREAT idea. Imagine if every book that taught a new programming language showed you not only how to do something but then followed it up by demonstrating how to write a test confirming that you didn’t screw it up. People would start to see test writing as a standard part of the software writing process.
It would be “just what you do.” Imagine the impact that including unit testing as a standard part of the learning process would have on the software industry! Sure, it might take five to ten years before we started to see the results, but wouldn’t it be worth the wait?

[1] From having shot themselves in the foot on many prior occasions.
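Here is roughly what a handful of Learning Tests might look like, sketched in Python with the standard library’s unittest module (Python is one of the languages that does ship a test framework). Each test pins down one thing learned about the language; the class name and the particular string and sorting behaviors chosen are just examples, not anything from Mike Clark’s own material.

```python
import unittest


class StringLearningTests(unittest.TestCase):
    """Each test records one thing learned about the language's behavior."""

    def test_split_without_argument_collapses_whitespace(self):
        self.assertEqual("a  b\tc".split(), ["a", "b", "c"])

    def test_split_with_separator_keeps_empty_fields(self):
        self.assertEqual("a,,b".split(","), ["a", "", "b"])

    def test_slicing_past_the_end_does_not_raise(self):
        self.assertEqual("abc"[1:10], "bc")

    def test_sorted_is_stable_for_equal_keys(self):
        pairs = [("b", 1), ("a", 1), ("c", 0)]
        self.assertEqual(sorted(pairs, key=lambda p: p[1]),
                         [("c", 0), ("b", 1), ("a", 1)])
```

Run the file with python -m unittest and the suite doubles as notes: when a new release changes one of these behaviors, the corresponding test fails and tells you exactly which assumption no longer holds.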
I just had a thought. Companies looking for new developers should try to only ever hire from startups and similar small-team companies. Why? Because people who work for small startups can’t hide. You can be reasonably certain that someone who has managed to survive for more than a few months on a small dev team puts out decent code at a good pace. Small companies just can’t afford to keep crap coders or non-producers on their payrolls. Such a simple filter. And, I think most of the time it’s an even better filter than that: small-company people tend to be more flexible and more self-motivated, in my experience. I’d never really thought about that before…
I come across a lot of interesting quotes in the feeds I’m constantly reading, so I’ve grabbed an account on Tumblr and put up a new tumblelog at http://masukomi.tumblr.com for them. Add its feed to your reader if you find the same kinds of things interesting that I do.
I think we’re seeing the same thing with Ruby on Rails as we were in some sense seeing with Agile and with XP. People leave companies to work in more passionate environments. And, if they were forced to work in waterfall development processes and practices they would leave that shop to chase an XP shop or Agile environment. And, I’m seeing the same kind of thing. A lot of people come up to me and say “I quit my job because of Java…I quit my job because of .Net. I simply did not want to work in an environment where we used those tools anymore. I’m not passionate about those tools so I chose to take a minor position somewhere else doing Ruby on Rails. I chose to go freelance doing Ruby on Rails. And I’m much happier now.” – DHH
If you look at Mingle, the project management tool we’ve been working on… On that tool I happen to know that their test base is twice as much as their code base. So, two-thirds of the code in that product is tests, and that allows them to do quite violent things. I know that a couple of months ago they made a very fundamental change to the database schema. I mean, we’re talking, utterly to the guts of the database schema. And they did that and… it wasn’t even an event worth talking about. And, when they were planning to do it they were saying ‘Yeah, yeah, we’ve got to fundamentally alter the core tables in this application… Yeah we’ll do that, and it’s not a big deal.’ – Martin Fowler (paraphrased)
Most web developers will agree that unit tests are great, and some even write them… yet I know very few developers who write unit tests for their JavaScript. It’s not really their fault, though. Most don’t know of good unit test systems for JavaScript and / or don’t write their JavaScript in such a way that you even could test it well. That means breaking all the functionality into discrete functions and objects instead of writing old-school procedural crap. There’s also the obvious problem that most of your JavaScript is tied to the browser and the current page. So how do you test stuff in the page? Well, JsUnit lets you do just that, and seeing as I’ve just added a JavaScript implementation to the FizzBuzz Overthink, you can run over and see how to do it for your apps too. The limitation of JsUnit is that it really wants a complete file to load into the test harness, but it’s been ages since I’ve worked on a site that didn’t load at least some portion of every page dynamically. Fortunately the workaround is fairly simple: grab the source to the JsUnit Test Runner (testRunner.html) and extract the form, but leave out the file chooser. Put all that into a template file that will be dynamically loaded into your app when you view the page as a specific user or with a magic cookie. Otherwise it’s just not there. Now every page that needs a JavaScript unit test will have a test harness down at the bottom. Click the button and see if it passes. In order to run all the tests on all the pages, just have your Selenium tests also check that the unit tests run correctly. Usually you’ll want to make this a separate test run, so that you can log in as the user who gets to see the tests, and because you’ll frequently want to populate your forms and such with data that WILL break things, just to test that your JavaScript doesn’t blow up in people’s faces.
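The JavaScript from the FizzBuzz Overthink isn’t reproduced here, so this is only a sketch of the underlying design, written in Python rather than JavaScript: the FizzBuzz rules live in a pure function that knows nothing about the page, and a thin rendering function receives its output target as a parameter. With that split, a harness like JsUnit only needs to exercise the pure function, and a test can hand the renderer a fake writer instead of a real page. The function names fizzbuzz and render are hypothetical.

```python
def fizzbuzz(n):
    """Pure logic: knows nothing about pages or the DOM, so it's easy to test."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def render(limit, write):
    """Thin presentation layer: the only code that touches output.

    `write` is injected (a real page writer in production, a list's
    append method in a test), so the loop can be checked without a browser.
    """
    for i in range(1, limit + 1):
        write(fizzbuzz(i))
```

In the browser, the equivalent of write would be whatever appends a node to the page, which leaves that one small function as the only piece that truly needs the in-page harness.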
As I mentioned in my recent rant, our industry is plagued with bad practices even though so many of us know better. A HUGE portion of this problem is that really starting, and continuing, to work the way we know we should requires buy-in from our managers and coworkers. And it’s not just a conceptual buy-in that we need. People need to really get the religion. But you and I both know that we can’t realistically expect the rest of the company to change everything at once. So this flow chart addresses what I think are the three biggest changes we can make that will dramatically improve our companies. Find your biggest pain point and work on that. Once you’ve got that working really well in your group and you’re confident that people are going to stick with it, you can choose the next biggest pain point and address that one, remembering what worked best to get your employees to buy into the last change. Your feedback on this one is incredibly valuable to me, even if it’s just one sentence saying what you do or don’t agree with. It’s really important to me for two reasons: 1) it’ll play a big part in my book, and 2) many of my current coworkers are frustrated and / or depressed about our current state of affairs, and I’d like to try and put the advice in this chart into practice. Yes, there are some areas that could use more detail, but I’d really like to hear from you about which pieces you feel would be most helpful to expand upon. I’m afraid that this chart may just be too high level. Each of the major pain points could easily get its own chart for implementing solutions at a similar level of complexity, and I think I will make those too… just not tonight. As always, you can click on the image for a full-sized version and / or download the Dia file too. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.