weblog.masukomi.org

mah-soo-koh-me

 

Caterpillar 3.0 Release May 30, 2007

Filed under: Uncategorized — masukomi @ 2:55 am

Caterpillar 3.0 is a proof of concept app to demonstrate how incredibly useful
Bayesian filtering can be when tasked with finding “interesting” articles
instead of, or in addition to, spam. Download it here (10.7 MB). It works for
me. I enjoy it, and it makes my life easier. Maybe it will make yours easier
too.



How does it work?

Well, the simple explanation is that Caterpillar watches what you read and what
you don’t read. If you click on a title, Caterpillar figures it must have been
at least somewhat “interesting” to you. If you click on a link in a post, it
figures the post’s contents were probably “interesting” too. If something sits
around in your list for so long that it eventually gets purged, Caterpillar
figures that its title probably wasn’t very “interesting” to you.



In a surprisingly short amount of time Caterpillar is able to start picking out
articles that you’ll probably find interesting. These it will highlight in
green, instead of the standard blue. That’s it. The only time you should ever
have to specifically train Caterpillar is when it gets something wrong, which
isn’t that often. So, if Caterpillar thinks something would be interesting to
you but you really disagree, just select it and choose “Downvote” from the
“Entries” menu. You can, of course, Upvote things too, but you really shouldn’t
need to.
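
To make the idea concrete, here is a minimal Python sketch of this kind of implicit training. It is not Caterpillar’s actual code (Caterpillar is written in Java); the class name, the method names, the tokenizer, and the add-one smoothing are all assumptions made for illustration. Clicks train the “interesting” side, purged entries train the “boring” side, and a new title is scored against both.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class InterestFilter:
    """Toy naive Bayes filter with two classes: interesting and boring.

    Hypothetical illustration only; not taken from Caterpillar's source.
    """

    def __init__(self):
        self.words = {"interesting": Counter(), "boring": Counter()}
        self.docs = {"interesting": 0, "boring": 0}

    def train(self, text, label):
        self.words[label].update(tokenize(text))
        self.docs[label] += 1

    # Implicit feedback: the reader never has to rate anything by hand.
    def clicked(self, title):
        self.train(title, "interesting")

    def purged_unread(self, title):
        self.train(title, "boring")

    def _log_score(self, tokens, label):
        counts = self.words[label]
        total = sum(counts.values())
        vocab = len(set(self.words["interesting"]) | set(self.words["boring"]))
        # Smoothed class prior, so an untrained class never hits log(0).
        score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for token in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[token] + 1) / (total + vocab + 1))
        return score

    def is_interesting(self, title):
        tokens = tokenize(title)
        return (self._log_score(tokens, "interesting")
                > self._log_score(tokens, "boring"))
```

An explicit Downvote or Upvote would simply be another call to `train` with the appropriate label, which is why manual training is only needed to correct mistakes.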



Who should use it?

Caterpillar is good for two primary groups of people.

  1. People with way too many subscriptions and not enough time or energy to sift
    through them all for the posts that are actually worth reading.
  2. People who want a simple user interface for their feed reader.

And, of course, randomly curious geeks. Regardless of why you’ve chosen to try
Caterpillar, it’s important to remember that it is just a proof of concept and,
as such, may have some rough edges.



What about screenshots?

Visually it hasn’t changed much since the 2.0 release, and there are plenty of
screenshots of that on the Caterpillar 2.0 site. There’s now the highlighting of
“interesting” items in green, a few more menu options, and a couple of extra
useful links and info when reading an article. See “Why Release it Now?” for why
I don’t have updated screenshots.



Requirements & Use

It requires Java 5 or higher, although if you have some pressing need, and
you’re a geek, you could compile it under Java 1.4.x and it should work.

To run it, just double click on the Caterpillar.jar file in your GUI and all
should work. OS X users should be able to just double click on the pretty icon.

You can also use the command line. Change to the Caterpillar directory and type:
java -jar Caterpillar.jar



Give it a week.

If you read a crazy number of feeds like I do, you’ll probably see Caterpillar
start picking out new entries for you by the end of your first day. If you’re
like most people, it may take a bit longer. The more you use it, the faster it
learns. It’s that simple. Just don’t click on random entries in hopes that it
will help. It won’t. Just read the entries you would normally read and let it
learn what’s really interesting to you.



How do I import feeds from my current aggregator?


Most feed aggregators will allow you to export your list of feeds as an OPML
file. Rename that file exportedFeeds.opml and place it in the same directory as
the Caterpillar.jar file. Then choose “Import Feeds” from the File menu and give
it a while to go and download them. If you started it from the command line,
you’ll see it calling out the names of the feeds it’s importing as it goes.
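
If you’re curious what is inside that OPML file, it is just XML in which each feed is an outline element whose xmlUrl attribute holds the feed’s address. The Python sketch below pulls the feed list out of such a file. It is not how Caterpillar does its import; the function name and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Return (title, url) pairs for every outline that carries an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks nested outlines too, so feeds grouped into folders are found.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            title = outline.get("title") or outline.get("text") or url
            feeds.append((title, url))
    return feeds


# Hypothetical export with one folder and two feeds.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>exportedFeeds</title></head>
  <body>
    <outline text="Programming">
      <outline text="Example Blog" type="rss"
               xmlUrl="http://example.com/feed.xml"/>
    </outline>
    <outline title="Another Feed" text="Another Feed" type="rss"
             xmlUrl="http://example.org/rss"/>
  </body>
</opml>"""

for title, url in feed_urls_from_opml(SAMPLE_OPML):
    print(title, url)
```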



What about a manual?

The Caterpillar 2.0 site has the Caterpillar 2.0 docs, which cover basically
everything except the Bayesian learning stuff, which I just covered in the “How
does it work?” section above. See “Why Release it Now?” for why I don’t have an
updated manual.



So what do I mean by “proof of concept”?

Well, Caterpillar’s a good app. I use it every day. But it’s got some
limitations, and right now I just don’t have the time to fix them. So here they
are in no particular order:

  • Right now Caterpillar’s XML parsing is too strict. It only tolerates
    well-formed XML but, unfortunately, there are a lot of blog posts out there
    that don’t have their contents properly encoded. If it encounters a poorly
    formed feed you’ll get an error dialog that says “There was a problem
    parsing the xml for …..” with some details. It was written before good feed
    parsing libraries like Rome existed.
  • Caterpillar mixes the entries from all the feeds together into one long
    list. You can use the pull down to select entries from a particular feed if
    you like though. This is because Caterpillar was written to help me weed
    through the entries in the 300+ feeds I read. I don’t have time to read ALL
    the posts, and clicking on every one of three hundred feed titles, the way
    I’d have to in most readers, would take a really long time. Plus I don’t
    care which feed the “interesting” posts come from. I just care that they’re
    interesting. Don’t you?
  • Caterpillar doesn’t support auto-discovery. This means you have to give it
    the URL of the actual feed, not just give it the web page’s URL and hope it
    can find the feed.
  • Caterpillar can take up a lot of memory.
  • It’s been over two years since I last did any real work on Caterpillar, and
    it was never intended to grow to the point that it has. From a code
    standpoint there is much room for improvement, and a number of the libraries
    could do with an update.
  • Caterpillar uses Aspirin when you want to send e-mails, which means you
    don’t have to configure anything. Unfortunately, it also means you’ll be
    sending e-mail from your box, and you would be surprised at the number of
    ISPs who just block any e-mail coming from a dynamic IP address because they
    assume it must be spam. And no, Caterpillar never sends e-mails unless you
    tell it to.
  • The OPML import could obviously use some improvement, and I think the export
    still uses its old internal format that can’t be read by other apps.
  • It doesn’t whistle any more. It used to have this nice, non-annoying, train
    whistle that would sound when it found new entries. There’s even a “Make it
    whistle” menu item that just made it whistle. It no longer whistles. This
    makes me sad. :(
    Correction: It DOES whistle. It just appears that Ubuntu’s love/hate
    relationship with my sound card chose yesterday to rear its head again.
  • 10.7 MB seems a bit much for such a small app…
  • Sometimes the search functionality loses its brain and needs to be reset.
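
For anyone wondering what the missing auto-discovery would involve: web pages usually advertise their feeds with link tags in the page head, and a reader only has to look for them. The Python sketch below shows the general technique. It is not part of Caterpillar, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect href values from <link rel="alternate"> tags that point at feeds."""

    def __init__(self):
        super().__init__()
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # Attribute values can be None when an attribute has no value.
        rel = (attr.get("rel") or "").lower().split()
        type_ = (attr.get("type") or "").lower()
        if "alternate" in rel and type_ in FEED_TYPES and attr.get("href"):
            self.feed_urls.append(attr["href"])


def discover_feeds(html):
    """Return the feed URLs advertised in a page's HTML, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feed_urls
```

A real implementation would also have to fetch the page and resolve relative href values against the page’s URL, which this sketch leaves out.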



Why release it now?

Or, more to the point, why release it in an unfinished state? Well, I had
intended to finish polishing it up and release it as a commercial product. But
that was over two years ago, and there have just been too many other projects on
my plate that are more important to me. I’d rather see people get some use out
of it than have it continue to sit on my computer benefiting no one but me. I’m
also hoping that some smart programmer at Google will see the value of positive
Bayesian filtering and apply it to Google Reader and Gmail. Just imagine how
awesome it would be if all those mailing lists you subscribe to had a filter
looking for “interesting” posts for you, so that you didn’t have to read
everything or feel so overloaded that you end up reading nothing.



Wanna help?

If you’re a Java geek, feel free to download the source with Darcs from
http://caterpillar.masukomi.org/code/caterpillar3

Tweak it however you want, add whatever feature you want, use the send feature
of Darcs to send me a patch file (masukomi at masukomi dot org), and I’ll
probably add it in. I figure any forward motion in Caterpillar is good at this
point. The only restriction is that it needs to have a unit test with it. Yes, I
know, it seems hypocritical in light of the utter lack of tests in Caterpillar’s
source, but since I wrote it I got the full-on testing religion so… deal :P
To build Caterpillar just switch into the build directory and run ant.



Bugs & Feature requests



Report a bug: Report a Caterpillar bug


Suggest a feature: Request a Caterpillar feature




License & Copyright

The Caterpillar feed aggregator version 3.0 is copyright 2007 Kate Rhodes
(masukomi at masukomi dot org) and is released under the GPL v2.0. Have fun.
Don’t blow anything up. Convince your rich company that they should buy the
source from me so that they can sell it under any license they want. Or hire me
and pay me a decent salary. Either / or…


 
 

Read or Die May 28, 2007

Filed under: Uncategorized — masukomi @ 2:35 pm

In this industry stagnation == death.

If you’re not staying on top of new technology you may as well quit, because
your skills will become obsolete and your job with them. Feed aggregators are a
godsend for people like us. There’s no way I could keep up with my 310 (and
counting) subscriptions if I had to go to their sites manually. Now, to be
honest, a fair number of them are purely entertainment, and some just help me
keep up with my friends’ lives. But the vast majority are programming related. I
even went so far as to write my own feed reader with Bayesian filtering to
highlight the “interesting” posts for me, something I’d LOVE to see in a
commercial product. It makes SUCH a difference and, unlike spam filtering, there
are ways of doing positive filtering that never involve having to specifically
tell the application what you do and don’t like.



All of which is a really long-winded way of putting up a link to what I’m
currently reading for those who actually care. Although at 310 subscriptions I
think the list has probably exceeded anything resembling a size that’s useful to
others. You can’t import it because you’ll flood your feed reader with crap you
don’t care about, and 770 lines of XML is a bit much to try and read…


 
 

Math is for people who aren’t content with the status quo.

Filed under: Uncategorized — masukomi @ 1:57 am

Update: I just came across a similar post by raganwald wherein he discusses the
need for advanced programming skills… which you don’t get without math. ;)

When I was in high school no one ever convinced me of why math was important,
and that is my biggest educational regret. Children, and adults for that matter,
will neither seek out, nor retain, knowledge they don’t value. It’s all well and
good to tell them algebra is important, but unless you show them WHY algebra is
important they will have no reason to retain it. I was one of those kids who
grasped geometry without problem, because the practical application of it was
inherent in its teaching. Algebra, on the other hand, was a series of
essentially random numbers written on a board. They taught you how but never
really dwelled on why. I left high school honestly believing that I’d never need
it because no one ever gave me reason to believe otherwise.



Sixteen years later I know why. I’ve known why for a while now, but as I never
really grasped, or retained, how, I’ve been stymied in my forward progress. You
see, when it comes to programming? All the cool shit requires math. Oh yeah, you
can build nifty, and useful, apps without any notable math skills. You’ll be
able to rework old ideas in new ways. It’s a lot like being a carpenter without
any electrical engineering skills. Sure you can use the belt sander, and you may
have some good ideas on how to make a better belt sander, but without a good
understanding of electrical engineering you’ll never be able to implement those
ideas and actually make that better belt sander.



You may be happy being a simple carpenter, but I’ve been nailing boards together
long enough to have some ideas on how to do it better and I’ve never been a
complacent person. I’m hunting down math skills and eating them whole.



Tonight’s lesson? Untyped lambda calculus. You need to learn it too? Start with
the Alligator Eggs game. It’s a sneaky way to introduce it to young children,
although, as it mentions near the bottom, it could use some expanding upon for
further clarity. Then read this introduction to lambda calculus (pdf). If some
of the things in the paper are missing the why aspect, try to match what they’re
saying with the rules you learned in Alligator Eggs. I still don’t know why the
alligators die, but at least now I know how, when, and where to kill them.



And, to any youngsters who may be reading this and wondering why I actually need
lambda calculus, the answer is “functional” programming languages like Haskell
and Erlang. You may be able to go through the motions and follow the rules of
functional programming, but you’ll never really understand the why of it if you
don’t understand lambda calculus.
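
For readers who think better in code than in alligators, the core idea of untyped lambda calculus can be tried out with nothing but single-argument functions. The sketch below uses Python lambdas as a stand-in. The definitions are the conventional Church encodings of booleans and numbers, and the helper names to_int and to_bool are made up here for illustration rather than taken from the linked introduction.

```python
# Church booleans: a boolean is a function that picks one of two arguments.
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
NOT = lambda b: b(FALSE)(TRUE)

# Church numerals: the number n means "apply f to x n times".
ZERO = lambda f: lambda x: x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
ADD = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
MUL = lambda m: lambda n: lambda f: m(n(f))


def to_int(n):
    """Convert a Church numeral to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)


def to_bool(b):
    """Convert a Church boolean to a Python bool."""
    return b(True)(False)


ONE = SUCC(ZERO)
TWO = SUCC(ONE)
THREE = ADD(ONE)(TWO)

print(to_int(THREE))             # 3
print(to_int(MUL(TWO)(THREE)))   # 6
print(to_bool(NOT(FALSE)))       # True
```

Everything here, numbers and arithmetic included, is built from function definition and function application alone, which is the whole point of the calculus.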





 
 

I think I know why people don’t value tests May 25, 2007

Filed under: Uncategorized — masukomi @ 8:52 pm

I think I understand why people tend to not write tests: they believe that tests
aren’t something that’s either needed or important.

 

“Duh,” I hear you say, but bear with me.



Why don’t people believe that tests are something that’s either needed or
important? Well, I think one of the biggest contributing factors to WHY is that
essentially zero of the “learn to program in language FOO” books ever mention
unit testing. Unit testing has been around in a formal sense since the creation
of SUnit back in 1994! 1994 I say! That’s thirteen years now. Thirteen years and
I could probably count on one hand the number of introductory language, or
language reference, books that not only mention unit tests but actually explain
why they’re important and how to use them. Even worse, most languages don’t have
unit test tools built into their core libraries. All the modern languages have
fairly comprehensive test coverage but they have to use external tools to write
those tests. How crazy is that? We have this common programming task that we all
agree is critical to releasing a stable version of the language, but it’s not
important enough to build into the language. Wha?!?! The end result is that
since we don’t teach tests as being even noteworthy when teaching a language, no
one learns that they are important. For the most part people just don’t seem to
understand the value of tests until they’ve been in the industry so long that
their feet are riddled with holes.[1]



Mike Clark, and others, suggest writing “Learning Tests” as a way not only to
learn a new language but also to accrete a repository of what you’ve learned
about a language. I think this is a GREAT idea. Imagine if every book that
taught a new programming language showed you not only how to do something but
then followed it up with how to confirm that you didn’t screw it up by
demonstrating how to write a test for it. People would start to see test writing
as a standard part of the software writing process. It would be “just what you
do.”
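
As a rough illustration of what a learning test looks like, here is a small sketch using Python and its built-in unittest module. The choice of language, the class name, and the particular facts being checked are all just examples and are not taken from Mike Clark’s writing. Each test records one thing a newcomer might be unsure about and proves it.

```python
import unittest


class LanguageLearningTests(unittest.TestCase):
    """Learning tests: each one pins down a fact about the language itself."""

    def test_split_without_arguments_collapses_whitespace(self):
        self.assertEqual("a  b\tc".split(), ["a", "b", "c"])

    def test_split_on_a_space_keeps_empty_strings(self):
        self.assertEqual("a  b".split(" "), ["a", "", "b"])

    def test_strings_are_immutable(self):
        word = "test"
        with self.assertRaises(TypeError):
            word[0] = "b"

    def test_integer_division_floors_toward_negative_infinity(self):
        self.assertEqual(-7 // 2, -4)


def run_learning_tests():
    """Run the suite and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(LanguageLearningTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_learning_tests()
```

Kept in a file and rerun whenever the language is upgraded, a suite like this doubles as the repository of accumulated knowledge described above.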



Imagine the impact that including unit testing as a standard part of the
learning process would have on the software industry! Sure it might take five to
ten years before we started to see the results from it but wouldn’t it be worth
the wait?



[1] From having shot themselves in the foot on many prior occasions.


 
 

Why you should endeavor to hire from startups

Filed under: Uncategorized — masukomi @ 5:49 pm

I just had a thought. Companies looking for new developers should try to only ever hire from startups and similar small team companies. Why? Because people who work for small startups can’t hide. You can be reasonably certain that someone who has managed to survive for more than a few months in a small dev team puts out decent code at a good pace. Small companies just can’t afford to keep crap coders or non-producers on their payrolls. Such a simple filter. And, I think most of the time it’s an even better filter than that. Small co people tend to be more flexible and more self-motivated in my experience. I’d never really thought about that before…


 
 

A tumblelog too

Filed under: Uncategorized — masukomi @ 3:47 pm

I come across a lot of interesting quotes in the feeds I’m constantly reading. So, I’ve grabbed an account on Tumblr and put up a new tumblelog at http://masukomi.tumblr.com for those. So add its feed to your reader if you find the same kind of things interesting that I find interesting.


 
 

DHH on passion and tools.

Filed under: Uncategorized — masukomi @ 4:47 am

I think we’re seeing the same thing with Ruby on Rails as we were in some sense seeing with Agile and with XP. People leave companies to work in more passionate environments. And, if they were forced to work in waterfall development processes and practices they would leave that shop to chase an XP shop or Agile environment. And, I’m seeing the same kind of thing. A lot of people come up to me and say “I quit my job because of Java…I quit my job because of .Net. I simply did not want to work in an environment where we used those tools anymore. I’m not passionate about those tools so I chose to take a minor position somewhere else doing Ruby on Rails. I chose to go freelance doing Ruby on Rails. And I’m much happier now.” - DHH


 
 

The power of tests…

Filed under: Uncategorized — masukomi @ 4:20 am

“If you look at Mingle, the project management tool we’ve been working on… On that tool I happen to know that their test base is twice as much as their code base. So, two-thirds of the code in that product is tests, and that allows them to do quite violent things. I know that a couple of months ago they made a very fundamental change to the database schema. I mean, we’re talking, utterly to the guts of the database schema. And they did that and… it wasn’t even an event worth talking about. And, when they were planning to do it they were saying ‘Yeah, yeah, we’ve got to fundamentally alter the core tables in this application… Yeah we’ll do that, and it’s not a big deal.’”
-Martin Fowler (paraphrased)


 
 

Unit testing your JavaScript May 24, 2007

Filed under: Uncategorized — masukomi @ 4:44 pm

Most web developers will agree that unit tests are great, and some even write
them… but I know very few developers who write unit tests for their JavaScript.
It’s not really their fault. Most don’t know of good unit test systems for
JavaScript and / or don’t write their JavaScript in such a way that you even
could test it well. Testable JavaScript means breaking all the functionality
into discrete functions and objects instead of writing old-school procedural
crap. There’s also the obvious problem that most of your JavaScript is tied to
the browser and the current page. So how do you test stuff in the page? Well,
JsUnit lets you do just that and, seeing as I’ve just added a JavaScript
implementation to the FizzBuzz Overthink, you can run over and see how to do it
for your apps too.



The limitation of JsUnit is that it really wants a complete file to load into
the test harness, but it’s been ages since I’ve worked on a site where at least
some portion of every page wasn’t dynamically loaded. Fortunately the workaround
is fairly simple: grab the source to the JsUnit Test Runner ( testRunner.html )
and extract the form but leave out the file chooser. Put all that into a
template file that will be dynamically loaded into your app when you view the
page as a specific user or with a magic cookie. Otherwise it’s just not there.
Now every page that needs a JavaScript unit test will have a test harness down
at the bottom. Click the button and see if it passes. In order to run all the
tests on all the pages, just have your Selenium tests also check that the unit
tests run correctly. Usually you’ll want to have this be a separate test run so
that you can log in as the user who gets to see the tests and because you’ll
frequently want to populate your forms and such with data that WILL break things
just to test that your JavaScript doesn’t blow up in people’s faces.



 
 

Getting some agility in your workplace. A flow chart.

Filed under: Uncategorized — masukomi @ 12:23 am

As I mentioned in my recent rant, our industry is plagued with bad practices
even though so many of us know better. A HUGE portion of this problem is that to
really start, and continue, working the way we know we should requires buy-in
from our managers and coworkers. And it’s not just a conceptual buy-in that we
need. People need to really get the religion. But you and I both know that we
can’t realistically expect the rest of the company to change everything at once.
So this flow chart addresses what I think are the three biggest changes we can
make that will dramatically improve our companies. Find your biggest pain point
and work on that. Once you’ve got that working really well in your group and
you’re confident that people are going to stick with it, you can choose the next
biggest pain point and address that one, remembering what things worked best to
get your employees to buy into the last change.



Your feedback on this one is incredibly valuable to me. Even if it’s just one
sentence saying what you do or don’t agree with. It’s really important to me for
two reasons. 1) It’ll play a big part in my book. 2) Many of my current
coworkers are frustrated and / or depressed about our current state of affairs
and I’d like to try and put the advice in this chart into practice. Yes, there
are some areas that could use more detail, but I’d really like to hear from you
about what pieces you feel would be most helpful to expand upon.



I’m afraid that this chart may just be too high level. Each of the major pain
points could easily get its own graph for implementing solutions at a similar
level of complexity and I think I will make those too… just not tonight.



As always you can click on the image for a full sized version and / or download
the Dia file too.








Creative Commons License


This work is licensed under a Creative Commons Attribution-Share Alike 3.0
United States License.

