Can't Work All The Time

On Friday, at the request of Vas, I wrote a small and simple pattern-matching engine that takes a plain text reference and turns it into a BibTeX reference. It's a difficult task to do automatically, since interpreting a plain text reference takes some human intelligence. My script takes simple formats like '%author%, %title% (%year%)', or you can extend them with regular expressions, like '%/(\d+\. )%%author/.+?(?<!\b.)%. %title%.' Oh, and it does other things too: it lets you insert comments and set the entry type.
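The post doesn't show the engine's code or say what language it's written in, so what follows is only a minimal Python sketch of how a format string like these might be turned into a regular expression. The placeholder rules are my reading of the two examples above (%name% for a lazily matched field, %name/regex% for a field constrained by a regex, %/regex% for text that must match but isn't kept), and compile_format, parse_reference and to_bibtex are hypothetical names, not the script's real functions.

```python
import re
from typing import Dict, Optional

# Placeholder syntax as I read it from the examples (an assumption):
#   %name%        -> named field, matched lazily
#   %name/regex%  -> named field constrained by a custom regex
#   %/regex%      -> must match, but is not kept as a field
PLACEHOLDER = re.compile(r"%([A-Za-z]*)(?:/(.*?))?%")


def compile_format(fmt: str) -> "re.Pattern[str]":
    """Translate a format string into a regular expression."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(fmt):
        # Literal text between placeholders must appear exactly as written.
        parts.append(re.escape(fmt[pos:m.start()]))
        name, custom = m.group(1), m.group(2)
        body = custom if custom is not None else ".+?"
        # Named placeholders become named groups; unnamed ones get a non-capturing wrapper.
        parts.append(f"(?P<{name}>{body})" if name else f"(?:{body})")
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("".join(parts))


def parse_reference(fmt: str, text: str) -> Optional[Dict[str, str]]:
    """Return the named fields if the format matches the whole reference."""
    m = compile_format(fmt).fullmatch(text.strip())
    return m.groupdict() if m else None


def to_bibtex(fields: Dict[str, str], entry_type: str = "article", key: str = "ref1") -> str:
    """Render matched fields as a BibTeX entry of the chosen type."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value.strip()}}},")
    lines.append("}")
    return "\n".join(lines)
```

Under these assumptions, the second format above pulls 'Smith, J. and Jones, Kate' and 'Pattern matching' out of '1. Smith, J. and Jones, Kate. Pattern matching.', because the (?<!\b.) lookbehind stops the author field from ending at the single-letter initial 'J.'. The resulting dictionary can then be handed to to_bibtex with whatever entry type you choose.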

It's not perfect; there appear to be some small glitches with certain regular expressions in certain places. But it can now successfully recognise over 60 plain text references with its current library of 20 formats, even though around 10 of those formats are pretty specific to certain examples Vas sent me. The script can also 'learn': if your format successfully matches a reference, you can have it added to the library. And I don't think anyone would be lame enough to want to mess up the library on purpose.
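Here is a similarly hedged Python sketch of the library and its 'learning' step. It assumes the library is simply an ordered list of formats where the first one that matches the whole reference wins, and that a new format is only stored if it matches the reference it was written for. FormatLibrary and compile_simple are hypothetical names, and compile_simple only handles plain %name% placeholders to keep the example short.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple


def compile_simple(fmt: str) -> "re.Pattern[str]":
    """Turn a format with plain %name% placeholders into a regex.

    Simplified on purpose: custom %name/regex% placeholders are not handled.
    """
    # Splitting on a capturing group alternates literal text (even indices)
    # and field names (odd indices).
    pieces = re.split(r"%(\w+)%", fmt)
    regex = "".join(
        f"(?P<{piece}>.+?)" if i % 2 else re.escape(piece)
        for i, piece in enumerate(pieces)
    )
    return re.compile(regex)


class FormatLibrary:
    """An ordered list of formats; the first one that matches wins (an assumption)."""

    def __init__(self, formats: Iterable[str] = ()) -> None:
        self.formats: List[str] = list(formats)

    def match(self, text: str) -> Optional[Tuple[str, Dict[str, str]]]:
        """Return (format, fields) for the first format matching the whole text."""
        for fmt in self.formats:
            m = compile_simple(fmt).fullmatch(text.strip())
            if m:
                return fmt, m.groupdict()
        return None

    def learn(self, fmt: str, sample: str) -> bool:
        """Store fmt only if it is new and actually matches the sample reference."""
        if fmt in self.formats or compile_simple(fmt).fullmatch(sample.strip()) is None:
            return False
        self.formats.append(fmt)
        return True
```

Requiring a successful match is only a cheap guard: it keeps formats that can't parse anything out of the library, but a deliberately silly catch-all like '%x%' would still get in, so it relies on people not being lame.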

It has also, at long last, given me an opportunity to put a tiny bit of my JavaScript out in the open. I've been writing some nifty JavaScript for several years, but it always seems to be either written for a specific need or event, or it ends up on admin-only interfaces for clients. There's nothing special in this script: just a Toggle class to show and hide objects, and some code that turns plain text into links, so the page still degrades gracefully for browsers without JavaScript. But still, it makes a nice change from copying and pasting Word documents, and generally makes the interface a bit niftier.

At the moment I find my work quite unrewarding from that point of view: most of the time I don't get a chance to do anything interesting, and when I do, the only feedback I get is from the person I send it to. And nearly all of the time I'm the only person who ever sees and uses the behind-the-scenes admin interfaces and code libraries. This is the kind of thing I'm interested in and would like to spend more time working on, but I have too much other work to get done first. Oh well, it's always the way, isn't it?

Anyway, I gave this plain text to BibTeX converter the imaginative name of text2bib and put it online at services.uzeweb.com, which I took the opportunity to give a little facelift, since I hadn't touched it in around 4 years. Never enough time.

Now my eyes hurt from the Halloween theme. I guess I should take it down now, but I don't want to :)

Comments

I'm guessing when you said 1st Nov for TCMI you actually meant 1st Dec, because you believe Christmas starts when Advent does, unlike the shops, who say it started a few weeks ago.

Um, yes, sorry, 1st December :p
