Wednesday, January 18, 2012

Posting Prowl Messages from BusyBox

Yesterday, I showed how to send Prowl notifications to your iOS device from a full-featured Unix shell.  Unfortunately, more and more devices now run Linux without the full-featured user space needed to use curl or GNU wget to create POST requests over SSL.  Many devices, such as WRT54GL routers, NASes, etc., never really need a full user space, so why waste flash storage on utilities which won't be used?  You will find that most embedded Linux devices are based on a project called BusyBox.

BusyBox is a stripped-down replacement for the traditional GNU user space found in most modern Unix environments.  BusyBox implements all of the basic shell utilities such as rm, ls, touch, vi, awk, sed, et cetera, but you will find that most of these utilities are a far cry from their GNU equivalents, implementing only the bare essentials needed for a functional Linux system.  For example, GNU wget has more than 100 options for changing how it downloads files or forms requests.  BusyBox wget has seven, and unfortunately, these seven don't happen to include the "--post-data" option needed to have wget form the HTTP POST request we need to use the Prowl API.



So we've got a real problem.  Prowl's API works through HTTP POST requests (SSL is preferred, but no longer required), but BusyBox's wget lacks the "--post-data" option.  So what the heck do you do?

The obvious answer is to port GNU wget, curl, or even write your own utility to create HTTP POST requests and send them down the line.  HTTP is a fairly simple protocol, and this shouldn't be too big of a deal.  Heck, if you're nimble on the keyboard, you can type in HTTP requests by hand using netcat.

I didn't like that.  BusyBox gives us such a simple but powerful set of tools.  Why go through all the pain of porting something over to a MIPS processor when you can do it with the tools you already have?




So what are we trying to do with an HTTP POST request, really?

HTTP is the protocol used to deliver information over the web.  When you type in a URL (such as http://www.google.com), you are telling your web browser to generate an HTTP GET request and send it off to whatever server is specified by the URL.  HTTP POST requests are very similar, but carry extra information beyond the URL itself, and are most often used to pass information back to the web server after you fill out a web form.

But POST requests aren't the only way to pass information back to web servers.  We can use GET requests as well, by appending the information to the end of the URL after a question mark.  So, for example, we want to pass our request, which includes our apikey, application, event, and description, to the page http://api.prowlapp.com/publicapi/add.  This would end up as an HTTP request that would look like this:

http://api.prowlapp.com/publicapi/add?option=value&option=value&option=value

After the URL, a question mark begins a number of option=value pairs separated by ampersands, which would include our apikey=XXXX, application=WRT54GL, etc.  This isn't as nice as a POST request, because we're more limited as to which characters we can use (which we'll get into later on in this post), and these long ugly URLs tend to lead to users accidentally leaking information more often than POST requests do.  But even the brain-dead wget in BusyBox MUST handle GET requests, or else it really wouldn't be a very useful utility at all.
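As a minimal sketch of stringing these pairs together, assuming values that contain no special characters yet, something like the following would work in a BusyBox shell.  The function name build_url and the placeholder values are hypothetical; only the parameter names and the API address come from the text above.

```shell
# Hypothetical placeholder values; substitute your own.
APIKEY="XXXX"
APPLICATION="WRT54GL"
EVENT="Status"
DESCRIPTION="Hello"

# Join the option=value pairs with "?" and "&" to form the GET URL.
# Arguments: apikey, application, event, description (already URL-safe).
build_url() {
    printf '%s?apikey=%s&application=%s&event=%s&description=%s\n' \
        "http://api.prowlapp.com/publicapi/add" "$1" "$2" "$3" "$4"
}

build_url "$APIKEY" "$APPLICATION" "$EVENT" "$DESCRIPTION"
```

This only holds together while every value is a single plain word, which is exactly the limitation discussed next.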

So what's the big deal with building this special GET request with all our info?  Couple shell variables, string 'em all together using a ? and a couple &s, and you've got your URL, right?

URLs are very limited as to the characters they can use in a request.  This is why you'll often see your browser replace characters such as spaces in a URL with %20, which is a percent sign followed by the ASCII hex value of " ", because an unencoded space would otherwise look like the end of the URL.  So once we generate our long string of option=value pairs, we need to encode the values to avoid using any illegal symbols when forming the URL, such as spaces, new lines, %, ~, ^, etc.
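To look up the encoded form of a given character, a small helper along these lines may be handy.  It is only a sketch: the name hex_escape is hypothetical, and it relies on the POSIX printf convention that an argument starting with a single quote is read as the numeric value of the character that follows.

```shell
# Print the percent-encoded form of a single ASCII character.
# "'$1" makes printf use the character's numeric code, shown here in hex.
hex_escape() {
    printf '%%%02x\n' "'$1"
}

hex_escape ' '    # prints %20
hex_escape '~'    # prints %7e
```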



This URL encoding can be almost completely accomplished using just one tool called sed, the "stream editor."  You feed a stream of text into sed, give it a set of rules for replacing one thing with another, and it outputs the edited stream of text.

For example, to encode all the spaces in our request, we need to replace them with a percent sign followed by their ASCII hex value of 20.  This is done by giving sed the command s/ /%20/g, which says to substitute every instance of " " on every line with %20.  The % sign itself has to be encoded first, so that the % characters introduced by the later substitutions don't get encoded a second time.  Put together a long boring list of all of these needed sed commands, and we're 95% of the way there.
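A short sketch of such a list might look like the following.  The function name encode_line is hypothetical and the set of characters is illustrative rather than exhaustive; the one ordering constraint is that the % substitution comes first.

```shell
# Percent-encode a handful of special characters on each line of stdin.
# The % rule must run first so later rules don't get double-encoded.
encode_line() {
    sed -e 's/%/%25/g' \
        -e 's/ /%20/g' \
        -e 's/~/%7e/g' \
        -e 's/\^/%5e/g' \
        -e 's/&/%26/g' \
        -e 's/=/%3d/g' \
        -e 's/?/%3f/g'
}

printf '%s\n' 'load 50% ~ok' | encode_line    # prints load%2050%25%20%7eok
```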


The one piece of the puzzle missing with sed is that it doesn't handle new lines.  Sed uses new lines to split the stream into lines and then operates on one line at a time, so its commands never actually see the new lines traveling through the stream.  So after we convert all the other special characters into their hex equivalents, we need to somehow replace every new line with %0a.  To do this, we use a different tool called tr ("translate"), which takes one set of characters and replaces them with another.


For example, if you were to give tr a-z A-Z, it would substitute every lower case letter in the stream with its corresponding upper case letter.  But we don't need it to do that; we just need it to replace new lines with %0a...
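For illustration, the upper-casing example looks like this when wrapped in a small function (the name to_upper is hypothetical):

```shell
# Map each lower case letter on stdin to its upper case counterpart.
to_upper() {
    tr 'a-z' 'A-Z'
}

printf 'hello router\n' | to_upper    # prints HELLO ROUTER
```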


Unfortunately, in a word, it can't.  Tr maps characters one for one, so it can't replace a single character with multiple characters...  If only we knew a tool that could... Hmmm...


What about sed?  We could have tr, which can see the new lines, replace all of the new lines with some special character which sed CAN see, and which is unlikely to already be in the stream, and then have sed replace each of these characters representing new lines with the needed %0a.  Luckily, we have a whole set of characters which we know aren't in the stream anymore, because we just had sed do all this work to encode spaces and tildes into their hex equivalents.


I arbitrarily picked tildes, but any special character would do.  Take the otherwise scrubbed stream, feed it through tr '\n' '~' to replace new lines with ~, and then feed that through sed again with the single command 's/~/%0a/g' and we have an entirely encoded URL, which we can pass on to wget to post to Prowl from our wifi router.
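The two-step trick can be sketched as a small function.  The name encode_newlines is hypothetical, and it assumes any real tildes in the stream have already been encoded by the earlier sed pass.

```shell
# Replace every new line on stdin with %0a.
# tr swaps each new line for a placeholder "~", then sed expands each "~".
encode_newlines() {
    tr '\n' '~' | sed 's/~/%0a/g'
}

printf 'line%%20one\nline%%20two\n' | encode_newlines
# prints line%20one%0aline%20two%0a (with no trailing new line)
```

Note that the final new line of the message is encoded too, so the result ends in %0a.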


Pretty awesome, if I may say so myself.  In the example code below, I have my router generate a little report including its CPU load average and file system usage, but you can replace those two lines feeding text into $CACHE with whatever you like, or even have the shell script take the message content as an argument through the shell.  Tomato (and likely most other third party firmwares) makes it real easy to trigger scripts on a regular basis, or even when someone presses the button on the front of the router.


Don't forget to insert your API key at the top of the script.

Shell code:
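What follows is a minimal sketch of a script along the lines described above, not a definitive implementation.  Apart from $CACHE and the Prowl parameter names, every name in it is an assumption made for illustration: the function names, the scratch file path, the event text, and the use of /proc/loadavg and df for the report.  The list of encoded characters is also illustrative rather than exhaustive.

```shell
#!/bin/sh
# Sketch only: names and paths below are hypothetical unless noted.

APIKEY="XXXX"                      # insert your Prowl API key here
APPLICATION="WRT54GL"
EVENT="Status report"
CACHE="/tmp/prowl_message.txt"     # hypothetical scratch file for the message
API_URL="http://api.prowlapp.com/publicapi/add"

# Percent-encode special characters on each line; % must be handled first.
encode_line() {
    sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/~/%7e/g' -e 's/\^/%5e/g' \
        -e 's/&/%26/g' -e 's/=/%3d/g' -e 's/?/%3f/g' -e 's/#/%23/g' \
        -e 's/+/%2b/g'
}

# Encode a multi-line stream: encode each line, then turn new lines into %0a
# by way of a "~" placeholder (safe because real tildes are already %7e).
encode_stream() {
    encode_line | tr '\n' '~' | sed 's/~/%0a/g'
}

# The two lines feeding text into $CACHE; replace them with whatever you like.
build_report() {
    cat /proc/loadavg > "$CACHE"
    df -h >> "$CACHE"
}

# Assemble the GET URL from the API key and the encoded values.
prowl_url() {
    printf '%s?apikey=%s&application=%s&event=%s&description=%s\n' \
        "$API_URL" "$APIKEY" \
        "$(printf '%s\n' "$APPLICATION" | encode_line)" \
        "$(printf '%s\n' "$EVENT" | encode_line)" \
        "$(encode_stream < "$CACHE")"
}

# Build the report and request the URL, discarding the server's reply.
send_report() {
    build_report
    wget -q -O /dev/null "$(prowl_url)"
}
```

Adding a call to send_report as the last line of the script would make it build the report and request the URL each time it is run.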


Edit: I probably should have reread the RFCs and Wikipedia pages on HTTP before publishing this.  I believe I've now fixed most of my confusion between GET and POST requests.

5 comments:

  1. Isn't that actually a GET you're using? GET has the parameters encoded within a URL, following the ? and separated by & characters; POST has the parameters sent separately.

  2. Actually, you've described an HTTP GET request, not a POST. A POST request encodes the data sent to the server into a special part of the request header (not in the URL), whereas a GET request encodes it into the URL parameter. As such, GET has more encoding issues than POST, and POST can carry more data.

    See http://thinkvitamin.com/code/the-definitive-guide-to-get-vs-post/ for some more details.

  3. Well nerts. I agree. I really shouldn't be writing posts without access to Wikipedia, should I?

    So is there some name for the distinction between sending information as either a POST or GET vs receiving information?

  4. Alllllright. I think I fixed it; does that look better?
