Thursday, September 5, 2019

Adding Webseed URLs to Torrent Files

I was recently hanging out on a Slack discussing the deficiencies in the BitTorrent protocol for fast file distribution. A decade ago when Linux mirrors tended to melt down on release day, Bittorrent was seen as a boon for being able to distribute the relatively large ISO files to everyone trying to get it, and the peer-to-peer nature of the protocol meant that the swarm tended to scale with the popularity of the torrent, kind of by definition.

There were a few important points raised during this discussion (helped by the fact that one of the participants had actually presented a paper on the topic):

  1. HTTP-based content distribution networks have gotten VASTLY better in the last decade, so you tend not to see servers hugged to death anymore when the admins are expecting a lot of traffic.
  2. Users tend to see slower downloads from the Bittorrent swarm than they do from single healthy HTTP servers, with a very wide deviation as a function of the countless knobs exposed to the user in Bittorrent clients.
  3. Maintaining Bittorrent seedbox infrastructure in addition to the existing HTTP infrastructure is additional administrative overhead for the content creators, which tends to not be leveraged as well as the HTTP infrastructure for several reasons, including Bittorrent's hesitancy to really scale up traffic, its far from optimal access patterns across storage, the plethora of abstract knobs which seem to have a large impact on the utilization of seedboxes, etc.
  4. The torrent trackers are still a central point of failure for distribution, and now the content creator is having to deal with a ton of requests against a stateful database instead of just serving read-only files from a cluster of HTTP servers which can trivially scale horizontally.
  5. Torrent files are often treated as second class citizens since they aren't as user-friendly as an HTTP link, and may only be generated as part of releases to quiet the "hippies" who still think that Bittorrent is relevant in the age of big gun CDNs.
  6. Torrent availability might be poor at the beginning and end of a torrent's life cycle, since seedboxes tend to limit how many torrents they're actively seeding. When a Linux distro drops fifteen different spins of their release, their seedbox will tend to only seed a few of them at a time and you'll see completely dead torrents several hours if not days into the release cycle. 
As any good nerd discussion on Slack goes, we started digging into the finer details of the Bittorrent specification like the Distributed Hash Table that helped reduce the dependence on the central tracker, peer selection algorithms and their tradeoffs, and finally the concept of webseed.

Webseed is a pretty interesting concept which was a late addition to Bittorrent, where you can include URLs to HTTP servers serving the torrent contents, hopefully giving you most of the benefits of both protocols: the modern bandwidth scalability of HTTP, and the distributed fault tolerance and inherent popularity-based scaling of Bittorrent.

I was aware of webseed, but hadn't seen it actually used in years, so I decided to dig into it and see what I could learn about how it works and how it fits into the torrent file structure.

The torrent file, the small description database you use to start downloading the actual content of a torrent, at the very least contains a list of the files in the torrent and checksums for each of the fixed-size chunks making up those files. Of course, instead of using a popular serialization format like XML or JSON (which I appreciate might not have really been as popular at the inception of Bittorrent), the torrent file uses a format I've never seen anywhere else called BEncoding.

The BEncoding format is relatively simple; key-value pairs can be stored as byte strings or integers, and the file format supports dictionaries and lists, which can contain sets of further byte strings, integers, or even other lists/dictionaries. Bittorrent then uses this BEncoding format to create a dictionary named "info" which contains a list of the file names and chunk hashes which define the identity of a torrent swarm, but beyond this one dictionary in the file, you can modify anything else in the database without changing the identity of the swarm, including which tracker to use as "announce" byte-strings, or "announce-list" lists of byte-strings, comments, creation dates, etc.

Fortunately, the BEncoding format is relatively human readable, since length fields are encoded as ASCII integers and field delimiters are characters like ':', 'l', and 'i'. Unfortunately, it's all encoded as a single line with no breaks, so trying to edit this database by hand with a text editor can be a little hairy.
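To make the format concrete, here's a minimal BEncode decoder in Python. This is just an illustrative sketch I'm writing for this post (the torrent file name is made up), not a library you should actually depend on:

# Minimal BEncode decoder, for illustration only.
# Byte strings look like "4:spam", integers like "i42e",
# lists like "l...e", and dictionaries like "d...e".
def bdecode(data: bytes, i: int = 0):
    """Decode the value starting at offset i; return (value, next offset)."""
    if data[i:i+1] == b'i':                  # integer: i<digits>e
        end = data.index(b'e', i)
        return int(data[i+1:end]), end + 1
    if data[i:i+1] == b'l':                  # list: l<items>e
        i, items = i + 1, []
        while data[i:i+1] != b'e':
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if data[i:i+1] == b'd':                  # dictionary: d<key><value>...e
        i, d = i + 1, {}
        while data[i:i+1] != b'e':
            key, i = bdecode(data, i)
            d[key], i = bdecode(data, i)
        return d, i + 1
    colon = data.index(b':', i)              # byte string: <length>:<bytes>
    length, start = int(data[i:colon]), colon + 1
    return data[start:start + length], start + length

with open('example.torrent', 'rb') as f:     # made-up file name
    torrent, _ = bdecode(f.read())
print(torrent.keys())   # e.g. dict_keys([b'announce', b'announce-list', ..., b'info'])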
I wasn't able to find a tremendous amount of tooling for interactively editing BEncode files; there are a few online "torrent editors" which give you basic access to changing some of the fields which aren't part of the info dictionary, but none of them seemed to give the arbitrary key-value editing capabilities I needed to play with webseed, so I settled on a Windows tool called BEncode Editor. The nice thing about this tool is that it's designed as an arbitrary BEncode editor instead of specifically a torrent editor, so it has that authentic "no training wheels included" hacker feel to it. User beware.

As an example, I grabbed the torrent file for the eXoDOS v4 collection, which is a huge collection of 7000 DOS games with various builds of DOSBOX to make it all work on a modern system. Opening the torrent file in BEncode Editor, you can see the main info dictionary at the end of the root dictionary; that's the part you don't want to touch, since the info dictionary is what defines the identity of the torrent. In addition, you can see five other elements in the root dictionary: a 43 byte byte-string named "announce", which is a URI for the primary tracker to use to announce yourself to the rest of the swarm; a list of 20 elements named "announce-list", which holds alternative trackers (the file likely contains both the single tracker and the list of trackers for backwards compatibility with Bittorrent clients which predate the concept of announce-lists); byte-strings labeled "comment" and "created by"; and an integer named "creation date", which looks like a Unix timestamp.

Cool! So at this point, we have an interactive tool to inspect and modify a BEncode database, and know which parts to not touch to avoid breaking things (The "info" dictionary).

Now back to the original point: somehow adding webseed URLs to a torrent file.


Webseeding is defined in Bittorrent specification BEP_0019, which I didn't find particularly clear, but the main takeaway for me is that to enable webseeding, I just need to add a list to the torrent named "url-list", and then add byte-string elements to that list which are URLs to HTTP/FTP servers serving the same contents.

So, first step: log into one of my web servers, download the torrent, and throw the contents in an open directory. (In my case, https://mirror.thelifeofkenneth.com/lib/) For actual content creators, HTTP hosting of the content should already be part of their normal release workflow, so this step is only needed when you're retrofitting webseed into an existing torrent.
Now we start editing the torrent file by adding a "url-list" list to the root dictionary. The part I found a little tricky was figuring out how to add the byte-string child to the list: in BEncode Editor, you click on the empty "url-list" list, click "add", and specify that the new element should be added as a "child" of the current element.
Referring back to BEP_0019, if I end the URL with a forward slash, the client should append the info['name'] to the URL, so the byte-string I'm adding as a child to the list is "https://mirror.thelifeofkenneth.com/lib/" such that the client will append "eXoDOS" to it, looking for the content at "https://mirror.thelifeofkenneth.com/lib/eXoDOS/", which is correct.

Save this file as a new .torrent file, and success! Now I have a version of the eXoDOS torrent with the swarm performance supplemented by my own HTTP server! The same could be done for any other torrent where the exact same content is available via HTTP, and honestly I'm a little surprised that I don't tend to see Linux distros using this, since it removes the need for them to commit to maintaining torrent infrastructure; the torrent swarm can at least survive off of an HTTP server, which the content creator is clearly already running.
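For completeness, the whole retrofit can also be scripted instead of clicked through a GUI. A sketch, assuming the bencode.py package (which imports as bencodepy; swap in your favorite BEncode library) and my mirror URL:

# Sketch: add a webseed URL to an existing torrent.
# Assumes the bencode.py package; file names are examples.
import bencodepy

with open('eXoDOS.torrent', 'rb') as f:
    torrent = bencodepy.decode(f.read())    # keys come back as byte strings

# BEP_0019: "url-list" lives in the root dictionary, next to "announce" etc.
# The trailing slash tells clients to append info['name'] to the URL.
torrent[b'url-list'] = [b'https://mirror.thelifeofkenneth.com/lib/']

# The b'info' dictionary is untouched, so the identity of the swarm is preserved.
with open('eXoDOS-webseed.torrent', 'wb') as f:
    f.write(bencodepy.encode(torrent))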

Monday, July 22, 2019

Building Your Own Bluetooth Speaker

Video:


I recently found a nice looking unpowered speaker at a thrift shop, so I decided to turn it into a bluetooth speaker so I could use it in my apartment with my phone. The parts list is pretty short:

Wednesday, June 5, 2019

Using 0603 Surface Mount Components for Prototyping

As a quick little tip, when I'm prototyping circuits on 0.1" perf board, I like using 0603 surface mount components for all of my passives and LEDs, since they nicely fit between the pads. This way I don't need to bother with any wiring between LEDs and their resistors, since I can just use three pads in a row to mount the LED and its corresponding current limiting resistor, and then only need to wire the two ends to where they're going.
I also like using this as a way to put 0.1uF capacitors between adjacent pins on headers, so filtering pins doesn't take any additional space or wiring.

The fact that this works makes sense, since "0603" means that the surface mount chips are 06 "hundredths of an inch" by 03 "hundredths of an inch", so they're 0.06" long, which fits nicely between 0.1" spaced pads.

I definitely don't regret buying an 0603 SMT passives book kit, which covers most of the mix of resistors and capacitors I need. I then buy a spool of anything that I manage to fully use up since it's obviously popular, so I eventually bought a spool of 0.1uF capacitors and 330 ohm resistors to restock my book when those strips ran out.

Monday, May 20, 2019

Twitter-Connected Camera for Maker Faire

This last weekend was Maker Faire Bay Area, which is easily one of my favorite events of the year. Part county fair, part show and tell, and a big part getting to see all my Internet friends in one place for a weekend.

This year, I got struck with inspiration a few weeks before the event to build a camera that could immediately post every picture taken with it to Twitter. I don't like trying to live tweet events like that, since I find it too distracting taking the photo, opening it in Twitter, adding text and hashtags, and then posting it. This camera would instead upload every picture with a configurable canned message, so every picture will be hashtagged, but I have a good excuse for not putting any effort into a caption for each picture: "this thing doesn't even have a keyboard."

I went into this project with the following requirements:

  • Simple and fast to use. Point, click, continue enjoying the Faire.
  • Robust delivery of tweets to Twitter, tolerant of regularly losing Internet for extended periods of time when I walk inside buildings at Maker Faire. The camera would be tethered off my cell phone, and while cell service at Maker Faire has gotten pretty good, the 2.4GHz ISM band usually gets trashed in the main Zone 2 hall, so the camera will definitely be losing Internet.
  • A very satisfying clicky shutter button.
  • A somewhat silly large size for the enclosure, to make the camera noticeable and more entertaining.
  • Stretch goal: A toggle switch to put the camera in "timer delay" mode so I could place it somewhere, press the shutter button, and run into position.

The most interesting challenge for me was coming up with a robust way for the camera to be able to take one or multiple photos without Internet connectivity, and then when it regains Internet ensure that all of the pictures got delivered to Twitter. It would be possible to code some kind of job queue to add new photos to a list of pending tweets and retry until the tweet is successfully posted, but I only had a few weeks to build this whole thing, so chasing all the edge cases of coding a message queue that could even tolerate the camera getting power cycled sounded like a lot of work.

I eventually realized that guaranteed message delivery for a few MBs of files over an intermittent Internet connection is actually already a very well solved problem: email! This inspiration led to a design where the camera runs a local Python script to watch the shutter button and capture the photo, but instead of trying to get the picture to Twitter directly, it attaches the photo to an email and sends it to a local mail server (i.e. Postfix) running on the Raspberry Pi. The mail server can then grapple with spotty Internet and getting restarted while trying to deliver the emails to some API gateway service to get the photos on Twitter, while the original Python script is immediately ready to take another photo and add it to the queue, regardless of being online.
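The full script is linked below, but the core of the capture-and-queue path is only a few lines. A simplified sketch (the addresses and subject line here are made up; the real values live in a config file):

# Simplified sketch of the capture-and-queue idea; addresses are made up.
# Postfix on localhost owns delivery and retries, so this returns immediately.
import smtplib
from email.message import EmailMessage

def queue_photo(path):
    msg = EmailMessage()
    msg['From'] = 'tweetcam@localhost'
    msg['To'] = 'twitter-gateway@example.com'    # whatever posts email to Twitter
    msg['Subject'] = 'Live from the tweetcam'
    msg.set_content('Posted by a camera with no keyboard.')
    with open(path, 'rb') as f:
        msg.add_attachment(f.read(), maintype='image',
                           subtype='jpeg', filename='photo.jpg')
    with smtplib.SMTP('localhost') as smtp:      # hand off to the local mail queue
        smtp.send_message(msg)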

YouTube Video:


The Python script running on the Raspberry Pi is available on GitHub.

The camera was really successful at Maker Faire! I enjoyed being able to snap photos of everything without needing to pull out my phone, and really loved the looks of amusement when people finally figured out what I was doing with a colorful box covered in stickers with a blinking LED on the front.

I ended up taking several HUNDRED photos this weekend (the good ones are available to browse here). Unfortunately, this means I very quickly found out that IFTTT has a 100 tweet per day API limit, which they don't tell you about until you hit it. Talking to others, this is apparently a common complaint about IFTTT where it's great to prototype with, but as soon as you try and actually demo it, you hit some surprise limit and they cut you off. If I had known they were going to cut me off, I would have sat down and written my own email to Twitter API gateway, but by the time I realized this problem Friday of Maker Faire, sitting down to try and write an email parser and using Twitter's API directly wasn't an option anymore. GRRR. Two out of Five stars, would not recommend IFTTT.

When I opened a ticket with IFTTT support, they said I was out of luck on the 100 tweet/day limit, and suggested I go get a Buffer account, which is a paid social media queue management service, so my per day tweet limit would be higher; but we've now got one more moving part, in that the photos are going Raspberry Pi - my mail server - IFTTT - Buffer - Twitter. UNFORTUNATELY, something wasn't happy between IFTTT and Buffer, so only about 20% of the API calls to add my photos to my Buffer queue were successful. Buffer also sucked for what I was trying to do, because it's more meant for uploading a week's worth of content as a CSV to post three times a day on social media. To get Buffer to post all of my photos, I had to manually go in and schedule it to post the next photo in my queue every five minutes... all day... so I was sitting there on my phone selecting the next five minute increment and clicking "add to schedule" for quite a while. 1/5 stars, will definitely never use again.

So the irony here is that the photo delivery from the Raspberry Pi in the field back to my mail server was rock solid all weekend, but getting it from my mail server to Twitter fell on its face pretty hard.

The other notable problem I ran into while stress testing the camera the week before Maker Faire was that Postfix reads its DNS settings when it starts, and doesn't expect the server to roam between various WiFi networks. I needed to edit the /etc/dhcpcd.conf file to force the Raspberry Pi (and thus also Postfix) to just use public DNS resolvers like 8.8.8.8 and 1.1.1.1 instead of my phone's DNS resolver, which obviously wasn't available when the camera roamed to other WiFi networks like the one at Supply Frame's office.
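For reference, pinning the resolvers was something along these lines in /etc/dhcpcd.conf (the exact file I used is in the GitHub repo; check man dhcpcd.conf for your version):

# Ignore the DNS servers offered by DHCP and always use public resolvers
static domain_name_servers=8.8.8.8 1.1.1.1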
I also spent all weekend adding stickers to the camera's box, so by the end of the weekend the cardboard box was beautifully decorated.


Material used for this build:

Starting from a clean Raspbian Lite image, run raspi-config and
  • Set a new user password and enable ssh
  • Set the locale and keyboard map to suit your needs
  • Set the WiFi country and configure your WiFi network credentials
Install the needed software and my preferred utilities:
sudo apt update
sudo apt install postfix mutt vim dstat screen git

Starting with Postfix, we need it to be able to relay email out via a remote smarthost, since pretty much no consumer Internet connection allows outgoing connections on port 25 to send email. It's possible to use Gmail as your smarthost, so feel free to search for guides on how to specifically do that, but I just used a mail relay I have running for another project.

To do this, I first created a new file /etc/postfix/relay_passwd and added one line to it:
smtp.example.com USERNAME:PASSWORD

This file gets compiled into a database (relay_passwd.db) that postfix uses to look up your username and password when it needs to log into your relay. Conceivably you could have multiple sets of credentials for different hosts in here, but I've only ever needed one for my relay.

I then changed the permissions on it so only root can read it, and generated the database file postfix actually uses to perform lookups against this host to username/password mapping.

chmod 600 /etc/postfix/relay_passwd
postmap /etc/postfix/relay_passwd

To configure Postfix to use this relay, I added these lines to my /etc/postfix/main.cf file:
relayhost = [smtp.example.com]:587
smtp_use_tls=yes
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/relay_passwd
smtp_sasl_security_options =

At this point, you should be able to use a program like mutt or sendmail on the Raspberry Pi to send an email, and watch the respective /var/log/mail.log files to see the email flow out to the Internet or get stuck somewhere.
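For example, a quick end-to-end test with mutt, attaching some file and mailing it to yourself (substitute your own address):

echo "relay test from the Pi" | mutt -s "relay test" -a /tmp/photo.jpg -- you@example.com
tail -f /var/log/mail.log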

Tweaking Postfix defer queue behavior - Since using Postfix as a message queue on a portable Raspberry Pi is a bit unusual compared to the typical Postfix application, which probably involves a datacenter, we expect the Internet connection to be quite a bit more flaky, so retries on messages should happen a lot more often than for typical mail delivery. I changed it to start retrying emails after 30 seconds, with Postfix doubling that backoff up to a maximum of five minutes, which seems like a reasonable upper limit for the time my phone would go offline and then find signal again.

queue_run_delay = 30s (300s default)
minimal_backoff_time = 30s (300s default)
maximal_backoff_time = 300s (4000s default)

The actual camera script is available on GitHub, including the modified dhcpcd config file, a systemd service to start the Python script on boot, and an example copy of the configuration file that the tweetcam script looks for at /etc/tweetcam.conf which includes where to send the picture emails and what to put in the subject and body of the emails.


The hardware was a small square of perf board to act as an interface between the Raspberry Pi 40 pin header and the buttons / LEDs mounted in the box. For the inputs, the Pi's internal pull-up resistors were used, so the button just needed to pull the GPIO pin to ground. A 0.1uF capacitor was included for debounce, which ended up not really mattering since it took a little more than a second for the Python script to capture the photo and send the email.
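The shutter handling in the Python script is the stock RPi.GPIO pull-up pattern; a sketch of the idea, with a placeholder pin number and a hypothetical helper standing in for the real capture/email code:

# Sketch of the shutter button loop; pin number is a placeholder.
import RPi.GPIO as GPIO

SHUTTER_PIN = 17    # BCM numbering; whatever pin the perf board wires to

GPIO.setmode(GPIO.BCM)
# Internal pull-up, so the button only has to short the pin to ground.
GPIO.setup(SHUTTER_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)

while True:
    # Block until the button pulls the pin low, then capture and queue.
    GPIO.wait_for_edge(SHUTTER_PIN, GPIO.FALLING)
    take_photo_and_send_email()    # hypothetical helper; see the real script on GitHub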

I put an NPN transistor buffer on both of the LEDs using a 2N3904 transistor, which I probably could have done without, but I hadn't decided how big, bright, or how many LEDs I wanted to drive when I soldered the perf board.

The camera module was plugged into the camera CSI port on the Pi.

In the end, this project was a huge hit, and I particularly appreciated the nice tweets I was getting from people who couldn't make it to Maker Faire and were enjoying getting to see ~500 photos of it tweeted live as I walked around. I suspect I'll end up continuing work on this project and try to fix some of the issues I encountered, like the IFTTT Twitter API limits and how underwhelming the image quality coming out of the OEM Raspberry Pi Foundation camera module really is...

Saturday, May 11, 2019

FCIX - Lighting Our First Dark Fiber

The Fremont Cabal Internet Exchange project continues!

The first eight months of FCIX was spent spinning up the exchange in the Hurricane Electric FMT2 building running out of my personal cabinet there. At the end of 2018, in the typical 51% mostly kidding style that is starting to become an integral part of the FCIX brand, we asked our data center host Hurricane Electric if they would be willing to sponsor us for a second cabinet in their other facility (HE FMT1) and a pair of dark fiber between the two buildings.

This felt like a pretty big ask for a few reasons:

  1. Hurricane Electric has already been a very significant sponsor in enabling FCIX to happen by sponsoring free cross connects for new members into the exchange, so joining the exchange is a $0 invoice.
  2. Asking for a whole free cabinet feels pretty big regardless of who you're talking to.
  3. Hurricane Electric is in the layer 2 transport business, so us asking to become a multi-site IXP puts us in the somewhat tricky position of walking the line of being in the same business as one of our sponsors while using their donated services. 
That last point is an interesting and possibly somewhat subtle one, and it's not an unusual position for Internet Exchange Points to be put in; when an IXP goes multi-site, its sites are often connected by donated fiber or transport from a sponsor who's already in that business. This means that once an IXP starts connecting networks between the two facilities over the donated link, pairs of networks which might have already been leasing capacity between the two buildings might be motivated to drop their existing connection and move their traffic onto the exchange's link. Some IXPs will also enforce a policy that a single network can only connect to the IXP in one location, so members can't use the fiber donated to the IXP as internal backhaul for their own network, since that's a service they should be buying from the sponsor themselves, not enjoying the benefit of it being donated to the IXP.

Do I think this is a major concern between HE FMT1 and HE FMT2 for FCIX? No. These two buildings are about 3km apart, and Hurricane Electric has made sure there is a generous bundle of fiber between them, so it is unlikely that HE is making a lot of money on transport between two buildings within walking distance of each other.
So we asked, and Hurricane Electric said yes.

At that point, we had an empty cabinet in FMT1, and a pair of dark fiber between the two buildings allocated for us, but a second site means we need a second peering switch...

"Hey, Arista..."

Arista was kind enough to give us a second 7050S-64 switch to load into FMT1, so we now have a matching pair of Ethernet switches to run FCIX on. Cool.

The final design challenge was lighting this very long piece of glass between the two sites. Thankfully, 3km is a relatively "short" run in the realm of single mode fiber, so the design challenge of moving bits that far isn't too great; pretty much every single generation of fiber optic has off-the-shelf "long-reach" optics which are nominally rated for 10km of fiber, so we weren't going to need to get any kind of special long range fiber transponders or amplifiers to light the pair.

In reality, they aren't really rated for 10km so much as for a certain signal budget that usually works out well enough for 10km-long links. As an example, let's take the Flexoptix 10G LR optic. The important numbers to focus on from its datasheet are:

  • Powerbudget: 6.2dB
  • Minimum transmit power: -8.2dBm (dB relative to a milliwatt)
  • Minimum receive power: -14.4dBm
The powerbudget is the acceptable amount of light that can be lost from end to end over the link, be it through linear km of fiber, connections, splices, attenuators, etc. So the 10km "distance" parameter is more a rule of thumb statement that a 10km link will typically have 6.2dB of attenuation along it than the optic really being able to tell exactly how far it is from the other end of the fiber.

The minimum transmit power and minimum receive power are actually related to the powerbudget, in that the powerbudget is the difference between these two numbers. Usually, your powerbudget will be much better than this to begin with, because most optics will put out much more than -8.2dBm of power when they're new, but some of them might be that low, and the lasers will actually "cool" over their life-span, so even if an optic comes out of the box putting out -2dBm of light, that number will go down as it ages.
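Back-of-the-envelope, a 3km link has plenty of margin against that 6.2dB budget. Using rule-of-thumb numbers that aren't from the Flexoptix datasheet (roughly 0.35dB/km for single mode fiber at 1310nm, and call it 0.5dB per patch panel connection):

# Rough link budget for the FMT1-FMT2 pair, using rule-of-thumb loss figures.
power_budget_db = 6.2              # from the optic's datasheet
fiber_loss_db = 0.35 * 3.0         # ~0.35 dB/km at 1310nm over 3km
connector_loss_db = 0.5 * 2        # a patch panel on each end
margin_db = power_budget_db - fiber_loss_db - connector_loss_db
print(f"margin: {margin_db:.2f} dB")   # about 4.1dB of headroom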

When it comes to serious long fiber links, it's likely you'll want a signal analysis done of the actual fiber you plan on using, and then use that information to plan your amplifiers accordingly. We, on the other hand, very carefully looked at how far apart the two buildings were on Google Maps, stroked our chins knowingly like we knew what the hell we were doing, and decided that a 10km optic would probably be good enough. Time to call another FCIX sponsor.

"Hey, Flexoptix..."

Given what our 7050S-64 switches could support, we really had three choices for 10km optics: we could light the link with a pair of 1G-LX optics, which would be pretty lame in this day and age; we could light it with a pair of 10G-LR optics, which would probably be pretty reasonable given the amount of traffic we expect to move to/from this FMT1 extension; OR, we could ask Flexoptix for a pair of $400 40G LR4 optics and use some of those QSFP+ ports on our Aristas... because why not? Faster is better.
So that's what we did. FCIX now has a 40G backbone between our two sites. 

40G LR4 is actually a little mind-blowing in how it gets 40G across a single pair of fibers, because a single 40G transceiver isn't how they actually did it. 40G was really an extension of 10G, putting 4x10G in one optic, and there were two ways of then transporting those 4x10G streams to the other optic:
  1. PLR4, or "parallel" LR4, where you use an 8 fiber cable terminated with MPO connectors, so each 10G wavelength is on its own fiber.
  2. LR4, which uses the same duplex LC fiber as 10G-LR, but uses four different wavelengths for the four 10G transceivers, and then integrates a CWDM (coarse wave division multiplexing) mux IN THE FREAKING OPTIC.

It's not entirely correct, but imagine a prism taking four different wavelengths of light and combining them into the single fiber, then a second prism on the receiver splitting them back out to the four receivers. Every time I think about it, it still blows my mind how awesome all of these Ethernet transceivers are once you dig into how they work.

So we now have 40G between our two sites, and like always, it wouldn't have been possible without all of our generous FCIX sponsors; they're pretty cool. If you happen to be an autonomous system running in either of those facilities, or want to talk to us about extending FCIX to another facility in the Silicon Valley, feel free to shoot us an email at contact@fcix.net.