Have you ever been tasked to convert all instances of a URL into a hyperlink from some source of text? Or perhaps you had to pipe all URL instances through a URL shortener?
I've had to do this from time to time over the last several years (my first stab at it was a miserable piece of code, but it got the job done), and every time I have had to deal with the same quirks: what if the URL is at the end of a sentence, what if it is at the end of a line, what if it is immediately followed by an HTML page break, etc.
Dear god, what have I done to deserve this?
If any of this is ringing a bell and you feel like you're once again close to the 6th layer of hell (or maybe you like this kind of programming masochism... I do), read on. Or perhaps this looks vaguely familiar:
URI.extract is here to save your ass
Fear not, Ruby is here to help you. I cannot believe I didn't stumble on this sooner. It turns out that the URI module includes a handy method called extract. You can use this to pull all URLs out of a string. For instance:
>> URI.extract(x)
=> ["http://news.ycombinator.com.", "http://xkcd.com"]
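One thing worth knowing: URI.extract also takes an optional list of schemes, which keeps it from picking up anything that merely looks like "word:something". Here is a minimal sketch, assuming a made-up sample string (the text and the example.com URL are mine, not from the session above). Newer Ruby versions mark URI.extract as obsolete and warn about it in verbose mode, so treat this as an illustration rather than a recommendation:

```ruby
require 'uri'

# Sample text invented for this example.
text = "Read http://xkcd.com today and https://example.com/a?b=1 tomorrow"

# The second argument limits matches to the given schemes.
urls = URI.extract(text, %w[http https])

p urls
# => ["http://xkcd.com", "https://example.com/a?b=1"]
```

Without the scheme list, URI.extract will happily return any scheme-like token it finds, so the filter is usually what you want for plain prose.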
All very cool, but that only gets us halfway there. What if you want to process those URLs and put the results back into the string where you left them? I was hoping that URI.extract would accept a block and substitute the results into the original string... no joy.
However, if you dig into the URI.extract code you'll find that the URI module also sports a method, URI.regexp, that returns the regular expression used by URI.extract to parse URLs. We can use this regular expression in a gsub call to swap out URLs with the text of our liking.
>> x.gsub(URI.regexp) { |match| "<a href=\"#{match}\">#{match}</a>" }
=> "This is a fun site - <a href=\"http://news.ycombinator.com\">http://news.ycombinator.com</a> ... For lighter reading <a href=\"http://xkcd.com\">http://xkcd.com</a> should do the trick"
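The end-of-sentence quirk from the top of the post can still bite here, because the regular expression may swallow a trailing period as part of the host name. Below is one way to deal with it, as a rough sketch: linkify is a hypothetical helper name of my own, the sample string is made up, and the punctuation list is a guess you should tune for your own text. Like URI.extract, URI.regexp is marked obsolete in newer Ruby versions.

```ruby
require 'uri'

# Hypothetical helper: wraps each http/https URL in an anchor tag,
# keeping trailing sentence punctuation outside the link.
def linkify(text)
  pattern = URI.regexp(%w[http https])
  text.gsub(pattern) do |match|
    # Peel off any trailing punctuation the regexp picked up.
    url = match.sub(/[.,;:!?]+\z/, '')
    trailing = match[url.length..-1]
    %(<a href="#{url}">#{url}</a>#{trailing})
  end
end

linked = linkify("Try http://xkcd.com. Or https://example.com/a?b=1 instead")
puts linked
```

The same block is where you would call a URL shortener instead of building an anchor tag: whatever the block returns is what ends up in the string.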
Breathe deeply and (once again) thank the regular expression overlords
So that's all really cool and saves you a bunch of regex hackery that - fortunately - has been done before (over and over) and now lives in a library ready to serve. If you want to look more closely at this URI function, you can find the file in your Ruby install here: