Twitter Updates for 2008-07-02

July 2, 2008
  • If I ever find the telemarketer who keeps calling my mobile number to tell me about my expired warranty, I’m going to send flowers. #
  • Two things I can’t determine: 1) is Ghost in the Machine bigger than The Police? 2) how does Sagmeister do it? #
  • I may have more questions in the morning. #

That Itchy, Itchy Design Feeling

June 25, 2008

I’m feeling it. Time to explore and move forward, to go beyond what I’m currently capable of. To put clichés behind, as if they could be. To ask not what I can do for my country. Escape country.

Twitter Updates for 2008-06-14

June 14, 2008
  • Master Gardener tip: if you *must* water your lawn, do it before sunrise and after sunset, 20 minutes max each, every other day. #

Twitter Updates for 2008-06-11

June 11, 2008
  • Any PHP, Ruby, or Python developers interested in working with a current @TechStars incubator company in Boulder, Colorado? DM me. #
  • Just turned on the Stevenote. Two seconds into it I’m googling “Steve Jobs health” on Google News. #

Twitter Updates for 2008-06-10

June 10, 2008
  • Standing in a field of prairie dogs. Sounds like I’m in a squeaky toy factory. #
  • But if you want to experience the definition of cute you have to see a mound of baby prairie dogs up close. #

Google Sitemaps with Ruby on Rails, Capistrano, and Cron

This is a slight modification of code originally written by Alastair Brunton. I recently implemented this for Jetrecord and since Alastair was so generous, I decided to share the love as well. I have changed Alastair’s code to generate a sitemap index file plus sitemap files for each model, all of them gzipped to save on bandwidth.

I have also added Capistrano code to copy sitemap files from the previous release to the current release so we don’t lose our sitemap files when we deploy a new release.

Remember, Google sitemaps are for publicly available URLs. They’re for pages that you want Google to find and index. If you don’t want Google to find your CIA Operatives records, don’t tell Google about them!

Let’s just go straight to the code. I am going from the top down in my application’s root directory.

app/models/your_model.rb

You must add this code to each model that you want to generate a sitemap for. Here is an example for Airports on Jetrecord.

# put this inside app/models/airport.rb
def self.get_paths
  path_ar = []
  self.find(:all).each do |model|
    path_ar << {:url => "/airports/#{model.to_param}", :last_mod => model.updated_at.strftime('%Y-%m-%d')}
  end
  path_ar
end
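
To make the return value concrete, here is a minimal sketch that runs outside Rails. The FakeAirport struct, the paths_for helper, and the sample airport codes are hypothetical stand-ins for real ActiveRecord rows; only the shape of the hashes matches the method above.

```ruby
# Hypothetical stand-in for an ActiveRecord model, for illustration only.
FakeAirport = Struct.new(:code, :updated_at) do
  # Rails calls to_param to build the URL segment for a record.
  def to_param
    code
  end
end

# Same logic as get_paths, but taking the records as an argument
# instead of loading them from the database.
def paths_for(records)
  records.map do |record|
    { :url => "/airports/#{record.to_param}",
      :last_mod => record.updated_at.strftime('%Y-%m-%d') }
  end
end

records = [
  FakeAirport.new('KBDU', Time.utc(2008, 6, 9)),
  FakeAirport.new('KDEN', Time.utc(2008, 6, 1))
]
paths = paths_for(records)
puts paths.first[:url]       # /airports/KBDU
puts paths.first[:last_mod]  # 2008-06-09
```

Each hash carries exactly the two keys the sitemap generator reads later: :url and :last_mod.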

config/sitemap/sitemap_tasks.rb

This is for Capistrano. You probably don’t have a config/sitemap directory. I created one and put my Capistrano sitemap task in it. This tells Capistrano, “After deploying my new release, copy the sitemap files from the previous release and store them in the same location in the current release.”

Capistrano::Configuration.instance(:must_exist).load do
  namespace :sitemap do
 
    desc "Copy the sitemap files after deploy"
    task :copy_sitemap, :roles => :app do
      puts "copying Rails sitemap files"
      sudo "cp #{previous_release}/public/sitemaps/* #{current_release}/public/sitemaps/"
    end
 
    after :deploy, 'sitemap:copy_sitemap'
  end
end

config/deploy.rb

This file usually contains your typical Capistrano recipes. All you have to do is require the sitemap_tasks file we created above.

# At the top of the file, after any other required files
require 'config/sitemap/sitemap_tasks'

lib/google_sitemap.rb

This is the meat of the whole thing. Kudos to Alastair for setting this up. I modified it to use a sitemap index with a separate sitemap for each model because Google allows at most 50,000 URLs per sitemap. I have 48,000 navigation fixes, 20,000 airports, and 3,000 navaids in Jetrecord. That’s 71,000 URLs, so I have to split my sitemap into several sitemaps.

I’m also gzipping the sitemap files because Google can read them and it saves bandwidth. Oh, and the URL to ping Google has changed, as has the XML namespace for their sitemap tags.

require 'net/http'
require 'uri'
require 'zlib' # needed for Zlib::GzipWriter in save_file
 
# A class specific to the application which generates a google sitemap from the contents of the database.
# Author: Alastair Brunton
# Modified: Harry Love 2008-06-09
class GoogleSitemapGenerator
 
  def initialize(base_url, sources)
    @base_url = base_url
    @sources = sources
  end
 
  # 1. Iterate through each model's #get_paths method
  # 2. Create sitemap file for each model
  # 3. Create sitemap index file
  # 4. Ping Google
  def generate
    @sources.each do |source|
      # Look up the model class by name (constantize comes from ActiveSupport)
      # and call its get_paths method.
      path_ar = source.constantize.get_paths
      xml = generate_sitemap(path_ar)
      save_file(source, xml)
    end
    index = generate_sitemap_index(@sources)
    save_file('index', index)
    update_google
  end
 
  # Create a sitemap document for a model
  def generate_sitemap(path_ar)
    xml_str = ""
    xml = Builder::XmlMarkup.new(:target => xml_str)
    xml.instruct!
    xml.urlset(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') {
      path_ar.each do |path|
        xml.url {
          xml.loc(@base_url + path[:url])
          xml.lastmod(path[:last_mod])
          xml.changefreq('weekly')
        }
      end
    }
    xml_str
  end
 
  # Create a sitemap index document
  def generate_sitemap_index(sitemaps)
    xml_str = ""
    xml = Builder::XmlMarkup.new(:target => xml_str)
    xml.instruct!
    xml.sitemapindex(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') {
      sitemaps.each do |site|
        xml.sitemap {
          xml.loc(@base_url + "/sitemaps/sitemap_#{site}.xml.gz")
          xml.lastmod(Time.now.strftime('%Y-%m-%d'))
        }
      end
    }
    xml_str
  end
 
  # Save the xml file (gzipped) to disk
  def save_file(source, xml)
    File.open(RAILS_ROOT + "/public/sitemaps/sitemap_#{source}.xml.gz", 'w+') do |f|
      gz = Zlib::GzipWriter.new(f)
      gz.write xml
      gz.close
    end
  end
 
  # Notify Google of the new sitemap index file
  def update_google
    sitemap_uri = @base_url + '/sitemaps/sitemap_index.xml.gz'
    escaped_sitemap_uri = URI.escape(sitemap_uri)
    Net::HTTP.get('www.google.com', '/webmasters/tools/ping?sitemap=' + escaped_sitemap_uri)
  end
end
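
One caveat: the generator writes one sitemap file per model, so a single model with more than 50,000 records would still exceed the limit. One possible way around that, sketched here under assumptions, is to slice the path array before building the XML. MAX_URLS and sitemap_chunks_for are hypothetical names and are not part of Alastair’s code or mine.

```ruby
# Google's limit on URLs per sitemap file, as mentioned above.
MAX_URLS = 50_000

# Hypothetical helper: split a model's paths into chunks and pair each
# chunk with a numbered sitemap name such as "Fix_1", "Fix_2".
def sitemap_chunks_for(source, paths, max = MAX_URLS)
  chunks = []
  paths.each_slice(max).each_with_index do |slice, i|
    chunks << ["#{source}_#{i + 1}", slice]
  end
  chunks
end

# 120,000 fake paths need three files: 50,000 + 50,000 + 20,000.
fake_paths = (1..120_000).map { |n| { :url => "/fixes/#{n}" } }
chunks = sitemap_chunks_for('Fix', fake_paths)
chunks.each { |name, slice| puts "#{name}: #{slice.size} urls" }
```

Each chunk name could then be handed to save_file and listed in the sitemap index in place of the bare model name.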

lib/tasks/sitemap.rake

This is the rake task that we’ll call periodically from Cron to generate new sitemap files.

require 'google_sitemap'
namespace :google_sitemap do
  desc "Generate a Google sitemap from the models"
  task(:generate => :environment) do
    # Generate sitemaps for each of the models listed in the array
    sources = %w( Airport Navaid Fix AnotherModel AnotherModel AndAnotherModel EtCetera )
    sitemap = GoogleSitemapGenerator.new('http://yourdomain.com', sources)
    sitemap.generate
  end
end
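
After running the task, it can be worth checking that a generated file really is valid gzip. Here is a minimal sketch using only Ruby’s standard Zlib and Tempfile libraries. The temp file and the sample XML string are made up for illustration and stand in for a real file under public/sitemaps.

```ruby
require 'zlib'
require 'tempfile'

# Made-up sample content standing in for a generated sitemap.
sample_xml = '<urlset><url><loc>http://yourdomain.com/airports/1</loc></url></urlset>'

# Write the XML gzipped, the same way save_file does.
file = Tempfile.new(['sitemap_check', '.xml.gz'])
file.binmode
gz = Zlib::GzipWriter.new(file)
gz.write sample_xml
gz.close # closing the writer also closes the underlying file

# Read the file back through GzipReader and compare with the original.
round_trip = Zlib::GzipReader.open(file.path) { |reader| reader.read }
puts round_trip == sample_xml
```

To check a real sitemap, point Zlib::GzipReader.open at the file in public/sitemaps instead of the temp file.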

public/sitemaps

Assuming this directory doesn’t exist already, create it.

Depending on what stack you’re using to deploy your Rails app, you may also need to tell your server to skip proxying HTTP requests to this directory. For example, I’m proxying requests to Mongrel via Apache. So, in the Apache virtual host conf file for my app, I had to add a ProxyPass directive so Apache would serve the sitemap files itself rather than passing those requests to Mongrel.

# Right after the ProxyPass directives for images, stylesheets, and javascripts
ProxyPass /sitemaps !

Don’t forget to restart Apache after you save the new conf file!

Add a Cron Job

Lastly, you need to add a cron job to call the rake task so we can generate new sitemap files from time to time. The frequency is up to you and the requirements of your app.

Unfortunately, I’m not up to date on raw Cron commands. I use a GUI provided by my web host. But here’s the command I’m using on Solaris to call the rake task. You’ll have to edit this to suit the specifics of your application and server environment.

cd /var/www/apps/myapp/current && /opt/local/bin/rake RAILS_ENV=production google_sitemap:generate

Don’t forget to tell Rake to use the production environment. Another potential gotcha: you usually have to give cron the full path to rake. You can find out where it is on your server by logging in as the user you plan to use for the cron job (usually root) and doing “which rake”. If that doesn’t bring it up it means rake isn’t in your PATH. That’s okay. You’ll just have to do a little more digging to find out where rake is installed on your system.

If I’ve left out anything let me know. By the way, this would make a great plugin or gem, if only I knew how to make them.

Twitter Updates for 2008-06-07

June 7, 2008
  • And now there’s no wind whatsoever. Great model rocket flying weather. Time to break out the Estes kit. #
  • Anyone have a brightkite invite they can send me? #
  • Thanks, everyone, for the invites. I got one. #

Twitter Updates for 2008-06-04

June 4, 2008
  • @jrec a nx, r klmo bjc bdu kden klmo, 1:37, pic 1:37, “this is only a test” #
  • Discovered a bug in my code right before bed. Like preparing to leave on a road trip and discovering your car doesn’t have seat belts. #
  • Thankfully there was a quick solution. I imagine seat belts are a little harder to come by at the last minute. #
  • After the Tech Meetup I got a few more ideas for a small web app that I dreamt up last year. A lot less complicated than Jetrecord. #

Twitter Updates for 2008-06-03

June 3, 2008
  • Doctor’s office. Pain on the right side of my throat. #

Twitter Updates for 2008-05-25

May 25, 2008
  • Did the Boulder Creek Festival today. When I moved here (the first time) 11 years ago the festival was the first thing I did. #

Twitter Updates for 2008-05-24

May 24, 2008
  • The word is it’s wise if it makes you wonder, The word is it’s wise if it feeds your hunger, The word is it’s wise if it’s like no other. #

Rails 2.1 config.gem and Twitter4R

May 21, 2008

Using the new config.gem method in Rails 2.1? Using the Twitter4R gem to interact with Twitter in your Rails application? Make sure you add the :lib option. The gem is named twitter4r, but the library you actually require is twitter, so without :lib Rails will try to require a file named after the gem and won’t find it.

In your environment.rb file:

config.gem 'twitter4r', :lib => 'twitter'

Twitter Updates for 2008-05-18

May 18, 2008
  • Barking prairie dogs, blackbirds, crickets, a slight breeze, sunset, 1554, lawn chair, backyard. #

links for 2008-05-17

May 17, 2008

Twitter Updates for 2008-05-15

May 15, 2008
  • Working for myself, going outside feels like being let out into the yard like a dog. Go out, do your business, and get back inside. #

links for 2008-05-15

Twitter Updates for 2008-05-14

May 14, 2008
  • Waiting for the arms of Morpheus. #
  • It was a good day to start testing but I’m sure tomorrow will be even beta. Goodnight moon. #

links for 2008-05-13

May 13, 2008

Twitter Updates for 2008-05-12

May 12, 2008
  • Another day, another root canal. I’ll be glad when this is over. #

Twitter Updates for 2008-05-11

May 11, 2008
  • A B-24 Liberator just flew over the house. #