Tag: Portfolio

Google Sitemaps with Ruby on Rails, Capistrano, and Cron

This is a slight modification of code originally written by Alastair Brunton. I recently implemented this for Jetrecord and since Alastair was so generous, I decided to share the love as well. I have changed Alastair’s code to generate a sitemap index file plus sitemap files for each model, all of them gzipped to save on bandwidth.

I have also added Capistrano code to copy sitemap files from the previous release to the current release so we don’t lose our sitemap files when we deploy a new release.

Remember, Google sitemaps are for publicly available URLs. They’re for pages that you want Google to find and index. If you don’t want Google to find your CIA Operatives records, don’t tell Google about them!

Let’s just go straight to the code. I am going from the top down in my application’s root directory.

app/models/your_model.rb

You must add this code to each model that you want to generate a sitemap for. Here is an example for Airports on Jetrecord.

# put this inside app/models/airport.rb
def self.get_paths
  path_ar = []
  self.find(:all).each do |model|
    path_ar << {:url => "/airports/#{model.to_param}", :last_mod => model.updated_at.strftime('%Y-%m-%d')}
  end
  path_ar
end
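
To make the return value concrete, here is a minimal sketch of the array of hashes that get_paths hands to the generator. FakeAirport and paths_for are hypothetical stand-ins so the example runs without Rails or a database; in the real model the records come from the find call shown above.

```ruby
# Hypothetical stand-in for an ActiveRecord model; it only provides the two
# things get_paths relies on: to_param and updated_at.
FakeAirport = Struct.new(:code, :updated_at) do
  def to_param
    code.downcase
  end
end

# Same mapping as get_paths, written against any list of records.
def paths_for(records)
  records.map do |record|
    {
      :url      => "/airports/#{record.to_param}",
      :last_mod => record.updated_at.strftime('%Y-%m-%d')
    }
  end
end

records = [
  FakeAirport.new('SEA', Time.utc(2008, 6, 1)),
  FakeAirport.new('PDX', Time.utc(2008, 6, 9))
]
paths = paths_for(records)
puts paths.inspect
```

Each hash carries a site-relative URL and a date already formatted as YYYY-MM-DD, which is what the generator writes into the loc and lastmod tags.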

config/sitemap/sitemap_tasks.rb

This is for Capistrano. You probably don’t have a config/sitemap directory. I created one and put my Capistrano sitemap task in it. This tells Capistrano, “After deploying my new release, copy the sitemap files from the previous release and store them in the same location in the current release.”

Capistrano::Configuration.instance(:must_exist).load do
  namespace :sitemap do
 
    desc "Copy the sitemap files after deploy"
    task :copy_sitemap, :roles => :app do
      puts "copying Rails sitemap files"
      sudo "cp #{previous_release}/public/sitemaps/* #{current_release}/public/sitemaps/"
    end
 
    after :deploy, 'sitemap:copy_sitemap'
  end
end

config/deploy.rb

This file usually contains your typical Capistrano recipes. All you have to do is require the sitemap_tasks file we created above.

# At the top of the file, after any other required files
require 'config/sitemap/sitemap_tasks'

lib/google_sitemap.rb

This is the meat of the whole thing. Kudos to Alastair for setting this up. I modified it to use a sitemap index with a separate sitemap for each model because Google allows at most 50,000 URLs per sitemap file. I have 48,000 navigation fixes, 20,000 airports, and 3,000 navaids in Jetrecord, about 71,000 URLs in all, so I have to split them across several sitemaps.

I’m also gzipping the sitemap files because Google can read them and it saves bandwidth. Oh, and the URL to ping Google has changed, as has the XML namespace for their sitemap tags.

require 'net/http'
require 'uri'
require 'zlib'
 
# A class specific to the application which generates a google sitemap from the contents of the database.
# Author: Alastair Brunton
# Modified: Harry Love 2008-06-09
class GoogleSitemapGenerator
 
  def initialize(base_url, sources)
    @base_url = base_url
    @sources = sources
  end
 
  # 1. Iterate through each model's #get_paths method
  # 2. Create sitemap file for each model
  # 3. Create sitemap index file
  # 4. Ping Google
  def generate
    @sources.each do |source|
      # Look up the model class by name and call its get_paths method.
      path_ar = Object.const_get(source).get_paths
      xml = generate_sitemap(path_ar)
      save_file(source, xml)
    end
    index = generate_sitemap_index(@sources)
    save_file('index', index)
    update_google
  end
 
  # Create a sitemap document for a model
  def generate_sitemap(path_ar)
    xml_str = ""
    xml = Builder::XmlMarkup.new(:target => xml_str)
    xml.instruct!
    xml.urlset(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') {
      path_ar.each do |path|
        xml.url {
          xml.loc(@base_url + path[:url])
          xml.lastmod(path[:last_mod])
          xml.changefreq('weekly')
        }
      end
    }
    xml_str
  end
 
  # Create a sitemap index document
  def generate_sitemap_index(sitemaps)
    xml_str = ""
    xml = Builder::XmlMarkup.new(:target => xml_str)
    xml.instruct!
    xml.sitemapindex(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') {
      sitemaps.each do |site|
        xml.sitemap {
          xml.loc(@base_url + "/sitemaps/sitemap_#{site}.xml.gz")
          xml.lastmod(Time.now.strftime('%Y-%m-%d'))
        }
      end
    }
    xml_str
  end
 
  # Save the xml file (gzipped) to disk
  def save_file(source, xml)
    # Open in binary mode so the gzipped bytes are written unchanged on every platform
    File.open(RAILS_ROOT + "/public/sitemaps/sitemap_#{source}.xml.gz", 'wb') do |f|
      gz = Zlib::GzipWriter.new(f)
      gz.write xml
      gz.close
    end
  end
 
  # Notify Google of the new sitemap index file
  def update_google
    sitemap_uri = @base_url + '/sitemaps/sitemap_index.xml.gz'
    escaped_sitemap_uri = URI.escape(sitemap_uri)
    Net::HTTP.get('www.google.com', '/webmasters/tools/ping?sitemap=' + escaped_sitemap_uri)
  end
end
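
None of my models exceeds the 50,000-URL limit on its own, so one file per model is enough here. If one of yours does, a rough sketch of splitting its paths into several files might look like this. split_paths and MAX_URLS_PER_SITEMAP are hypothetical names, not part of the class above; each label it returns would have to be passed to save_file and listed in the index in place of the bare model name.

```ruby
# Google's limit on the number of URLs in a single sitemap file.
MAX_URLS_PER_SITEMAP = 50_000

# Split one model's paths into numbered groups that each fit in one file.
# Returns pairs of [label, paths], e.g. ["Fix_1", [...]], ["Fix_2", [...]].
def split_paths(source, path_ar, limit = MAX_URLS_PER_SITEMAP)
  path_ar.each_slice(limit).to_a.each_with_index.map do |slice, i|
    ["#{source}_#{i + 1}", slice]
  end
end

# Made-up data: 120,000 paths for a model called Fix.
paths = (1..120_000).map { |n| { :url => "/fixes/#{n}", :last_mod => '2008-06-09' } }
parts = split_paths('Fix', paths)
parts.each do |label, slice|
  puts "sitemap_#{label}.xml.gz: #{slice.size} URLs"
end
```

The numbering keeps file names predictable, so the index can be built from the same labels.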

lib/tasks/sitemap.rake

This is the rake task that we’ll call periodically from Cron to generate new sitemap files.

require 'google_sitemap'
namespace :google_sitemap do
  desc "Generate a Google sitemap from the models"
  task(:generate => :environment) do
    # Generate sitemaps for each of the models listed in the array
    sources = %w( Airport Navaid Fix AnotherModel AnotherModel AndAnotherModel EtCetera )
    sitemap = GoogleSitemapGenerator.new('http://yourdomain.com', sources)
    sitemap.generate
  end
end
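
After the task has run, one way to check that a generated file is a valid gzip archive is to read it back with Zlib. The sketch below writes a small document to a temporary directory and reads it again; write_gzipped and read_gzipped are hypothetical helpers, and the file name only mirrors the naming used by save_file.

```ruby
require 'zlib'
require 'tmpdir'

# Write a string to a gzipped file.
def write_gzipped(path, xml)
  Zlib::GzipWriter.open(path) { |gz| gz.write(xml) }
end

# Read a gzipped file back into a string.
def read_gzipped(path)
  Zlib::GzipReader.open(path) { |gz| gz.read }
end

xml = '<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>'

# Round-trip through a temporary directory that is removed afterwards.
restored = Dir.mktmpdir do |dir|
  path = File.join(dir, 'sitemap_Airport.xml.gz')
  write_gzipped(path, xml)
  read_gzipped(path)
end

puts(restored == xml ? 'round trip ok' : 'round trip failed')
```

Against a real deployment you would call read_gzipped on a file under public/sitemaps instead of a temporary one.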

public/sitemaps

Assuming this directory doesn’t exist already, create it.

Depending on what stack you’re using to deploy your Rails app, you may also need to tell your server to skip proxying HTTP requests to this directory. For example, I’m proxying requests to Mongrel via Apache. So, in the Apache virtual host conf file for my app, I had to add a ProxyPass directive so Apache would serve the sitemap files instead of Mongrel.

# Right after the ProxyPass directives for images, stylesheets, and javascripts
ProxyPass /sitemaps !

Don’t forget to restart Apache after you save the new conf file!

Add a Cron Job

Lastly, you need to add a cron job to call the rake task so we can generate new sitemap files from time to time. The frequency is up to you and the requirements of your app.

Unfortunately, I’m not up to date on raw Cron commands. I use a GUI provided by my web host. But here’s the command I’m using on Solaris to call the rake task. You’ll have to edit this to suit the specifics of your application and server environment.

cd /var/www/apps/myapp/current && /opt/local/bin/rake RAILS_ENV=production google_sitemap:generate

Don’t forget to tell Rake to use the production environment. Another potential gotcha: you usually have to give cron the full path to rake. You can find out where it is on your server by logging in as the user you plan to use for the cron job (usually root) and doing “which rake”. If that doesn’t bring it up it means rake isn’t in your PATH. That’s okay. You’ll just have to do a little more digging to find out where rake is installed on your system.

If I’ve left out anything let me know. By the way, this would make a great plugin or gem, if only I knew how to make them.

Dynamic Breadcrumbs with JavaScript, Revision 4

Updates from Revision 3:

  • Document title text can also be replaced by text in the replaceTheseCharacters array
  • Check for the presence of the tag to attach to before attempting to attach to it (lots of ‘t’s in that one)
  • Added a little bit more documentation to the top


Activity Report Form (ARF)

ARF activity page

Client

University of Washington

Details

The Activity Report Form (aka The ARF) is an internal activity tracking application that I created for the librarians of the Health Sciences Library. The librarians need to see statistics, reports, and trends on the types of activities they engage in and the groups they interact with. The previous solution used a web-based form from a third-party provider that we customized. After capturing the data, I had to import it into Excel each month and fiddle with the input and output to make it look right. Ugh!


Loveoirs

Loveoirs article page

Client

Me

Details

Loveoirs is our family blog, a place to keep the Love memoirs. Love-oirs. Loveoirs. See? I redesign the site fairly often. At least once a year, I think. (I use “fairly often” liberally, especially considering this site has been redesigned 5 times in the last year.) It’s a chance for me to practice graphic design in a personal way, a way that reflects our family.


MyHealth Toolkit

MyHealth home page

Client

University of Washington

Details

The MyHealth Toolkit on HealthLinks was created to address the needs of a growing number of users coming to HealthLinks looking for personal health information, a topic that HealthLinks wasn’t originally designed to cover. The requirements called for a simple, clean web site with a new style that would be easy to use on a variety of screens, including a touch-screen kiosk in the lobby of the Health Sciences Library. Like HealthLinks, the MyHealth Toolkit is a mini-portal to external content, so the point is to get users there as quickly as possible without distractions.


Lai Real Estate

Lai Real Estate home page

Client

Sam Lai, Lai Real Estate, Ltd.

Details

Sam is a friend of mine who needed a web site for his real estate appraisal business in Seattle. We met several times to discuss style and requirements of the site. In addition to listing his services and rates, Sam wanted an online appraisal request form that would allow customers to submit requests directly from his site.


Calvary Chapel, Seattle

Calvary Chapel Ministries page

Client

Calvary Chapel, Seattle, WA

Details

Calvary Chapel is a Christian church in Seattle, Washington. The old web site was built with static HTML, font tags, and JavaScript rollovers on the navigation images. It desperately needed a design face lift. As a member of the church, I originally volunteered my services to the staff to clean up invalid HTML and help with broken links and content updates. Shortly after work began I sat down with the senior pastor and proposed creating a small team to investigate redesigning the site from the ground up.


New Books List on HealthLinks

New Books page

Client

University of Washington

Details

The New Books List on HealthLinks was my first experience using Perl to solve a real problem. In 2005 the Health Sciences Library wanted to have a simple way of listing the newest health sciences books from the web catalog on the HealthLinks web site. They also wanted to use the list of titles generated from Innovative’s Millennium product rather than have to type in the titles by hand.


Finding Drug Information Tutorial

Finding Drug Information template

Client

University of Washington

Details

Every so often the librarians at the HSL need to create a longer tutorial that goes beyond the typical How-To page. These tutorials usually end up replacing in-person class visits that have become too cumbersome and for which having permanent content available any time would be beneficial for the students.


How-To Pages on HealthLinks

How-To template example

Client

University of Washington

Details

HealthLinks is the portal to the Health Sciences Libraries at UW. (For more information, see my write-up of HealthLinks.) Before the redesign of HealthLinks, the How-To pages didn’t have a consistent look and feel. My boss asked me to design a new template that could accommodate help/how-to pages of various topics and lengths. The template also had to look different from the usual pages on HealthLinks while still showing that it was part of the larger site.


HealthLinks

HealthLinks home page

Client

University of Washington

Details

HealthLinks is the portal to the Health Sciences Libraries at UW. In 2002 it was decided that HealthLinks needed an update to its look and feel and a reorganization of its content to better serve our users. We also wanted to move from static HTML to a database-driven site with reusable content.
