Tuesday, December 20, 2016
Just a bunch of ESTC Library Names
Following Meaghan Brown's work on matching STC and ESTC library names, I threw together a quickie Ruby script. It parses the library names from the ESTC library-name browse list, then follows the "Next Page" links for as long as they are present and grabs each subsequent set.
Pretty dodgy, and as before, the ESTC doesn't do you any favors with the markup, but here's what I got:

```ruby
#!/usr/bin/env ruby
require 'open-uri'
require 'nokogiri'

# First page of the ESTC library-name browse list.
url = 'http://estc.bl.uk/F/1K33ABJEHVSDU7X7HFCTYFGQB3S7UMUTKG5U84RR5SY7GJYM89-10241?func=scan&scan_start=000209485&scan_code=INT&find_scan_code=INT&scan_op=PREV'

# Append every name to titles.txt; the block closes the file when done.
File.open('titles.txt', 'a') do |titles|
  while url
    # URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs.
    page = URI.open(url) { |blob| Nokogiri::HTML(blob) }

    # Each library name is a link inside a td.td1 cell.
    page.css('td.td1 > a').each { |link| titles.puts link.text }

    # The "Next Page" button is an <img> wrapped in an <a>.
    # When it is missing we are on the last page, so url becomes nil and the loop ends.
    next_page = page.css("img[alt='Next Page']").first
    url = next_page && next_page.parent['href']
  end
end
```
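The "follow Next Page while it exists" loop in the script can also be pulled out into a small helper that takes the page-fetching step as a parameter, so the pagination logic can be tried without hitting the ESTC at all. This is a minimal sketch under assumptions: `Paginator.each_item` and the `fetch_page` callable are hypothetical names, and `fetch_page` is assumed to return the names on one page plus the raw "Next Page" href (or `nil`). The sketch also resolves relative hrefs with `URI.join` and stops if a link cycles back to a page already visited; neither is something the ESTC markup is known to need.

```ruby
require 'uri'

# Hypothetical helper (not part of the original script): walks a paginated
# listing, yielding every item and following "next" links until none remain.
module Paginator
  # fetch_page: any callable that takes a URL string and returns
  # [items_on_that_page, next_href_or_nil].
  def self.each_item(start_url, fetch_page)
    # Without a block, return an Enumerator so callers can use .to_a, .first, etc.
    return enum_for(:each_item, start_url, fetch_page) unless block_given?

    url = start_url
    visited = {} # guards against a "next" link that points back to an earlier page
    while url && !visited[url]
      visited[url] = true
      items, next_href = fetch_page.call(url)
      items.each { |item| yield item }
      # Resolve a relative href against the current page; absolute hrefs pass through.
      url = next_href && URI.join(url, next_href).to_s
    end
  end
end
```

Offline, a hash of made-up pages is enough to exercise it: with `pages = { 'http://example.org/browse?p=1' => [['Bodleian', 'British Library'], 'browse?p=2'], 'http://example.org/browse?p=2' => [['Folger'], nil] }`, calling `Paginator.each_item('http://example.org/browse?p=1', ->(u) { pages.fetch(u) }).to_a` collects the names from both pages in order. Against the live catalogue, `fetch_page` would wrap the script's `URI.open` and Nokogiri step, returning `page.css('td.td1 > a').map(&:text)` together with the href of the `Next Page` link.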