mozilla bookmarks

 
 
Dick Davies
      09-22-2004
long shot but what the hell - don't suppose any of you
good people are sitting on a parser for Mozilla/Firefox bookmarks.html
files, by any chance?


nah, didn't think so



Ah well, never mind. I found that squirting it into REXML::Document.new()
by way of 'tidy -asxml' at least stops the constructor choking to death, I'll
have to take it from there.....

--
Forms follow function, and often obliterate it.
Rasputin :: Jack of All Trades - Master of Nuns
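
A minimal sketch of the REXML route mentioned above, assuming the bookmarks file has already been converted to well-formed XHTML (for instance by `tidy -asxml`). Running tidy itself is left out, the sample document is hand-written rather than real tidy output, and `bookmark_links` is a hypothetical helper name:

```ruby
require "rexml/document"

# Collect [href, title] pairs from well-formed XHTML.
# Assumption: the input has already been cleaned up into valid XML.
def bookmark_links(xhtml)
  doc = REXML::Document.new(xhtml)
  links = []
  # Walk every element under the root and keep the anchors that carry an href.
  doc.root.each_recursive do |el|
    next unless el.name == "a" && el.attributes["href"]
    links << [el.attributes["href"], el.text.to_s.strip]
  end
  links
end

# Hand-written sample; real converter output will be larger and may differ in layout.
sample = <<~XML
  <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
      <dl>
        <dt><h3>ruby</h3></dt>
        <dd><dl>
          <dt><a href="http://raa.ruby-lang.org/">RAA - Ruby Application Archive</a></dt>
        </dl></dd>
      </dl>
    </body>
  </html>
XML

bookmark_links(sample).each { |href, title| puts "#{title} => #{href}" }
```

Matching on the element name while walking the tree sidesteps any question of how the XHTML default namespace interacts with XPath queries.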


 
Jamis Buck
      09-22-2004

Dick Davies wrote:
> long shot but what the hell - don't suppose any of you
> good people are sitting on a parser for Mozilla/Firefox bookmarks.html
> files, by any chance?


Funny you should ask. I've had this for a while, and I can't even
remember why I wrote it. It's pretty hacked together, and it's not a
true "parser" (I just search for certain patterns in the bookmark file)
and it is hardcoded (currently) for my own (obsolete) Phoenix bookmarks
file, but it should be fairly straightforward to modify for your own
purposes.

Hope this is at least close to what you are looking for...

- Jamis

--
Jamis Buck
(E-Mail Removed)
http://www.jamisbuck.org/jamis

"I use octal until I get to 8, and then I switch to decimal."

Attachment: bookmarks.rb

#!/usr/bin/ruby

class Item
  attr_accessor :last_modified
  attr_accessor :id
  attr_accessor :title
  attr_accessor :remarks

  def to_html_attr_list
    s = ""
    s << " LAST_MODIFIED=\"#{@last_modified}\"" if @last_modified
    s << " ID=\"#{@id}\"" if @id
    return s
  end
end

class Folder < Item
  attr_reader :items

  def initialize
    @items = Array.new
  end

  def dump( level = 0 )
    puts "#{' ' * level * 2}#{title}" if @title
    @items.each do |i|
      i.dump( level+1 )
    end
  end

  # Folders sort before bookmarks; items of the same kind sort by title.
  def sort!
    @items.sort! do |a,b|
      # Object#type is gone from current Ruby, so compare classes instead
      if a.class == b.class
        a.title.downcase <=> b.title.downcase
      elsif a.is_a? Folder
        -1
      elsif b.is_a? Folder
        1
      else
        raise "wrong type in folder"
      end
    end

    @items.each { |i| i.sort! if i.is_a? Folder }
  end

  def to_html( level, file )
    indent = " " * level * 4
    file.puts indent + "<DT><H3#{to_html_attr_list}>#{@title}</H3>" if @title
    file.puts indent + "<DD>#{@remarks}" if @remarks
    file.puts indent + "<HR>" if !@title # hack for top-level folder
    file.puts indent + "<DL><p>"

    @items.each do |i|
      i.to_html( level+1, file )
    end

    file.puts indent + "</DL><p>"
  end
end

class Bookmark < Item
  attr_accessor :last_visit
  attr_accessor :icon
  attr_accessor :last_charset
  attr_accessor :href

  def dump( level )
    print " " * level * 2
    print "'" + @title + "' => "
    puts @href
  end

  def to_html_attr_list
    s = super
    s << " LAST_VISIT=\"#{@last_visit}\"" if @last_visit
    s << " ICON=\"#{@icon}\"" if @icon
    s << " LAST_CHARSET=\"#{@last_charset}\"" if @last_charset
    s << " HREF=\"#{@href}\"" if @href
    return s
  end

  def to_html( level, file )
    indent = " " * level * 4
    file.puts indent + "<DT><A#{to_html_attr_list}>#{@title}</A>"
    file.puts indent + "<DD>#{@remarks}" if @remarks
  end
end

class BookmarkManager
  def initialize
    @top_folder = Folder.new
  end

  # Turn a string of KEY="value" pairs into a hash.
  def build_attribute_hash( str )
    list = str.scan( /[_A-Z]+="[^"]*"/ )
    hash = Hash.new
    list.each do |item|
      item =~ /([_A-Z]+)="(.*)"/
      hash[ $1 ] = $2
    end
    hash
  end

  def append( bookmarks_file )
    folder_stack = [ @top_folder ]

    File.open( bookmarks_file, "r" ) do |file|
      # skip to the start of the bookmark data (gets returns nil at end of file)
      while ( line = file.gets )
        break if line.strip == "<DL><p>"
      end

      last_item = nil
      # stop when the outermost list closes or the file runs out
      while folder_stack.length > 0 && ( line = file.gets )
        line = line.strip

        case line
        when /<HR>/ then
          # separator...
          last_item = nil

        when /<DT><H3 (.*)>(.*)<\/H3>/
          last_item = folder = Folder.new
          attr_list = $1
          folder.title = $2
          attrs = build_attribute_hash( attr_list )
          folder.last_modified = attrs[ "LAST_MODIFIED" ]
          folder.id = attrs[ "ID" ]
          folder_stack.last.items.push folder
          folder_stack.push folder

        when /<DT><A (.*)>(.*)<\/A>/
          last_item = bookmark = Bookmark.new
          attr_list = $1
          bookmark.title = $2
          attrs = build_attribute_hash( attr_list )
          bookmark.last_modified = attrs[ "LAST_MODIFIED" ]
          bookmark.id = attrs[ "ID" ]
          bookmark.last_visit = attrs[ "LAST_VISIT" ]
          bookmark.icon = attrs[ "ICON" ]
          bookmark.last_charset = attrs[ "LAST_CHARSET" ]
          bookmark.href = attrs[ "HREF" ]
          folder_stack.last.items.push bookmark

        when /<\/DL><p>/
          folder_stack.pop
          last_item = nil

        when /<DD>(.*)/
          # ignore a description that has no preceding folder or bookmark
          last_item.remarks = $1 if last_item

        when /<DL><p>/
          # start of a list
        end
      end
    end

    @top_folder.sort!
  end

  def dump
    puts "Bookmarks:"
    @top_folder.dump
  end

  def to_html( file )
    file.puts "<!DOCTYPE NETSCAPE-Bookmark-file-1>"
    file.puts "<!-- This is an automatically generated file."
    file.puts " It will be read and overwritten."
    file.puts " DO NOT EDIT! -->"
    file.puts "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=UTF-8\">"
    file.puts "<TITLE>Bookmarks</TITLE>"
    file.puts "<H1>Bookmarks</H1>"
    file.puts

    @top_folder.to_html( 0, file )
  end
end


mgr = BookmarkManager.new
mgr.append "/home/jgb3/.phoenix/default/d2isamzz.slt/bookmarks.html"
mgr.to_html( $stdout )
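
The attribute handling in `build_attribute_hash` can be tried on its own. Below is a small standalone sketch of the same idea (scan for `KEY="value"` pairs, then build a hash); `attribute_hash` is a hypothetical name, the sample attribute string is made up, and the one-pass `scan` with capture groups is a shortcut rather than what the attachment does:

```ruby
# Standalone sketch: pull KEY="value" pairs out of a tag's attribute string.
# Like the attachment, it only recognises upper-case names and double quotes.
def attribute_hash(str)
  # scan with two capture groups yields [name, value] pairs, which to_h accepts
  str.scan(/([_A-Z]+)="([^"]*)"/).to_h
end

# Made-up attribute string in the style of a bookmarks.html <A> tag.
attrs = attribute_hash('HREF="http://www.ruby-lang.org/" LAST_VISIT="1095811200" ID="rdf:#$abc"')
puts attrs["HREF"]
puts attrs["LAST_VISIT"]
```

Any attribute that is missing simply comes back as `nil`, which is what the `if @last_visit`-style guards in the classes above rely on.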



 
Ben Giddings
      09-22-2004
Dick Davies wrote:
> long shot but what the hell - don't suppose any of you
> good people are sitting on a parser for Mozilla/Firefox bookmarks.html
> files, by any chance?


I have successfully used my htmltokenizer module to parse them. It
depends what you're looking for though. It's not specific to mozilla
bookmarks, but I have (for example) written a little script to compare
bookmark files and find links that only exist in one of them.

Ben
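
The comparison script itself was not posted. As a rough sketch of the same task, assuming a plain regex scan is good enough and deliberately not using the htmltokenizer module (whose API is not shown in this thread), something like the following would list links that appear in only one of two bookmark files. The method names and sample data are hypothetical:

```ruby
require "set"

# Collect every HREF value found in <A ...> tags of a bookmarks.html string.
def bookmark_urls(html)
  html.scan(/<a\s[^>]*href="([^"]*)"/i).flatten.to_set
end

# Return two sorted arrays: URLs only in the first file, URLs only in the second.
def links_in_only_one(html_a, html_b)
  a = bookmark_urls(html_a)
  b = bookmark_urls(html_b)
  [(a - b).sort, (b - a).sort]
end

# Made-up sample data standing in for two bookmarks.html files.
old_file = <<~HTML
  <DT><A HREF="http://a.example/">Only in old</A>
  <DT><A HREF="http://shared.example/">Shared</A>
HTML
new_file = <<~HTML
  <DT><A HREF="http://shared.example/">Shared</A>
  <DT><A HREF="http://b.example/">Only in new</A>
HTML

only_old, only_new = links_in_only_one(old_file, new_file)
puts "only in first:  #{only_old.join(', ')}"
puts "only in second: #{only_new.join(', ')}"
```

With real files, the two strings would come from `File.read` on each bookmarks.html.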


 
James Britt
      09-22-2004
Dick Davies wrote:

> long shot but what the hell - don't suppose any of you
> good people are sitting on a parser for Mozilla/Firefox bookmarks.html
> files, by any chance?
>
>
> nah, didn't think so
>
>
>
> Ah well, never mind. I found that squirting it into REXML::Document.new()
> by way of 'tidy -asxml' at least stops the constructor choking to death, I'll
> have to take it from there.....


Do you use the Ruby/Tidy wrapper?

http://www.rubyxml.com/index.rb/Appl...r_XML_Tidy.txt


James






 
Dick Davies
      09-23-2004
* James Britt <(E-Mail Removed)> [0921 17:21]:
> Dick Davies wrote:
>
> >long shot but what the hell - don't suppose any of you
> >good people are sitting on a parser for Mozilla/Firefox bookmarks.html
> >files, by any chance?
> >
> >
> >nah, didn't think so
> >
> >
> >
> >Ah well, never mind. I found that squirting it into REXML::Document.new()
> >by way of 'tidy -asxml' at least stops the constructor choking to death,
> >I'll have to take it from there.....

>
> Do you use the Ruby/Tidy wrapper?
>
> http://www.rubyxml.com/index.rb/Appl...r_XML_Tidy.txt


I will eventually, I think - though to be honest even on my monster bookmark
file the tidy warning/error output is longer than the generated XML

--
Census Taker to Housewife: Did you ever have the measles, and, if so,
how many?
Rasputin :: Jack of All Trades - Master of Nuns


 
Dick Davies
      09-23-2004
* Jamis Buck <(E-Mail Removed)> [0926 16:26]:
> Dick Davies wrote:
> >long shot but what the hell - don't suppose any of you
> >good people are sitting on a parser for Mozilla/Firefox bookmarks.html
> >files, by any chance?

>
> Funny you should ask. I've had this for a while, and I can't even
> remember why I wrote it. It's pretty hacked together, and it's not a
> true "parser" (I just search for certain patterns in the bookmark file)
> and it is hardcoded (currently) for my own (obsolete) Phoenix bookmarks
> file, but it should be fairly straightforward to modify for your own
> purposes.
>
> Hope this is at least close to what you are looking for...


Thanks a lot, it was handy to get a feel for it - I gave up on a parser too (I'd prefer not to require extra libs), and did a
cut-down homegrown version in the end (I only need url, folder info and description myself):

-----------------------------------------------------------------
rasputin@lb:lib$ cat mozbooks.rb
#!/usr/bin/env ruby

# quick and dirty bookmarks.html parser - thanks to Jamis Buck for the 'folder state machine' idea

class MozBooks

  # pull urls, descriptions and folder hierarchy info from mozilla/firefox bookmarks.html
  def self.parse(bm)
    folders = []
    bm.each_line{ |l|
      folders.pop if l =~ /<\/dl><p>/i # we just left a folder
      folders << $1 if l =~ /\s*<dt><h3[^>]+>(.*)<\/h3>/i # we just entered a folder
      puts "url = #{$1}, desc = #{$2}, folder = #{folders.join('/')}" if l =~ /a href="([^"]*)"[^>]+>([^<]+)</i
    }
  end
end

mb = MozBooks.parse($stdin)
-----------------------------------------------------------------

and that seems to work (enough info for my purposes anyway, I can feed this lot into del.icio.us).... thanks!

rasputin@lb:booty$ cat ~/bookmarks.html | ruby lib/mozbooks.rb |grep -i ruby|head
url = http://raa.ruby-lang.org/, desc = RAA - Ruby Application Archive, folder = toolbar/search
url = http://www.rubygarden.org/ruby?UsingRubyFastCGI, desc = Ruby: UsingRubyFastCGI, folder = toolbar/proj/FastCGI
url = http://dev.faeriemud.org/changes-1.8.0.html, desc = New Features in Ruby 1.8.0, folder = toolbar/ruby/1.8
url = http://www.rubygarden.org/ruby?RIOnePointEight, desc = Ruby: RIOnePointEight, folder = toolbar/ruby/1.8
url = ftp://ftp.ruby-lang.org/pub/ruby/1.8/changes.1.8.0, desc = ftp://ftp.ruby-lang.org/pub/ruby/1.8/changes.1.8.0, folder = toolbar/ruby/1.8
url = http://www.rubyist.net/~matz/slides/.../mgp00003.html, desc = MagicPoint presentation foils, folder = toolbar/ruby/1.8
url = http://whytheluckystiff.net/articles...rubyOneEightOh, desc = whyTHEluckySTIFF ;,. What's Shiny and New in Ruby 1.8.0? .,;, folder = toolbar/ruby/1.8
url = http://images-jp.amazon.com/images/P...9.LZZZZZZZ.jpg, desc = 4894714531.09.LZZZZZZZ.jpg (JPEG Image, 375x475 pixels), folder = toolbar/ruby/community
url = http://www2a.biglobe.ne.jp/~seki/ruby/, desc = I like Ruby., folder = toolbar/ruby/community
url = http://www.excite.co.jp/world/url/bo...co=excitejapan, desc = Matz' Blog, folder = toolbar/ruby/community





--
It's always darkest just before it gets pitch black.
Rasputin :: Jack of All Trades - Master of Nuns
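
For feeding the results into another tool, the same folder-stack idea can return records instead of printing them. This is a sketch under assumptions, not the posted script: `collect_bookmarks` and the hash keys are hypothetical names, the sample input is hand-written, and the regexes are loosened slightly (`[^>]*`) so tags without extra attributes also match:

```ruby
# Walk bookmarks.html line by line, tracking the current folder path on a stack,
# and return one hash per bookmark instead of printing.
def collect_bookmarks(io)
  folders = []
  records = []
  io.each_line do |line|
    folders.pop if line =~ /<\/dl><p>/i                  # we just left a folder
    folders << $1 if line =~ /<dt><h3[^>]*>(.*)<\/h3>/i  # we just entered a folder
    if line =~ /<a href="([^"]*)"[^>]*>([^<]+)</i
      records << { url: $1, desc: $2, folder: folders.join("/") }
    end
  end
  records
end

# Hand-written sample in the layout Mozilla uses (one tag per line).
sample_bookmarks = <<~HTML
  <DL><p>
      <DT><H3 ID="x">toolbar</H3>
      <DL><p>
          <DT><A HREF="http://raa.ruby-lang.org/" ADD_DATE="1">RAA</A>
          <DT><H3 ID="y">ruby</H3>
          <DL><p>
              <DT><A HREF="http://www.ruby-lang.org/">Ruby</A>
          </DL><p>
      </DL><p>
  </DL><p>
HTML

collect_bookmarks(sample_bookmarks).each do |r|
  puts "#{r[:folder]}: #{r[:desc]} (#{r[:url]})"
end
```

Because the method only calls `each_line`, it accepts a String, a File, or `$stdin` alike.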


 