Hello Racc
30 May 2007
When I said HelloCup I was looking at a yacc based parser in a language that didn't require me to handle my dirty pointers. Another alternative to play with is Ruby which now has a yaccish parser built in to the standard library - inevitably called racc.
Racc has an interesting interplay between ruby and grammar syntax. You define the grammar with a racc file which will generate a parser class.
Again I'll do my simple hello world case. The input text is
item camera item laser
I'll populate item objects inside a catalog, using the following model classes.
class Item attr_reader :name def initialize name @name = name end end class Catalog extend Forwardable def initialize @items = [] end def_delegators :@items, :size, :<<, :[] end
Forwardable
is a handy library that allows me to
delegate methods to an instance variable. In this case I delegate a
bunch of methods to the @items
list.
I test what I read with this.
class Tester < Test::Unit::TestCase def testReadTwo parser = ItemParser.new parser.parse "item camera\nitem laser\n" assert_equal 2, parser.result.size assert_equal 'camera', parser.result[0].name assert_equal 'laser', parser.result[1].name end def testReadBad parser = ItemParser.new parser.parse "xitem camera" fail rescue #expected end end
To build the file and run the tests I use a simple rake file.
# rakefile... task :default => :test file 'item.tab.rb' => 'item.y.rb' do sh 'racc item.y.rb' end task :test => 'item.tab.rb' do require 'rake/runtest' Rake.run_tests 'test.rb' end
The racc
command needs to be installed on your
system. I did it the easy way on Ubuntu with
apt-get
. It takes the input file and creates one named
inputFileName.tab.rb
.
The parser grammar class is a special format, but one that's pretty familiar to yaccish people. For this simple example it looks like this:
#file item.y.rb... class ItemParser token 'item' WORD rule catalog: item | item catalog; item: 'item' WORD {@result << Item.new(val[1])}; end
The tokens clause declares the token's we get from the lexer. I
use the string 'item'
and WORD
as a
symbol. The rule clause starts the production rules which are in the
usual BNF form for yacc. As you might expect I can write actions
inside curlies. To refer to the elements of the rule I use the
val
array, so val[1]
is the equivalent to
$2
in yacc (ruby uses 0 based array indexes, but I've
forgiven it). Should I wish to return a value from the rule
(equivalent to yacc's $$
) I assign
it to the variable result
.
The most complicated part of using racc is to sort out the lexer.
Racc expects to call a method that yields tokens, where each token is a
two-element array with the first element being the type of token
(matching the token declaration) and the second element the value
(what shows up in val
- usually the text). You mark the
end of the token stream with [false, false]
. The sample
code with racc uses regular expression matching on a string. A better
choice for most cases is to use StringScanner
, which is
in the standard ruby library.
I can use this scanner to convert a string into an array of tokens.
#file item.y.rb.... ---- inner def make_tokens str require 'strscan' result = [] scanner = StringScanner.new str until scanner.empty? case when scanner.scan(/\s+/) #ignore whitespace when match = scanner.scan(/item/) result << ['item', nil] when match = scanner.scan(/\w+/) result << [:WORD, match] else raise "can't recognize <#{scanner.peek(5)}>" end end result << [false, false] return result end
To integrate the scanner into the parser, racc allows you to
place code into the generated parser class. You do this by adding code
to the grammar file. The declaration ---- inner
marks the
code to go inside the generated class (you can also put code at the
head and foot of the generated file). I'm calling a parse
method in my test, so I need to implement that.
#file item.y.rb.... ---- inner attr_accessor :result def parse(str) @result = Catalog.new @tokens = make_tokens str do_parse end
The do_parse
method initiates the generated
parser. This will call next_token
to get at the next
token, so we need to implement that method and include it in the
inner section.
#file item.y.rb.... ---- inner def next_token @tokens.shift end
This is enough to make racc work with the file. However as I play with it I find the scanner more messy than I would like. I really just want it to tell the lexer what patterns to match and what to return with them. Something like this.
#file item.y.rb.... ---- inner def make_lexer aString result = Lexer.new result.ignore /\s+/ result.keyword 'item' result.token /\w+/, :WORD result.start aString return result end
To make this work I write my own lexer wrapper over the base functionality provided by StringScanner. Here's the code to set up the lexer and and handle the above configuration.
class Lexer... require 'strscan' def initialize @rules = [] end def ignore pattern @rules << [pattern, :SKIP] end def token pattern, token @rules << [pattern, token] end def keyword aString @rules << [Regexp.new(aString), aString] end def start aString @base = StringScanner.new aString end
To perform the scan I need to use StringScanner to compare the rules against the input stream.
class Lexer... def next_token return [false, false] if @base.empty? t = get_token return (:SKIP == t[0]) ? next_token : t end def get_token @rules.each do |key, value| m = @base.scan(key) return [value, m] if m end raise "unexpected characters <#{@base.peek(5)}>" end
I can then alter the code in the parser to call this lexer instead.
#file item.y.rb.... ---- inner def parse(arg) @result = Catalog.new @lexer = make_lexer arg do_parse end def next_token @lexer.next_token end
As well as giving me a better way to define the rules, this also allows the grammar to control the lexer because it's only grabbing one token at a time - this would give me a mechanism to implement lexical states later on.
On the whole racc is pretty easy to set up and use - providing you know yacc. The documentation is on the minimal side of sketchy. There's a simple manual on the website and some sample code. There's also a very helpful presentation on racc. I also got a few tips from our Mingle team who've used it for a nifty customization language inside Mingle.