Aug. 11th, 2004

issaferret: (comfy)
It seems to me that there should be a way to take advantage of the current Object-Oriented programming ideology to make a more efficient, or perhaps simply more usable, parser and lexical analyzer kit.
( Ranting about lack of correctly designed kits... oh wait, there's one )

Ah. Wait! Someone did, it looks like - I just found a program called JavaCC, a parser generator for Java that produces a lexer and a parser together.

Compilers are one of my favorite topics, largely because, unlike several of my other favorites, they lurk _just_ out of my reach instead of miles over my head. The math involved is relatively sensible, unlike, say, 3D textures, which require Digital Signal Processing knowledge, which requires much calculus. I hate calculus.

Compilation's basically a two-part problem: lexical analysis and parsing. In lexing, you take a stream of characters and split it into 'tokens'. In English, the tokens are words and punctuation. So output from a lexer for English (if one existed, which it doesn't) would be something like:

PRONOUN("I") VERB("am") INDEFINITE_ARTICLE("a") NOUN("fish") COMMA(",") VERB("feed") PRONOUN("me") PERIOD(".")
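For the curious, here's a minimal sketch of what such a lexer could look like in Python. The vocabulary table is entirely made up and only covers the example sentence (a real English lexer would need far more than a lookup table), and the names `WORD_TYPES`, `lex` and `show` are mine, not from any library.

```python
import re

# Hypothetical, hand-picked vocabulary: just enough words for the example sentence.
WORD_TYPES = {
    "i": "PRONOUN",
    "me": "PRONOUN",
    "am": "VERB",
    "feed": "VERB",
    "a": "INDEFINITE_ARTICLE",
    "fish": "NOUN",
}
PUNCTUATION = {",": "COMMA", ".": "PERIOD"}


def lex(text):
    """Split a string of characters into (TYPE, lexeme) tokens."""
    tokens = []
    # A word is a run of letters; any other non-space character stands alone.
    for lexeme in re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text):
        if lexeme in PUNCTUATION:
            tokens.append((PUNCTUATION[lexeme], lexeme))
        elif lexeme.lower() in WORD_TYPES:
            tokens.append((WORD_TYPES[lexeme.lower()], lexeme))
        else:
            raise ValueError(f"unknown lexeme: {lexeme!r}")
    return tokens


def show(tokens):
    """Format tokens the way they are written in this post."""
    return " ".join(f'{kind}("{lexeme}")' for kind, lexeme in tokens)


print(show(lex("I am a fish, feed me.")))
```

Run on the example sentence, this prints the same token stream as above. Anything outside the tiny vocabulary raises an error, which is roughly what a real lexer does when it hits a character it has no rule for.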


Once a string has been lexed, the tokens are fed into a parser, which makes _sense_ of them. A parser is driven by a set of rules called a grammar. Just as in English, a grammar spells out the legal ways to put tokens together.

So, something like this would be the beginnings of an English grammar - I'll be nice and put it into English phrases rather than the standard way grammars are specified. (It's called Backus-Naur Form, or BNF, and it's actually pretty readable... for computer languages. AFAIK, you can't quite get English into BNF. Too fucked up a grammar structure.)


  • A SENTENCE is made up of one or more PHRASES, terminated by a PERIOD.
  • A PHRASE is made up of a SUBJECT, an OBJECT, and a VERB, or a SUBJECT and a VERB, or whatever, put in the right order. (You have to be explicit in a real grammar, but wow, is that hard for English. There're a lot of people trying to solve the 'natural language problem' so that computers could understand English. It's... not yet there.)
  • A SUBJECT is a PRONOUN, or an INDEFINITE_ARTICLE followed by a NOUN.



And so on.
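Here's a rough sketch of those rules as a hand-written recursive-descent parser in Python, one method per rule. To make it explicit I had to assume a few things the list above leaves as "or whatever": PHRASES are separated by COMMAs, an OBJECT has the same shape as a SUBJECT, and both are optional (so "feed me" counts as a PHRASE). The class and method names are made up for this example.

```python
# Assumed grammar, in rough BNF:
#   sentence    ::= phrase (COMMA phrase)* PERIOD
#   phrase      ::= [noun_phrase] VERB [noun_phrase]
#   noun_phrase ::= PRONOUN | INDEFINITE_ARTICLE NOUN


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (TYPE, lexeme) pairs from a lexer
        self.pos = 0

    def peek(self):
        """Type of the next token, or EOF when the input is used up."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        """Consume a token of the given type and return its lexeme."""
        if self.peek() != kind:
            raise ParseError(f"expected {kind}, found {self.peek()} at token {self.pos}")
        lexeme = self.tokens[self.pos][1]
        self.pos += 1
        return lexeme

    def starts_noun_phrase(self):
        return self.peek() in ("PRONOUN", "INDEFINITE_ARTICLE")

    def sentence(self):
        phrases = [self.phrase()]
        while self.peek() == "COMMA":
            self.expect("COMMA")
            phrases.append(self.phrase())
        self.expect("PERIOD")
        if self.peek() != "EOF":
            raise ParseError("tokens left over after the PERIOD")
        return ("SENTENCE", phrases)

    def phrase(self):
        subject = self.noun_phrase() if self.starts_noun_phrase() else None
        verb = self.expect("VERB")
        obj = self.noun_phrase() if self.starts_noun_phrase() else None
        return ("PHRASE", subject, verb, obj)

    def noun_phrase(self):
        if self.peek() == "PRONOUN":
            return [self.expect("PRONOUN")]
        article = self.expect("INDEFINITE_ARTICLE")
        return [article, self.expect("NOUN")]


# The token stream for "I am a fish, feed me." written out by hand.
tokens = [
    ("PRONOUN", "I"), ("VERB", "am"), ("INDEFINITE_ARTICLE", "a"), ("NOUN", "fish"),
    ("COMMA", ","), ("VERB", "feed"), ("PRONOUN", "me"), ("PERIOD", "."),
]
tree = Parser(tokens).sentence()
print(tree)
```

The result is a nested structure with two PHRASES: one with subject "I", verb "am" and object "a fish", and one with no subject, verb "feed" and object "me". Feed it tokens in an order the rules don't allow and it raises `ParseError` instead.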

Once the parser has gathered certain collections of stuff, say, a PHRASE in the above example, it takes some kind of action. In a compiler, this action is often translating the phrase into machine code. I'd guess that if a similar process were going on in your head, the result would be some kind of comprehension - "Ah, he's a fish. Wait, what?". Eventually, the whole string has been translated into whatever end product you wanted, and you're done parsing, save perhaps for some optimization. "Ah, he _thinks_ he's a fish, and wants fish food. I'll go into the other room and call the nice young men in their clean white coats. This guy's fish slipped off the hook."
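To show the "take an action per rule" idea on something a computer can actually handle, here's a small sketch that swaps English for arithmetic. Each time a rule finishes matching, the parser appends an instruction for a made-up stack machine (`PUSH`, `ADD`, `MUL`), which stands in for real machine code. The grammar, the instruction names and the function names are all assumptions for the sake of illustration.

```python
import re

# Assumed grammar:
#   expr ::= term (PLUS term)*
#   term ::= NUMBER (TIMES NUMBER)*


def lex(text):
    """Turn '1 + 2 * 3' into (TYPE, lexeme) tokens."""
    kinds = {"+": "PLUS", "*": "TIMES"}
    tokens = []
    for lexeme in re.findall(r"\d+|\S", text):
        if lexeme.isdecimal():
            tokens.append(("NUMBER", lexeme))
        elif lexeme in kinds:
            tokens.append((kinds[lexeme], lexeme))
        else:
            raise ValueError(f"unknown lexeme: {lexeme!r}")
    return tokens


def compile_expression(tokens):
    """Parse the tokens and emit stack-machine instructions as rules complete."""
    code = []
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def number():
        nonlocal pos
        if peek() != "NUMBER":
            raise SyntaxError(f"expected NUMBER, found {peek()}")
        code.append(("PUSH", int(tokens[pos][1])))  # action: load the value
        pos += 1

    def term():
        nonlocal pos
        number()
        while peek() == "TIMES":
            pos += 1
            number()
            code.append(("MUL",))  # action: both operands are already pushed

    def expr():
        nonlocal pos
        term()
        while peek() == "PLUS":
            pos += 1
            term()
            code.append(("ADD",))

    expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected {peek()} after the expression")
    return code


def run(code):
    """Execute the emitted instructions on a tiny stack machine."""
    stack = []
    for instruction in code:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instruction[0] == "ADD" else left * right)
    return stack.pop()


program = compile_expression(lex("1 + 2 * 3"))
print(program)
print(run(program))  # prints 7
```

Because `term` finishes (and emits `MUL`) before `expr` gets to emit `ADD`, the multiplication happens first, so operator precedence falls out of how the rules are nested rather than from any special handling.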

um. Like you cared. Most of my quick toss-off projects quickly find themselves in need of customization, and a config file of some sort usually ends up being the answer. Which means that I run into parsing problems relatively often for someone who's not a compiler programmer.

*shrug*
