Two-stage lexer for LOGOS natural language input.
The lexer transforms natural language text into a token stream suitable for parsing. It operates in two stages:
§Stage 1: Line Lexer
The LineLexer handles structural concerns:
- Indentation: Tracks indent levels, emits Indent/Dedent tokens
- Block boundaries: Identifies significant whitespace
- Content extraction: Passes line content to Stage 2
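The Stage 1 responsibilities can be illustrated with a minimal sketch. It assumes a stack-of-widths approach to indentation, which is a common technique and not necessarily how LineLexer is implemented. The `LineToken` enum and `lex_lines` function are hypothetical names introduced here for illustration only.

```rust
/// Hypothetical structural token type; the real LineLexer's tokens may differ.
#[derive(Debug, PartialEq)]
enum LineToken {
    Indent,
    Dedent,
    /// Opaque line content, left untouched for the word-level stage.
    Content(String),
}

/// Sketch of indentation tracking using a stack of indent widths.
/// Assumption: width is the number of leading whitespace bytes.
fn lex_lines(source: &str) -> Vec<LineToken> {
    let mut stack = vec![0usize]; // the bottom entry is the top-level width
    let mut out = Vec::new();

    for line in source.lines() {
        if line.trim().is_empty() {
            continue; // blank lines carry no structure in this sketch
        }
        let width = line.len() - line.trim_start().len();

        // A deeper line opens one new block.
        if width > *stack.last().unwrap() {
            stack.push(width);
            out.push(LineToken::Indent);
        }
        // A shallower line closes every block deeper than it.
        while width < *stack.last().unwrap() {
            stack.pop();
            out.push(LineToken::Dedent);
        }
        out.push(LineToken::Content(line.trim().to_string()));
    }

    // Close any blocks still open at end of input.
    while stack.len() > 1 {
        stack.pop();
        out.push(LineToken::Dedent);
    }
    out
}
```

In this sketch the line text is never inspected beyond its leading whitespace, which mirrors the split described above: structure is decided here, and words are left to Stage 2.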
§Stage 2: Word Lexer
The Lexer performs word-level tokenization:
- Vocabulary lookup: Identifies words via the lexicon database
- Morphological analysis: Handles inflection (verb tenses, plurals)
- Ambiguity resolution: Uses priority rules for ambiguous words
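As a rough illustration of vocabulary lookup with a morphological fallback, the sketch below tries an exact match first and then strips a few common English suffixes. Everything here is an assumption made for illustration: the `lookup` function, the `HashMap`-based lexicon, and the suffix list are hypothetical and much simpler than a real lexicon database.

```rust
use std::collections::HashMap;

/// Hypothetical lookup: returns the matched lexicon form and its category.
/// Assumption: the lexicon maps lowercase base forms to a category name.
fn lookup(
    lexicon: &HashMap<&'static str, &'static str>,
    word: &str,
) -> Option<(String, &'static str)> {
    let lower = word.to_lowercase();

    // 1. Exact vocabulary match.
    if let Some(cat) = lexicon.get(lower.as_str()) {
        return Some((lower, *cat));
    }

    // 2. Naive inflection handling: strip one suffix and retry.
    //    This ignores spelling changes such as doubled consonants.
    for suffix in ["ing", "ed", "es", "s"] {
        if let Some(stem) = lower.strip_suffix(suffix) {
            if let Some(cat) = lexicon.get(stem) {
                return Some((stem.to_string(), *cat));
            }
        }
    }
    None
}
```

With a lexicon containing "sleep" and "cat", this sketch would resolve "sleeps" and "cats" to their base entries, while an unknown word yields `None`.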
§Ambiguity Rules
When a word matches multiple lexicon entries, priority determines the token:
- Quantifiers over nouns (“some” → Quantifier, not Noun)
- Determiners over adjectives (“the” → Determiner, not Adjective)
- Verbs over nouns for -ing/-ed forms (“running” → Verb)
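The three rules above can be sketched as a small resolution function. The `Category` enum and `resolve` function are hypothetical names used only for illustration, and the fallback of taking the first candidate is an assumption, since the rules listed here do not cover every combination.

```rust
/// Hypothetical lexical categories; the real token types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Quantifier,
    Determiner,
    Verb,
    Noun,
    Adjective,
}

/// Picks one category for a word that matched several lexicon entries.
fn resolve(word: &str, candidates: &[Category]) -> Option<Category> {
    use Category::*;
    let has = |c: Category| candidates.contains(&c);

    // Rule 1: quantifiers win over nouns.
    if has(Quantifier) && has(Noun) {
        return Some(Quantifier);
    }
    // Rule 2: determiners win over adjectives.
    if has(Determiner) && has(Adjective) {
        return Some(Determiner);
    }
    // Rule 3: verbs win over nouns for -ing/-ed forms.
    let inflected = word.ends_with("ing") || word.ends_with("ed");
    if inflected && has(Verb) && has(Noun) {
        return Some(Verb);
    }
    // Assumed fallback: keep the first candidate, or None if there are none.
    candidates.first().copied()
}
```

Writing the rules as explicit pairwise checks keeps each one traceable to its line in the list, at the cost of not defining a total order across all categories.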
§Example
Input: "Every cat sleeps."
Output: [Quantifier("every"), Noun("cat"), Verb("sleeps"), Period]
Structs§
- Lexer
- LineLexer - Stage 1 Lexer: Handles only lines, indentation, and structural tokens. Treats all other text as opaque Content for the Stage 2 WordLexer.