Parsing Pipeline
The parsing pipeline is designed for reliability and repeatability. Every import is tracked as a session and can be audited or retried.
Pipeline stages
- Upload
  - File stored in uploads/
  - SHA-256 hash computed for deduplication
- Parser selection
  - ParserFactory chooses a bank-specific parser when available
  - Fallback parsers handle CSV/XLSX and generic AI/OCR paths
- Import session
  - An import_session record is created
  - Metadata stores parser, source, and status
- Transaction extraction
  - Parsed rows are normalized into transaction entities
  - Source mapping is stored for traceability
- AI categorization
  - Optional AI pipeline (Gemini/OpenRouter)
  - Confidence thresholds and retry logic guard quality
- Deduplication
  - Hash checks plus heuristics on date/amount/text
  - Conflicts flagged for manual review
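
As a rough illustration of the upload and deduplication stages, the sketch below hashes file bytes with SHA-256 and compares two transactions on date, amount, and normalized text. The Txn shape and the names fileHash, normalizeText, and compareTxns are hypothetical, chosen for this example only; the actual entities and heuristics in the codebase may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape for illustration; the real transaction entity may differ.
interface Txn {
  date: string;        // ISO date, e.g. "2024-03-01"
  amountCents: number; // integer minor units, so equality checks are exact
  description: string;
}

// SHA-256 of the raw file bytes, usable as a file-level deduplication key.
function fileHash(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Lowercase and collapse every run of non-alphanumeric characters to one space.
// This ASCII-only rule is a simplification for the sketch.
function normalizeText(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

type Verdict = "new" | "duplicate" | "conflict";

// Same date and amount with matching text: duplicate.
// Same date and amount with different text: conflict, left for manual review.
// Anything else is treated as a new transaction.
function compareTxns(a: Txn, b: Txn): Verdict {
  if (a.date !== b.date || a.amountCents !== b.amountCents) return "new";
  return normalizeText(a.description) === normalizeText(b.description)
    ? "duplicate"
    : "conflict";
}
```

Comparing amounts as integer minor units avoids floating-point mismatches, and routing same-date, same-amount rows with different text to "conflict" mirrors the manual-review step described above.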
Error handling
- Parsing errors are captured on the import session
- Failed stages can be retried without reuploading
- Logs include structured context for diagnostics
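
One way to picture retrying a failed stage without re-uploading is to record completed stages on the session and resume from the first stage that has not finished. The sketch below is a simplified, synchronous illustration; the stage names, the ImportSession fields, and runPipeline are assumptions made for this example, not the project's actual API.

```typescript
// Hypothetical stage names for this sketch.
type Stage = "parse" | "extract" | "categorize" | "dedupe";
const STAGES: Stage[] = ["parse", "extract", "categorize", "dedupe"];

// Hypothetical session record: tracks progress and the last captured error.
interface ImportSession {
  id: string;
  completed: Stage[];
  failedStage?: Stage;
  error?: string;
}

type Handlers = Record<Stage, (session: ImportSession) => void>;

// Runs every stage that has not completed yet. On failure, the error is
// captured on the session and one structured (JSON) log line is emitted.
function runPipeline(
  session: ImportSession,
  handlers: Handlers,
  log: (line: string) => void = console.log,
): ImportSession {
  // Clear any error left over from a previous attempt.
  delete session.failedStage;
  delete session.error;
  for (const stage of STAGES) {
    if (session.completed.includes(stage)) continue; // finished earlier, skip
    try {
      handlers[stage](session);
      session.completed.push(stage);
    } catch (err) {
      session.failedStage = stage;
      session.error = err instanceof Error ? err.message : String(err);
      log(JSON.stringify({ level: "error", sessionId: session.id, stage, error: session.error }));
      break; // stop here; a later call resumes from this stage
    }
  }
  return session;
}
```

Calling runPipeline again on the same session skips the stages already listed in completed, so the uploaded file is never parsed twice.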
Related code
- backend/src/modules/parsing
- backend/src/modules/import
- backend/src/modules/classification
Next: Importing Statements