Rust Components - AI Assistant Instructions
This file provides guidance for SQLFluff's Rust components.
Overview
The sqlfluffrs/ directory contains an experimental Rust implementation of performance-critical SQLFluff components. This is an ongoing effort to accelerate lexing and parsing operations while maintaining compatibility with the Python implementation.
Project Status
Current state: Experimental and under development
Goals:
- Accelerate lexing performance (tokenization)
- Speed up parsing for large SQL files
- Maintain API compatibility with Python components
- Provide optional Rust-based acceleration for production users
Not a replacement: The Rust components are designed to work alongside Python, not replace the entire codebase.
Structure
sqlfluffrs/
├── Cargo.toml # Rust package manifest
├── pyproject.toml # Python packaging for Rust extension
├── LICENSE.md # License
├── README.md # Rust component README
├── py.typed # Type stub marker
├── sqlfluffrs.pyi # Python type stubs for Rust extension
└── src/ # Rust source code
    ├── lib.rs # Library root
    ├── python.rs # Python bindings (PyO3)
    ├── lexer.rs # Lexer implementation
    ├── marker.rs # Position markers
    ├── matcher.rs # Pattern matching
    ├── regex.rs # Regex utilities
    ├── slice.rs # String slicing
    ├── config/ # Configuration handling
    ├── dialect/ # Dialect definitions
    ├── templater/ # Template handling
    └── token/ # Token types
Rust Development Setup
Requirements
- Rust: Install via rustup
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Cargo: Comes with Rust installation
- Python development headers: Required for PyO3 bindings
Building Rust Components
# Navigate to Rust directory
cd sqlfluffrs
# Build Rust library
cargo build
# Build release (optimized)
cargo build --release
# Run Rust tests
cargo test
# Run with output
cargo test -- --nocapture
# Check code without building
cargo check
# Format code
cargo fmt
# Lint code
cargo clippy
Python Integration
The Rust components are exposed to Python via PyO3:
# Build and install Python extension
cd sqlfluffrs
pip install -e .
# Or from repository root
pip install -e ./sqlfluffrs/
Rust Coding Standards
Style
- Follow Rust conventions: Use rustfmt for formatting
- Naming:
  - snake_case for functions, variables, modules
  - PascalCase for types, structs, enums, traits
  - SCREAMING_SNAKE_CASE for constants
- Idiomatic Rust: Prefer iterators, pattern matching, and ownership patterns
Error Handling
Prefer Result and the ? operator:
fn parse_token(input: &str) -> Result<Token, ParseError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(ParseError::EmptyInput);
    }
    Ok(Token::new(trimmed))
}
fn process() -> Result<(), ParseError> {
    let token = parse_token(" SELECT ")?; // Use ? operator
    // ... use token
    Ok(())
}
Avoid unwrap() and expect() in production code:
// ❌ Bad: Can panic
let value = some_option.unwrap();
// ✅ Good: Handle None case
let value = match some_option {
    Some(v) => v,
    None => return Err(Error::MissingValue),
};
// ✅ Also good: Convert the Option to a Result with ok_or, then use ?
let value = some_option.ok_or(Error::MissingValue)?;
Exception: unwrap() and expect() are acceptable in tests.
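The snippets above refer to ParseError and Error::MissingValue without showing how such a type is defined. The following is a minimal sketch using only the standard library. The variant names mirror the examples above, but the Display messages and the first_keyword helper are hypothetical and are not taken from the sqlfluffrs source:

```rust
use std::fmt;

/// Hypothetical error type for illustration; the real sqlfluffrs errors may differ.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    EmptyInput,
    MissingValue,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::EmptyInput => write!(f, "input is empty"),
            ParseError::MissingValue => write!(f, "expected a value but found none"),
        }
    }
}

// Implementing std::error::Error lets callers box this error or chain it.
impl std::error::Error for ParseError {}

/// Hypothetical helper: returns the first whitespace-separated word, upper-cased.
pub fn first_keyword(input: &str) -> Result<String, ParseError> {
    // split_whitespace yields nothing for empty or all-whitespace input;
    // ok_or turns that None into an Err that ? can propagate.
    let word = input
        .split_whitespace()
        .next()
        .ok_or(ParseError::EmptyInput)?;
    Ok(word.to_uppercase())
}
```

Deriving PartialEq on the error type keeps tests simple, because a returned Err can be compared directly against the expected variant.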
Testing
#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_token_parsing() {
        let result = parse_token("SELECT");
        assert!(result.is_ok());
        assert_eq!(result.unwrap().value, "SELECT");
    }
    #[test]
    fn test_empty_input_fails() {
        let result = parse_token("");
        assert!(result.is_err());
    }
}
Run tests:
cargo test
cargo test --lib # Library tests only
cargo test --release # Optimized build
Python-Rust Interface (PyO3)
Exposing Rust to Python
Basic example in src/python.rs:
use pyo3::prelude::*;
#[pyfunction]
fn tokenize(sql: &str) -> PyResult<Vec<String>> {
    let tokens = internal_tokenize(sql)
        .map_err(|e| PyErr::new::<pyo3::exceptions::PyValueError, _>(e.to_string()))?;
    Ok(tokens)
}
#[pymodule]
fn sqlfluffrs(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(tokenize, m)?)?;
    Ok(())
}
Python usage:
import sqlfluffrs
tokens = sqlfluffrs.tokenize("SELECT * FROM users")
print(tokens) # ['SELECT', '*', 'FROM', 'users']
Type Stubs
Provide Python type hints in sqlfluffrs.pyi:
from typing import List
def tokenize(sql: str) -> List[str]: ...
Architecture
Lexer
The Rust lexer (src/lexer.rs) tokenizes SQL strings:
pub struct Lexer {
    config: LexerConfig,
}
impl Lexer {
    pub fn new(config: LexerConfig) -> Self {
        Lexer { config }
    }
    pub fn lex(&self, sql: &str) -> Result<Vec<Token>, LexError> {
        // Tokenization logic (elided here)
        todo!()
    }
}
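The body of lex is elided above. As an illustration of the general shape only, here is a small standalone tokenizer that splits SQL into words, numbers, and single-character symbols while recording byte offsets. Everything in it (TokenKind, this simplified Token, LexError::UnexpectedChar, and lex_simple) is hypothetical and far simpler than the actual implementation in src/lexer.rs:

```rust
/// Hypothetical token kinds for illustration; not the sqlfluffrs token model.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenKind {
    Word,
    Number,
    Symbol,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub value: String,
    /// Byte offset of the token's first character in the source string.
    pub start: usize,
}

#[derive(Debug, PartialEq)]
pub enum LexError {
    UnexpectedChar { ch: char, pos: usize },
}

pub fn lex_simple(sql: &str) -> Result<Vec<Token>, LexError> {
    let mut tokens = Vec::new();
    let mut chars = sql.char_indices().peekable();

    while let Some(&(start, ch)) = chars.peek() {
        // Whitespace separates tokens and is discarded in this sketch.
        if ch.is_whitespace() {
            chars.next();
            continue;
        }

        let kind = if ch.is_alphabetic() || ch == '_' {
            TokenKind::Word
        } else if ch.is_ascii_digit() {
            TokenKind::Number
        } else if "*,;()=.<>+-/".contains(ch) {
            TokenKind::Symbol
        } else {
            return Err(LexError::UnexpectedChar { ch, pos: start });
        };

        // Consume the first character, then extend words and numbers
        // over the following run; symbols stay one character wide.
        let mut end = start + ch.len_utf8();
        chars.next();
        if kind != TokenKind::Symbol {
            while let Some(&(i, c)) = chars.peek() {
                let continues = match kind {
                    TokenKind::Word => c.is_alphanumeric() || c == '_',
                    _ => c.is_ascii_digit(),
                };
                if !continues {
                    break;
                }
                end = i + c.len_utf8();
                chars.next();
            }
        }

        tokens.push(Token {
            kind,
            value: sql[start..end].to_string(),
            start,
        });
    }

    Ok(tokens)
}
```

This sketch ignores string literals, comments, and templating, all of which a real SQL lexer has to handle. It returns an error on the first unrecognized character instead of panicking, in line with the error-handling guidance above.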
Matcher
Pattern matching for grammar rules (src/matcher.rs):
pub trait Matcher {
    fn matches(&self, tokens: &[Token]) -> bool;
}
pub struct SequenceMatcher {
    matchers: Vec<Box<dyn Matcher>>,
}
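To show how the trait and SequenceMatcher could fit together, the sketch below adds a hypothetical KeywordMatcher and implements Matcher for the sequence. It makes a strong simplifying assumption that every child matcher inspects exactly one token. The real matchers in src/matcher.rs may consume variable-length spans, and the Token here is a stand-in that holds only the raw text:

```rust
/// Simplified stand-in for the real token type: just the raw text.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub value: String,
}

pub trait Matcher {
    /// Returns true if the start of `tokens` is accepted by this matcher.
    fn matches(&self, tokens: &[Token]) -> bool;
}

/// Hypothetical matcher: accepts one token equal to a keyword, ignoring ASCII case.
pub struct KeywordMatcher {
    keyword: String,
}

impl Matcher for KeywordMatcher {
    fn matches(&self, tokens: &[Token]) -> bool {
        tokens
            .first()
            .map_or(false, |t| t.value.eq_ignore_ascii_case(&self.keyword))
    }
}

pub struct SequenceMatcher {
    matchers: Vec<Box<dyn Matcher>>,
}

impl SequenceMatcher {
    pub fn new(matchers: Vec<Box<dyn Matcher>>) -> Self {
        SequenceMatcher { matchers }
    }
}

impl Matcher for SequenceMatcher {
    fn matches(&self, tokens: &[Token]) -> bool {
        // Assumption: each child consumes exactly one token, so the i-th
        // child is handed a one-element slice holding the i-th token.
        tokens.len() >= self.matchers.len()
            && self
                .matchers
                .iter()
                .zip(tokens.chunks(1))
                .all(|(m, t)| m.matches(t))
    }
}

/// Convenience constructor returning a boxed keyword matcher.
pub fn keyword(word: &str) -> Box<dyn Matcher> {
    Box::new(KeywordMatcher {
        keyword: word.to_string(),
    })
}

/// Builds tokens from plain words, for tests and examples.
pub fn tokens_from(words: &[&str]) -> Vec<Token> {
    words
        .iter()
        .map(|w| Token { value: w.to_string() })
        .collect()
}
```

Storing children as Box<dyn Matcher> lets one sequence mix different matcher types, at the cost of dynamic dispatch on each call.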
Dialect Support
Rust dialects mirror Python dialects (src/dialect/):
pub struct Dialect {
    name: String,
    reserved_keywords: HashSet<String>,
    unreserved_keywords: HashSet<String>,
}
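As a sketch of how the two keyword sets might be queried, the example below adds a constructor and a case-insensitive classify method to the struct. The KeywordClass enum, the method names, and the choice to store keywords upper-cased are assumptions made for illustration, not the actual API in src/dialect/:

```rust
use std::collections::HashSet;

pub struct Dialect {
    name: String,
    reserved_keywords: HashSet<String>,
    unreserved_keywords: HashSet<String>,
}

/// Hypothetical result of a keyword lookup.
#[derive(Debug, PartialEq)]
pub enum KeywordClass {
    Reserved,
    Unreserved,
    NotKeyword,
}

/// Upper-cases every word so that later lookups can ignore case.
fn to_keyword_set(words: &[&str]) -> HashSet<String> {
    words.iter().map(|w| w.to_uppercase()).collect()
}

impl Dialect {
    pub fn new(name: &str, reserved: &[&str], unreserved: &[&str]) -> Self {
        Dialect {
            name: name.to_string(),
            reserved_keywords: to_keyword_set(reserved),
            unreserved_keywords: to_keyword_set(unreserved),
        }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    /// Classifies a word, checking the reserved set first.
    pub fn classify(&self, word: &str) -> KeywordClass {
        let upper = word.to_uppercase();
        if self.reserved_keywords.contains(&upper) {
            KeywordClass::Reserved
        } else if self.unreserved_keywords.contains(&upper) {
            KeywordClass::Unreserved
        } else {
            KeywordClass::NotKeyword
        }
    }
}
```

A HashSet gives constant-time average lookups, which matters because keyword classification runs once per word token.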
Performance Considerations
Benchmarking
Use Criterion for benchmarks:
// benches/lexer_bench.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn lexer_benchmark(c: &mut Criterion) {
    c.bench_function("lex_simple_select", |b| {
        b.iter(|| {
            let sql = black_box("SELECT * FROM users WHERE id = 1");
            lex(sql)
        });
    });
}
criterion_group!(benches, lexer_benchmark);
criterion_main!(benches);
Run benchmarks:
cargo bench
Optimization
- Use cargo build --release for production builds
- Profile with cargo flamegraph or perf
- Prefer zero-copy operations where possible
- Use &str over String when ownership is not needed
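The last two points can be illustrated with a small comparison. In the sketch below, words_owned copies every word into a new String, while words_borrowed returns slices that point into the original input. All three function names are hypothetical and exist only for this example:

```rust
/// Allocating version: every word is copied into its own String.
pub fn words_owned(sql: &str) -> Vec<String> {
    sql.split_whitespace().map(String::from).collect()
}

/// Zero-copy version: each item is a slice borrowing from `sql`,
/// so the only allocation is the Vec itself.
pub fn words_borrowed(sql: &str) -> Vec<&str> {
    sql.split_whitespace().collect()
}

/// Returns true if `part` lies inside the memory occupied by `whole`,
/// which demonstrates that no bytes were copied.
pub fn borrows_from(part: &str, whole: &str) -> bool {
    let whole_start = whole.as_ptr() as usize;
    let part_start = part.as_ptr() as usize;
    part_start >= whole_start && part_start + part.len() <= whole_start + whole.len()
}
```

The borrowed version ties the lifetime of the result to the input string, so the caller must keep the source SQL alive for as long as the slices are in use. That is usually an acceptable trade-off inside a lexer, where the source outlives the token stream.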
Development Workflow
Making Changes
- Edit Rust code in src/
- Run tests: cargo test
- Format code: cargo fmt
- Lint: cargo clippy
- Build Python extension: pip install -e .
- Test Python integration:
import sqlfluffrs
# Test Rust functions from Python
Syncing with Python
After changing Rust lexer/parser:
- Regenerate dialect bindings:
# From repository root
source .venv/bin/activate
python utils/rustify.py build
- Test against Python test suite:
tox -e py312
Common Tasks
Adding New Lexer Pattern
- Edit src/lexer.rs
- Add pattern matching logic
- Write tests
- Run cargo test
- Update Python bindings if needed
Updating Dialect
- Edit src/dialect/<dialect>.rs
- Update keyword lists or grammar
- Sync with Python via utils/rustify.py build
- Test with cargo test
Exposing New Function to Python
- Add function in appropriate Rust module
- Add Python binding in src/python.rs:
#[pyfunction]
fn my_new_function(input: &str) -> PyResult<String> {
    // Implementation
}
- Register in module:
#[pymodule]
fn sqlfluffrs(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(my_new_function, m)?)?;
    Ok(())
}
- Add type stub to sqlfluffrs.pyi:
def my_new_function(input: str) -> str: ...
- Rebuild and test
Testing
Rust Unit Tests
# All tests
cargo test
# Specific test
cargo test test_lexer_keywords
# Show output
cargo test -- --nocapture
# With release optimizations
cargo test --release
Integration with Python Tests
Rust components are also tested via the Python test suite:
# Ensure Rust extension is built
cd sqlfluffrs && pip install -e . && cd ..
# Run Python tests
tox -e py312
Resources
- Rust Book: https://doc.rust-lang.org/book/
- PyO3 Guide: https://pyo3.rs/
- Cargo Book: https://doc.rust-lang.org/cargo/
- Rust by Example: https://doc.rust-lang.org/rust-by-example/
Current Limitations
- Experimental and incomplete
- Not all Python features implemented
- Performance gains vary by use case
- May have compatibility issues with some dialects
Contributing to Rust Components
Rust contributions are welcome but should:
- Maintain API compatibility with Python
- Include tests
- Follow Rust conventions
- Update Python type stubs
- Sync with Python implementation via rustify.py
See also:
- Root AGENTS.md for general project overview
- src/sqlfluff/AGENTS.md for Python coding standards
- sqlfluffrs/README.md for Rust-specific README