Add instructions for agents (#7261)

Peter Budai
2025-11-12 00:09:05 +01:00
committed by GitHub
parent f6742cbee5
commit 6325aa7f3a
7 changed files with 2692 additions and 0 deletions

AGENTS.md
@@ -0,0 +1,234 @@
# SQLFluff AI Coding Assistant Instructions
## Project Overview
SQLFluff is a dialect-flexible SQL linter and auto-fixer supporting 25+ SQL dialects including T-SQL, PostgreSQL, BigQuery, MySQL, Snowflake, and more. The project is written primarily in **Python** with an experimental **Rust** component for performance optimization.
**Core Architecture:**
- **Parser-first design**: SQL is lexed → parsed into segment trees → linted by rules → optionally auto-fixed
- **Dialect inheritance**: Each dialect extends ANSI base grammar using `.replace()` to override specific segments
- **Segment-based AST**: Everything is a `BaseSegment` subclass forming a recursive tree structure
- **Rule crawlers**: Rules traverse segment trees to find violations and generate `LintFix` objects
## Repository Structure
```
/
├── src/sqlfluff/ # Main Python package (see src/sqlfluff/AGENTS.md)
│ ├── dialects/ # SQL dialect definitions (see src/sqlfluff/dialects/AGENTS.md)
│ ├── rules/ # Linting rules by category
│ ├── core/ # Parser, lexer, config infrastructure
│ ├── cli/ # Command-line interface
│ └── api/ # Public Python API
├── sqlfluffrs/ # Experimental Rust components (see sqlfluffrs/AGENTS.md)
├── test/ # Test suite (see test/AGENTS.md)
│ ├── fixtures/ # Test data (SQL files, YAML expected outputs)
│ ├── dialects/ # Dialect parsing tests
│ └── rules/ # Rule testing infrastructure
├── docs/ # Sphinx documentation (see docs/AGENTS.md)
├── plugins/ # Pluggable extensions (dbt templater, examples)
├── utils/ # Build and development utilities
└── examples/ # API usage examples
```
## Universal Conventions
### Language Support
- **Python**: 3.9 minimum, 3.12 recommended for development, 3.13 supported
- **Rust**: Experimental, used for performance-critical lexing/parsing
### Code Quality Standards
- **Formatting**: Black (Python), rustfmt (Rust)
- **Linting**: Ruff + Flake8 (Python), clippy (Rust)
- **Type Checking**: Mypy strict mode (Python)
- **Pre-commit hooks**: Run before all commits via `.venv/bin/pre-commit run --all-files`
### Testing Philosophy
- All tests must pass before merging
- Test coverage should reach 100%
- Use YAML fixtures for dialect/rule tests
- Mirror source structure in test directories
### Commit Messages
- Keep messages clear and descriptive
- Reference issue numbers when applicable
- Use conventional commit style when appropriate
## Core Commands
### Environment Setup
```bash
# Create development environment (first time)
tox -e py312 --devenv .venv
source .venv/bin/activate
# Always activate before working in a new terminal
source .venv/bin/activate
```
### Testing
```bash
# Run full test suite
tox
# Run specific Python version tests
tox -e py312
# Run with coverage
tox -e cov-init,py312,cov-report,linting,mypy
# Quick dialect test (after adding SQL fixtures)
python test/generate_parse_fixture_yml.py -d tsql
```
### Quality Checks
```bash
# Run all pre-commit checks (format, lint, type check)
.venv/bin/pre-commit run --all-files
# Individual checks
black src/ test/ # Format
ruff check src/ test/ # Lint
mypy src/sqlfluff/ # Type check
```
### Building
```bash
# Install package in editable mode (done automatically by tox --devenv)
pip install -e .
# Install plugins
pip install -e plugins/sqlfluff-templater-dbt/
```
## Architecture Principles
### Layer Separation
The codebase enforces strict architectural boundaries (via `importlinter` in `pyproject.toml`):
- `core` layer cannot import `api`, `cli`, `dialects`, `rules`, or `utils`
- `api` layer cannot import `cli`
- Dependencies flow: `linter` → `rules` → `parser` → `errors`/`types` → `helpers`
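The contracts take roughly this shape (an illustrative sketch using importlinter's standard contract syntax; the actual contract names and module lists live in the repository's `pyproject.toml`):
```toml
[tool.importlinter]
root_package = "sqlfluff"

# Forbid the core layer from importing the outer layers.
[[tool.importlinter.contracts]]
name = "Core does not import outer layers"
type = "forbidden"
source_modules = ["sqlfluff.core"]
forbidden_modules = [
    "sqlfluff.api",
    "sqlfluff.cli",
    "sqlfluff.dialects",
    "sqlfluff.rules",
]
```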
### Immutability
- Segments are immutable - never modify directly
- Use `.copy()` or `LintFix` mechanisms for changes
- Parser creates fresh tree structures
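A minimal sketch of the copy-then-fix pattern (`propose_replacement` is a hypothetical helper; see the rule examples in `src/sqlfluff/AGENTS.md` for real usage):
```python
from sqlfluff.core.parser import BaseSegment
from sqlfluff.core.rules import LintFix

def propose_replacement(segment: BaseSegment) -> LintFix:
    """Copy a segment and propose the edit as a fix (never mutate in place)."""
    edited = segment.copy()
    return LintFix.replace(segment, [edited])
```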
### Lazy Loading
- Dialects loaded via `dialect_selector()` or `load_raw_dialect()`
- Never import dialect modules directly
- Supports dynamic dialect discovery
## Development Workflows
### Adding Dialect Features
1. Create `.sql` test files in `test/fixtures/dialects/<dialect>/`
2. Run `python test/generate_parse_fixture_yml.py -d <dialect>` to generate expected `.yml` outputs
3. Implement grammar in `src/sqlfluff/dialects/dialect_<name>.py`
4. Use `dialect.replace()` to override inherited ANSI segments
5. Verify: `tox -e generate-fixture-yml -- -d <dialect>`
See `src/sqlfluff/dialects/AGENTS.md` for detailed dialect development guide.
### Adding Linting Rules
1. Create rule class in appropriate category under `src/sqlfluff/rules/`
2. Define metadata: `code`, `name`, `description`, `groups`
3. Implement `_eval(context: RuleContext) -> Optional[LintResult]`
4. Add YAML test cases to `test/fixtures/rules/std_rule_cases/<category>.yml`
5. Run: `tox -e py312 -- test/rules/yaml_test_cases_test.py -k <rule_code>`
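A minimal skeleton covering steps 1–3 (the rule code `XX01`, the `name`, and the target segment type are placeholders, not a real SQLFluff rule):
```python
from typing import Optional

from sqlfluff.core.rules import BaseRule, LintResult, RuleContext
from sqlfluff.core.rules.crawlers import SegmentSeekerCrawler

class Rule_XX01(BaseRule):
    """One-line description shown in the rule reference."""
    name = "example.placeholder"
    groups = ("all",)
    crawl_behaviour = SegmentSeekerCrawler({"select_statement"})

    def _eval(self, context: RuleContext) -> Optional[LintResult]:
        # Return a LintResult to flag a violation, or None to pass.
        return None
```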
### Fixing Parser Issues
1. Identify failing SQL in `test/fixtures/dialects/<dialect>/*.sql`
2. Run fixture generator to see current parse tree
3. Modify grammar segments in dialect file
4. Regenerate fixtures to verify
5. Check that changes don't break other dialects: `tox -e generate-fixture-yml`
### Documentation Updates
1. Edit source files in `docs/source/`
2. Build locally: `cd docs && make html`
3. View: `open docs/build/html/index.html`
4. Verify links and formatting
See `docs/AGENTS.md` for documentation-specific guidelines.
## Component-Specific Instructions
For detailed instructions on specific components, refer to:
- **Python source code**: `src/sqlfluff/AGENTS.md`
- **Dialect development**: `src/sqlfluff/dialects/AGENTS.md`
- **Rust components**: `sqlfluffrs/AGENTS.md`
- **Testing**: `test/AGENTS.md`
- **Documentation**: `docs/AGENTS.md`
## Common Pitfalls
### Parser Development
- ❌ Don't modify segment instances directly (immutable)
- ✅ Use `.copy()` or `LintFix` for modifications
- ❌ Don't import dialect modules directly
- ✅ Use `dialect_selector()` for lazy loading
- ❌ Don't use class references in grammar definitions
- ✅ Use `Ref("SegmentName")` string references
### Testing
- ❌ Don't put dialect-specific tests in ANSI fixtures
- ✅ Place tests in the most specific applicable dialect
- ❌ Don't forget to regenerate YAML fixtures after grammar changes
- ✅ Always run `generate_parse_fixture_yml.py` after parser edits
- ❌ Don't create monolithic test files
- ✅ Organize by segment type (e.g., `create_table.sql`, `select_statement.sql`)
### Code Quality
- ❌ Don't skip type hints
- ✅ All public functions need type annotations
- ❌ Don't bypass pre-commit hooks
- ✅ Run `.venv/bin/pre-commit run --all-files` before committing
- ❌ Don't violate import layer boundaries
- ✅ Check `pyproject.toml` importlinter contracts
## Configuration
SQLFluff uses `.sqlfluff` files (INI format) for configuration:
- Placed in project root or any parent directory
- Key sections: `[sqlfluff]`, `[sqlfluff:rules]`, `[sqlfluff:rules:<rule_code>]`
- Programmatic: `FluffConfig.from_root(overrides={...})`
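For example, a minimal `.sqlfluff` (the rule section and options shown are illustrative):
```ini
[sqlfluff]
dialect = ansi
exclude_rules = LT05

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper
```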
## Plugin System
- Plugins live in `plugins/` directory
- Installed via `pip install -e plugins/<plugin-name>/`
- Entry points defined in plugin's `pyproject.toml`
- Examples: `sqlfluff-templater-dbt`, `sqlfluff-plugin-example`
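Registration happens through an entry point in the plugin's packaging metadata; roughly like the following sketch (the exact group and module names are assumptions, check `plugins/sqlfluff-plugin-example` for the canonical form):
```toml
[project.entry-points.sqlfluff]
example = "sqlfluff_plugin_example"
```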
## Quick Reference
### Most Common Tasks
```bash
# Add new SQL test case for a dialect
echo "SELECT TOP 10 * FROM users;" > test/fixtures/dialects/tsql/top_clause.sql
python test/generate_parse_fixture_yml.py -d tsql
# Test a specific rule
tox -e py312 -- test/rules/yaml_test_cases_test.py -k AL01
# Check code quality before commit
.venv/bin/pre-commit run --all-files
# Run tests for just the parser module
tox -e py312 -- test/core/parser/
# Check dialect parsing without writing fixtures
sqlfluff parse test.sql --dialect tsql
```
### Performance Tips
- Use `-k` flag in pytest to filter tests during development
- Run `generate-fixture-yml` with `-d <dialect>` to test one dialect
- Use `tox -e py312` instead of full `tox` during iteration
- Activate venv to run `pytest` directly (faster than tox for single runs)
---
**Remember**: The goal is to maintain SQLFluff as a high-quality, reliable SQL linting tool. Take time to understand the architecture, write comprehensive tests, and follow the established patterns. When in doubt, look at existing similar implementations in the codebase.

docs/AGENTS.md
@@ -0,0 +1,532 @@
# Documentation - AI Assistant Instructions
This file provides guidelines for building and maintaining SQLFluff documentation.
## Documentation System
SQLFluff uses **Sphinx** for documentation generation with:
- **Source**: `docs/source/` (reStructuredText files)
- **Build output**: `docs/build/` (HTML, generated)
- **Live docs**: https://docs.sqlfluff.com
- **Auto-generated content**: API docs, rule reference, dialect lists
## Documentation Structure
```
docs/
├── source/ # Documentation source files
│ ├── conf.py # Sphinx configuration
│ ├── index.rst # Homepage
│ ├── gettingstarted.rst # Getting started guide
│ ├── why_sqlfluff.rst # Project overview
│ ├── inthewild.rst # Real-world usage
│ ├── jointhecommunity.rst # Community info
│ ├── configuration/ # Configuration docs
│ │ ├── index.rst
│ │ └── setting_configuration.rst
│ ├── guides/ # Developer guides
│ │ ├── index.rst
│ │ ├── first_contribution.rst
│ │ └── dialect_development.rst
│ ├── reference/ # API and rule reference
│ │ ├── index.rst
│ │ ├── rules.rst
│ │ └── api.rst
│ ├── production/ # Production deployment
│ ├── _static/ # Static assets (CSS, images)
│ ├── _ext/ # Sphinx extensions
│ └── _partials/ # Reusable doc fragments
├── build/ # Generated HTML (gitignored)
├── Makefile # Build commands (Unix)
├── make.bat # Build commands (Windows)
├── requirements.txt # Doc build dependencies
├── README.md # Documentation README
└── generate-auto-docs.py # Script to generate auto-docs
```
## Building Documentation Locally
### Setup
```bash
# Activate virtual environment
source .venv/bin/activate
# Install documentation dependencies
pip install -r docs/requirements.txt
```
### Building HTML Docs
```bash
# Navigate to docs directory
cd docs
# Build HTML (Unix/Linux/Mac)
make html
# Build HTML (Windows)
make.bat html
# View built documentation
open build/html/index.html # macOS
xdg-open build/html/index.html # Linux
# Or manually open docs/build/html/index.html in browser
```
### Clean Build
```bash
cd docs
# Clean previous build
make clean
# Build fresh
make html
```
### Live Reload During Development
For rapid iteration, use `sphinx-autobuild`:
```bash
# Install sphinx-autobuild
pip install sphinx-autobuild
# Run live-reload server
cd docs
sphinx-autobuild source build/html
# Open browser to http://127.0.0.1:8000
# Docs rebuild automatically on file changes
```
## Documentation Format
### reStructuredText (RST)
SQLFluff docs use RST format (`.rst` files).
**Basic syntax:**
```rst
Page Title
==========
Section Heading
---------------
Subsection
~~~~~~~~~~
**Bold text**
*Italic text*
``inline code``
`Link text <https://example.com>`_
- Bullet list item
- Another item
1. Numbered list
2. Second item
.. code-block:: sql
SELECT * FROM users
WHERE active = 1;
.. code-block:: python
from sqlfluff.core import Linter
linter = Linter(dialect="tsql")
.. note::
This is a note box.
.. warning::
This is a warning box.
```
### Cross-References
```rst
Link to another doc:
:doc:`gettingstarted`
Link to section:
:ref:`configuration-label`
Link to Python class:
:class:`sqlfluff.core.Linter`
Link to function:
:func:`sqlfluff.lint`
```
## Documentation Types
### User-Facing Documentation
**Getting Started** (`gettingstarted.rst`):
- Installation instructions
- Quick start examples
- Basic usage patterns
**Configuration** (`configuration/`):
- Configuration file format
- Available settings
- Dialect-specific config
**Rules Reference** (`reference/rules.rst`):
- Auto-generated from rule metadata
- Rule descriptions, examples, configuration options
- **Updated automatically** via `generate-auto-docs.py`
### Developer Documentation
**Guides** (`guides/`):
- First contribution walkthrough
- Dialect development guide
- Rule development guide
- Architecture overview
**API Reference** (`reference/api.rst`):
- Auto-generated from docstrings
- Python API documentation
- Class and function references
### Production Documentation
**Production** (`production/`):
- CI/CD integration
- Performance tuning
- Deployment best practices
## Auto-Generated Documentation
### Generating Auto-Docs
Some documentation is generated from source code:
```bash
# Generate auto-documentation (rules, dialects, etc.)
python docs/generate-auto-docs.py
# Build docs after generation
cd docs
make html
```
**What gets auto-generated:**
- Rule reference (from rule metadata)
- Dialect list (from available dialects)
- API documentation (from docstrings)
### Rule Documentation
Rules are documented via their metadata:
```python
class Rule_AL01(BaseRule):
"""Implicit aliasing of table not allowed.
**Anti-pattern**
Using implicit alias for tables:
.. code-block:: sql
SELECT * FROM users u
**Best practice**
Use explicit AS keyword:
.. code-block:: sql
SELECT * FROM users AS u
"""
groups = ("all", "aliasing")
# ... rest of rule
```
The rule's docstring is extracted and rendered in the auto-generated rule reference.
## Documentation Style Guide
### Writing Style
- **Clear and concise**: Use simple language
- **Active voice**: "Run the command" not "The command should be run"
- **Present tense**: "SQLFluff parses SQL" not "SQLFluff will parse SQL"
- **Examples**: Include code examples for every feature
- **User perspective**: Write from user's point of view
### Code Examples
**Always include:**
- Context (what the example demonstrates)
- Complete, runnable code
- Expected output when relevant
**SQL examples:**
```rst
.. code-block:: sql
-- Anti-pattern: implicit alias
SELECT * FROM users u;
.. code-block:: sql
-- Best practice: explicit alias
SELECT * FROM users AS u;
```
**Python examples:**
```rst
.. code-block:: python
from sqlfluff.core import Linter
linter = Linter(dialect="tsql")
result = linter.lint_string("SELECT * FROM users")
print(result.violations)
```
**Shell examples:**
```rst
.. code-block:: bash
# Lint a SQL file
sqlfluff lint query.sql
# Fix issues automatically
sqlfluff fix query.sql
```
### Sections and Headers
Use consistent header hierarchy:
```rst
Page Title (Top Level)
======================
Major Section
-------------
Subsection
~~~~~~~~~~
Sub-subsection
^^^^^^^^^^^^^^
```
### Links and References
**External links:**
```rst
See the `official documentation <https://docs.sqlfluff.com>`_ for details.
```
**Internal cross-references:**
```rst
For configuration options, see :doc:`configuration/index`.
As described in :ref:`dialect-development`, each dialect...
```
**Define reference labels:**
```rst
.. _dialect-development:
Dialect Development
-------------------
This section covers dialect development...
```
## Checking Documentation Quality
### Sphinx Warnings
Sphinx warns about issues during build:
```bash
cd docs
make html
# Look for warnings like:
# WARNING: document isn't included in any toctree
# WARNING: undefined label: some-label
# ERROR: Unknown directive type "cod-block" (typo!)
```
Fix all warnings before committing documentation changes.
### Link Checking
```bash
cd docs
# Check for broken links
make linkcheck
# Review output for HTTP errors, redirects, broken anchors
```
### Spell Checking
SQLFluff uses `codespell` for spell checking:
```bash
# Run from repository root
codespell docs/source/
# Or via pre-commit
.venv/bin/pre-commit run codespell --all-files
```
## Documentation Workflow
### Adding New Documentation
1. **Create or edit `.rst` file** in `docs/source/`
2. **Add to table of contents** (toctree) in parent `index.rst`:
```rst
.. toctree::
:maxdepth: 2
existing_page
new_page
```
3. **Build and review:**
```bash
cd docs
make clean html
open build/html/index.html
```
4. **Check for warnings** during build
5. **Run link checker:**
```bash
make linkcheck
```
6. **Commit both source and auto-generated files** if applicable
### Updating Existing Documentation
1. **Edit `.rst` file**
2. **Rebuild docs:**
```bash
cd docs
make html
```
3. **Review changes** in browser
4. **Check for new warnings**
5. **Commit changes**
### Adding Code Examples
1. **Create example in `examples/`** directory (optional):
```python
# examples/08_new_feature.py
from sqlfluff.core import Linter
linter = Linter(dialect="tsql")
result = linter.lint_string("SELECT * FROM users")
print(result.violations)
```
2. **Reference in documentation:**
```rst
.. literalinclude:: ../../examples/08_new_feature.py
:language: python
:linenos:
```
3. **Or embed directly:**
```rst
.. code-block:: python
from sqlfluff.core import Linter
linter = Linter(dialect="tsql")
```
## Common Documentation Tasks
### Document New Rule
1. **Add docstring to rule class** with anti-pattern and best practice
2. **Regenerate docs:**
```bash
python docs/generate-auto-docs.py
```
3. **Build and verify:**
```bash
cd docs && make html
```
### Document New Dialect
1. **Add dialect overview** to `reference/dialects.rst` or create new file
2. **Include supported features** and known limitations
3. **Provide examples** of dialect-specific syntax
4. **Update auto-generated dialect list:**
```bash
python docs/generate-auto-docs.py
```
### Add Tutorial/Guide
1. **Create new `.rst` file** in `docs/source/guides/`
2. **Add to toctree** in `docs/source/guides/index.rst`
3. **Include step-by-step instructions** with examples
4. **Build and test** all commands/code in tutorial
## Sphinx Configuration
Configuration in `docs/source/conf.py`:
**Key settings:**
- `project`: "SQLFluff"
- `extensions`: Sphinx extensions used
- `html_theme`: Documentation theme
- `html_static_path`: Static assets directory
**Custom extensions** in `docs/source/_ext/`:
- Custom directives or roles
- Auto-documentation generators
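An illustrative excerpt of the kind of settings involved (the actual extension list and theme in `docs/source/conf.py` may differ):
```python
# Illustrative excerpt; check docs/source/conf.py for the real values.
project = "SQLFluff"
extensions = [
    "sphinx.ext.autodoc",  # pull API documentation from docstrings
]
html_static_path = ["_static"]
```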
## Testing Documentation Build in CI
Documentation builds are tested in CI/CD:
- Ensures no Sphinx warnings or errors
- Validates all links
- Checks for spelling errors
**Local pre-check before committing:**
```bash
# Build docs
cd docs && make clean html
# Check links
make linkcheck
# Spell check
cd .. && codespell docs/source/
# Review any warnings/errors
```
---
**See also:**
- Root `AGENTS.md` for general project overview
- `CONTRIBUTING.md` for contribution guidelines
- [Sphinx documentation](https://www.sphinx-doc.org/) for RST syntax reference

sqlfluffrs/AGENTS.md
@@ -0,0 +1,439 @@
# Rust Components - AI Assistant Instructions
This file provides guidance for SQLFluff's Rust components.
## Overview
The `sqlfluffrs/` directory contains an **experimental Rust implementation** of performance-critical SQLFluff components. This is an ongoing effort to accelerate lexing and parsing operations while maintaining compatibility with the Python implementation.
## Project Status
**Current state**: Experimental and under development
**Goals:**
- Accelerate lexing performance (tokenization)
- Speed up parsing for large SQL files
- Maintain API compatibility with Python components
- Provide optional Rust-based acceleration for production users
**Not a replacement**: The Rust components are designed to work alongside Python, not replace the entire codebase.
## Structure
```
sqlfluffrs/
├── Cargo.toml # Rust package manifest
├── pyproject.toml # Python packaging for Rust extension
├── LICENSE.md # License
├── README.md # Rust component README
├── py.typed # Type stub marker
├── sqlfluffrs.pyi # Python type stubs for Rust extension
└── src/ # Rust source code
├── lib.rs # Library root
├── python.rs # Python bindings (PyO3)
├── lexer.rs # Lexer implementation
├── marker.rs # Position markers
├── matcher.rs # Pattern matching
├── regex.rs # Regex utilities
├── slice.rs # String slicing
├── config/ # Configuration handling
├── dialect/ # Dialect definitions
├── templater/ # Template handling
└── token/ # Token types
```
## Rust Development Setup
### Requirements
- **Rust**: Install via [rustup](https://rustup.rs/)
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
- **Cargo**: Comes with Rust installation
- **Python development headers**: Required for PyO3 bindings
### Building Rust Components
```bash
# Navigate to Rust directory
cd sqlfluffrs
# Build Rust library
cargo build
# Build release (optimized)
cargo build --release
# Run Rust tests
cargo test
# Run with output
cargo test -- --nocapture
# Check code without building
cargo check
# Format code
cargo fmt
# Lint code
cargo clippy
```
### Python Integration
The Rust components are exposed to Python via **PyO3**:
```bash
# Build and install Python extension
cd sqlfluffrs
pip install -e .
# Or from repository root
pip install -e ./sqlfluffrs/
```
## Rust Coding Standards
### Style
- **Follow Rust conventions**: Use `rustfmt` for formatting
- **Naming**:
- `snake_case` for functions, variables, modules
- `PascalCase` for types, structs, enums, traits
- `SCREAMING_SNAKE_CASE` for constants
- **Idiomatic Rust**: Prefer iterators, pattern matching, and ownership patterns
### Error Handling
**Prefer `Result` and `?` operator:**
```rust
fn parse_token(input: &str) -> Result<Token, ParseError> {
let trimmed = input.trim();
if trimmed.is_empty() {
return Err(ParseError::EmptyInput);
}
Ok(Token::new(trimmed))
}
fn process() -> Result<(), ParseError> {
let token = parse_token(" SELECT ")?; // Use ? operator
// ... use token
Ok(())
}
```
**Avoid `unwrap()` and `expect()` in production code:**
```rust
// ❌ Bad: Can panic
let value = some_option.unwrap();
// ✅ Good: Handle None case
let value = match some_option {
Some(v) => v,
None => return Err(Error::MissingValue),
};
// ✅ Also good: Use ? with Option
let value = some_option.ok_or(Error::MissingValue)?;
```
**Exception**: `unwrap()` and `expect()` are acceptable in tests.
### Testing
```rust
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_token_parsing() {
let result = parse_token("SELECT");
assert!(result.is_ok());
assert_eq!(result.unwrap().value, "SELECT");
}
#[test]
fn test_empty_input_fails() {
let result = parse_token("");
assert!(result.is_err());
}
}
```
Run tests:
```bash
cargo test
cargo test --lib # Library tests only
cargo test --release # Optimized build
```
## Python-Rust Interface (PyO3)
### Exposing Rust to Python
**Basic example** in `src/python.rs`:
```rust
use pyo3::prelude::*;
#[pyfunction]
fn tokenize(sql: &str) -> PyResult<Vec<String>> {
let tokens = internal_tokenize(sql)
.map_err(|e| PyErr::new::<pyo3::exceptions::PyValueError, _>(e.to_string()))?;
Ok(tokens)
}
#[pymodule]
fn sqlfluffrs(_py: Python, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(tokenize, m)?)?;
Ok(())
}
```
**Python usage:**
```python
import sqlfluffrs
tokens = sqlfluffrs.tokenize("SELECT * FROM users")
print(tokens) # ['SELECT', '*', 'FROM', 'users']
```
### Type Stubs
Provide Python type hints in `sqlfluffrs.pyi`:
```python
from typing import List
def tokenize(sql: str) -> List[str]: ...
```
## Architecture
### Lexer
The Rust lexer (`src/lexer.rs`) tokenizes SQL strings:
```rust
pub struct Lexer {
config: LexerConfig,
}
impl Lexer {
pub fn new(config: LexerConfig) -> Self {
Lexer { config }
}
pub fn lex(&self, sql: &str) -> Result<Vec<Token>, LexError> {
// Tokenization logic
}
}
```
### Matcher
Pattern matching for grammar rules (`src/matcher.rs`):
```rust
pub trait Matcher {
fn matches(&self, tokens: &[Token]) -> bool;
}
pub struct SequenceMatcher {
matchers: Vec<Box<dyn Matcher>>,
}
```
### Dialect Support
Rust dialects mirror Python dialects (`src/dialect/`):
```rust
pub struct Dialect {
name: String,
reserved_keywords: HashSet<String>,
unreserved_keywords: HashSet<String>,
}
```
## Performance Considerations
### Benchmarking
Use Criterion for benchmarks:
```rust
// benches/lexer_bench.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn lexer_benchmark(c: &mut Criterion) {
c.bench_function("lex_simple_select", |b| {
b.iter(|| {
let sql = black_box("SELECT * FROM users WHERE id = 1");
lex(sql)
});
});
}
criterion_group!(benches, lexer_benchmark);
criterion_main!(benches);
```
Run benchmarks:
```bash
cargo bench
```
### Optimization
- Use `cargo build --release` for production builds
- Profile with `cargo flamegraph` or `perf`
- Prefer zero-copy operations where possible
- Use `&str` over `String` when ownership not needed
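For example, a function that borrows `&str` works with any string slice and avoids forcing callers to allocate (a small illustrative sketch, not repository code):
```rust
// Borrow the input: no allocation, works with any string slice.
fn first_keyword(sql: &str) -> Option<&str> {
    sql.split_whitespace().next()
}

fn main() {
    let sql = String::from("SELECT * FROM users");
    // Taking `String` by value here would force a move or clone;
    // `&sql` coerces cheaply to `&str`.
    assert_eq!(first_keyword(&sql), Some("SELECT"));
}
```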
## Development Workflow
### Making Changes
1. **Edit Rust code** in `src/`
2. **Run tests:**
```bash
cargo test
```
3. **Format code:**
```bash
cargo fmt
```
4. **Lint:**
```bash
cargo clippy
```
5. **Build Python extension:**
```bash
pip install -e .
```
6. **Test Python integration:**
```python
import sqlfluffrs
# Test Rust functions from Python
```
### Syncing with Python
After changing Rust lexer/parser:
1. **Regenerate dialect bindings:**
```bash
# From repository root
source .venv/bin/activate
python utils/rustify.py build
```
2. **Test against Python test suite:**
```bash
tox -e py312
```
## Common Tasks
### Adding New Lexer Pattern
1. Edit `src/lexer.rs`
2. Add pattern matching logic
3. Write tests
4. Run `cargo test`
5. Update Python bindings if needed
### Updating Dialect
1. Edit `src/dialect/<dialect>.rs`
2. Update keyword lists or grammar
3. Sync with Python via `utils/rustify.py build`
4. Test with `cargo test`
### Exposing New Function to Python
1. Add function in appropriate Rust module
2. Add Python binding in `src/python.rs`:
```rust
#[pyfunction]
fn my_new_function(input: &str) -> PyResult<String> {
// Implementation
}
```
3. Register in module:
```rust
#[pymodule]
fn sqlfluffrs(_py: Python, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(my_new_function, m)?)?;
Ok(())
}
```
4. Add type stub to `sqlfluffrs.pyi`:
```python
def my_new_function(input: str) -> str: ...
```
5. Rebuild and test
## Testing
### Rust Unit Tests
```bash
# All tests
cargo test
# Specific test
cargo test test_lexer_keywords
# Show output
cargo test -- --nocapture
# With release optimizations
cargo test --release
```
### Integration with Python Tests
Rust components are tested via Python test suite:
```bash
# Ensure Rust extension is built
cd sqlfluffrs && pip install -e . && cd ..
# Run Python tests
tox -e py312
```
## Resources
- **Rust Book**: https://doc.rust-lang.org/book/
- **PyO3 Guide**: https://pyo3.rs/
- **Cargo Book**: https://doc.rust-lang.org/cargo/
- **Rust by Example**: https://doc.rust-lang.org/rust-by-example/
## Current Limitations
- Experimental and incomplete
- Not all Python features implemented
- Performance gains vary by use case
- May have compatibility issues with some dialects
## Contributing to Rust Components
Rust contributions are welcome but should:
- Maintain API compatibility with Python
- Include tests
- Follow Rust conventions
- Update Python type stubs
- Sync with Python implementation via `rustify.py`
---
**See also:**
- Root `AGENTS.md` for general project overview
- `src/sqlfluff/AGENTS.md` for Python coding standards
- `sqlfluffrs/README.md` for Rust-specific README

src/sqlfluff/AGENTS.md
@@ -0,0 +1,413 @@
# Python Source Code - AI Assistant Instructions
This file provides Python-specific development guidelines for SQLFluff's main source code.
## Python Standards
### Version Support
- **Minimum**: Python 3.9
- **Recommended for development**: Python 3.12
- **Maximum tested**: Python 3.13
### Code Style & Formatting
#### Black (Auto-formatter)
- Default settings (line length: 88 characters)
- Run: `black src/ test/`
- Automatically enforced via pre-commit hooks
#### Ruff (Linter)
- Fast Python linter with isort and pydocstyle integration
- Run: `ruff check src/ test/`
- Auto-fix: `ruff check --fix src/ test/`
- Checks import order, docstring style, common code smells
#### Flake8 (Additional Linting)
- Used with flake8-black plugin
- Configured in `pyproject.toml`
### Type Annotations
**Required for all public functions and methods:**
```python
from typing import Optional, Union, List, Dict, cast, TYPE_CHECKING
def parse_sql(sql: str, dialect: str = "ansi") -> Optional[BaseSegment]:
"""Parse SQL string into segment tree.
Args:
sql: SQL string to parse.
dialect: SQL dialect name.
Returns:
Root segment or None if parsing fails.
"""
pass
```
**Key Mypy settings** (strict mode enabled):
- `warn_unused_configs = true`
- `strict_equality = true`
- `no_implicit_reexport = true`
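In `pyproject.toml` these live in the standard mypy table:
```toml
[tool.mypy]
warn_unused_configs = true
strict_equality = true
no_implicit_reexport = true
```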
**Avoiding circular imports:**
```python
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from sqlfluff.core.parser import BaseSegment
```
### Documentation Standards
**Google-style docstrings required:**
```python
def complex_function(param1: str, param2: int, flag: bool = False) -> Dict[str, Any]:
"""Short one-line description.
Longer description explaining the purpose, behavior, and usage.
Can span multiple lines when needed.
Args:
param1: Description of first parameter.
param2: Description of second parameter.
flag: Optional flag for special behavior. Defaults to False.
Returns:
Dictionary containing results with keys 'status', 'data', etc.
Raises:
ValueError: If param1 is empty.
SQLParseError: If parsing fails.
"""
pass
```
**Exceptions to docstring requirements:**
- Magic methods (e.g., `__init__`, `__str__`) - D105, D107 ignored
- Private methods may have simplified docstrings
- Test functions use descriptive names instead
### Import Organization
**Enforced order** (via Ruff isort):
```python
# 1. Standard library imports
import os
import sys
from typing import Optional
# 2. Third-party imports
import click
import yaml
# 3. First-party imports (sqlfluff packages)
from sqlfluff.core.parser import BaseSegment
from sqlfluff.core.rules import BaseRule
```
**Import linter contracts** (in `pyproject.toml`):
- `core` cannot import from `api`, `cli`, `dialects`, `rules`, `utils`
- `api` cannot import from `cli`
- Use specific imports: `from module import SpecificClass` (not `import *`)
## Architecture & Design Patterns
### Segment System
**All AST nodes inherit from `BaseSegment`:**
```python
from sqlfluff.core.parser import BaseSegment
from sqlfluff.core.parser.grammar import Sequence, Ref, OneOf
class SelectStatementSegment(BaseSegment):
"""A SELECT statement."""
type = "select_statement"
match_grammar = Sequence(
"SELECT",
Ref("SelectClauseSegment"),
Ref("FromClauseSegment", optional=True),
Ref("WhereClauseSegment", optional=True),
)
```
**Key principles:**
- Segments are **immutable** - never modify in place
- Use `.copy()` to create modified versions
- `match_grammar` defines parsing rules recursively
- Use `Ref("SegmentName")` not direct class references
### Rule System
**Rules inherit from `BaseRule`:**
```python
from sqlfluff.core.rules import BaseRule, LintResult, LintFix, RuleContext
from sqlfluff.core.rules.crawlers import SegmentSeekerCrawler
class Rule_AL01(BaseRule):
"""Implicit aliasing of table not allowed."""
groups = ("all", "aliasing")
crawl_behaviour = SegmentSeekerCrawler({"table_reference"})
def _eval(self, context: RuleContext) -> Optional[LintResult]:
"""Evaluate rule against segment.
Args:
context: Rule context with segment and dialect info.
Returns:
LintResult if violation found, None otherwise.
"""
if context.segment.has_implicit_alias:
return LintResult(
anchor=context.segment,
fixes=[LintFix.replace(context.segment, [new_segments])],
)
return None
```
**Rule metadata:**
- `code`: Unique identifier (e.g., "AL01", "LT02")
- `name`: Human-readable name
- `description`: What the rule checks
- `groups`: Categories like "all", "core", "aliasing"
- `crawl_behaviour`: Which segment types to examine
### Dialect System
**Dialects use inheritance and replacement:**
```python
from sqlfluff.core.dialects import load_raw_dialect
from sqlfluff.core.parser.grammar import Sequence, Ref
# Load parent dialect
ansi_dialect = load_raw_dialect("ansi")
tsql_dialect = ansi_dialect.copy_as("tsql")
# Override specific segments
tsql_dialect.replace(
SelectStatementSegment=Sequence(
"SELECT",
Ref("TopClauseSegment", optional=True), # T-SQL specific
Ref("SelectClauseSegment"),
Ref("FromClauseSegment", optional=True),
),
)
```
**Never import dialects directly:**
```python
# ❌ Wrong
from sqlfluff.dialects.dialect_tsql import tsql_dialect
# ✅ Correct
from sqlfluff.core.dialects import dialect_selector
dialect = dialect_selector("tsql")
```
## Testing Patterns
### Test File Organization
```
test/
├── core/
│ ├── parser/
│ │ ├── grammar_test.py
│ │ └── segments_test.py
│ └── rules/
│ └── base_test.py
├── dialects/
│ └── tsql_test.py
└── rules/
├── yaml_test_cases_test.py
└── std_fix_auto_test.py
```
**Naming convention**: `*_test.py` (configured as pytest's discovery pattern)
### Pytest Fixtures
**Use fixtures in `conftest.py`:**
```python
import pytest
from sqlfluff.core import FluffConfig
@pytest.fixture
def default_config():
"""Provide default SQLFluff config for tests."""
return FluffConfig.from_root()
def test_parser_with_config(default_config):
"""Test parser using fixture."""
assert default_config.get("dialect") == "ansi"
```
### Test Markers
```python
import pytest
@pytest.mark.dbt
def test_dbt_templater():
"""Test requiring dbt installation."""
pass
@pytest.mark.integration
def test_full_parse_flow():
"""Integration test for complete parsing flow."""
pass
```
## Common Commands
### Development Workflow
```bash
# Activate virtual environment
source .venv/bin/activate
# Run tests for specific module
pytest test/core/parser/ -v
# Run with coverage
pytest test/core/ --cov=src/sqlfluff/core --cov-report=term-missing
# Test specific function
pytest test/core/parser/grammar_test.py::test_sequence_matching -v
# Run type checking
mypy src/sqlfluff/
# Format and lint
black src/ test/
ruff check --fix src/ test/
```
### Installing Dependencies
```bash
# Install main package in editable mode
pip install -e .
# Install with development dependencies
pip install -e .[dev]
# Install specific plugin
pip install -e plugins/sqlfluff-templater-dbt/
```
## Performance Considerations
### Efficient Segment Tree Traversal
```python
# ✅ Good: Use crawlers for targeted traversal
from sqlfluff.core.rules.crawlers import SegmentSeekerCrawler
crawl_behaviour = SegmentSeekerCrawler({"select_statement", "insert_statement"})
# ❌ Bad: Manual recursive traversal
def find_all_selects(segment):
results = []
if segment.type == "select_statement":
results.append(segment)
for child in segment.segments:
results.extend(find_all_selects(child))
return results
```
### Lazy Evaluation
```python
# ✅ Good: Lazy loading
from sqlfluff.core.dialects import dialect_selector
dialect = dialect_selector("tsql") # Loaded on demand
# ❌ Bad: Eager imports
from sqlfluff.dialects.dialect_tsql import tsql_dialect
```
## Debugging Tips
### Parser Debugging
```python
# Enable detailed logging
import logging
logging.basicConfig(level=logging.DEBUG)
# Use parse debugging
from sqlfluff.core import Linter
linter = Linter(dialect="tsql")
parsed = linter.parse_string("SELECT * FROM users")
print(parsed.tree.stringify()) # View parse tree
```
### Rule Debugging
```bash
# Run single rule against SQL file
sqlfluff lint test.sql --rules AL01 -v
# Show fixes without applying
sqlfluff fix test.sql --rules AL01 --diff
# Parse and show tree structure
sqlfluff parse test.sql --dialect tsql
```
## Anti-Patterns to Avoid
```python
# ❌ Don't modify segments in place
segment.raw = "NEW VALUE" # Segments are immutable!
# ✅ Use copy or LintFix
new_segment = segment.copy(raw="NEW VALUE")
# ❌ Don't import across architectural boundaries
from sqlfluff.cli import commands # In core/ module - violation!
# ✅ Respect layer separation
# core/ should not import from cli/, api/, dialects/, rules/
# ❌ Don't use bare except
try:
parse_sql(sql)
except:
pass
# ✅ Catch specific exceptions
try:
parse_sql(sql)
except SQLParseError as e:
logger.error(f"Parse failed: {e}")
# ❌ Don't use mutable default arguments
def process_segments(segments=[]): # Bug waiting to happen!
segments.append(new_segment)
# ✅ Use None and initialize
def process_segments(segments=None):
if segments is None:
segments = []
segments.append(new_segment)
```
---
**See also:**
- `src/sqlfluff/dialects/AGENTS.md` for dialect-specific development
- `test/AGENTS.md` for testing conventions and commands
- Root `AGENTS.md` for general project overview

src/sqlfluff/dialects/AGENTS.md
@@ -0,0 +1,423 @@
# Dialect Development - AI Assistant Instructions
This file provides guidance for developing and extending SQL dialect support in SQLFluff.
## Overview
SQLFluff supports 25+ SQL dialects through an inheritance-based system. Each dialect extends the ANSI base dialect and overrides specific grammar segments to match the target SQL variant's syntax.
## Dialect Architecture
### Inheritance Hierarchy
```
ANSI (base) ← All dialects inherit from here
├── T-SQL (Microsoft SQL Server)
├── PostgreSQL
│ └── Redshift (extends PostgreSQL)
├── MySQL
│ └── MariaDB (extends MySQL)
├── BigQuery
├── Snowflake
└── ... (20+ more dialects)
```
### File Organization
```
src/sqlfluff/dialects/
├── dialect_ansi.py # Base ANSI SQL dialect
├── dialect_tsql.py # T-SQL (SQL Server)
├── dialect_postgres.py # PostgreSQL
├── dialect_bigquery.py # Google BigQuery
├── dialect_snowflake.py # Snowflake
├── ...
├── dialect_ansi_keywords.py # ANSI reserved/unreserved keywords
├── dialect_tsql_keywords.py # T-SQL keywords
└── dialect_instructions/ # Per-dialect agent instructions (optional)
├── tsql.md
├── postgres.md
└── ...
```
## Creating/Extending a Dialect
### Basic Dialect Structure
```python
"""The T-SQL (Microsoft SQL Server) dialect."""
from sqlfluff.core.dialects import load_raw_dialect
from sqlfluff.core.parser import BaseSegment
from sqlfluff.core.parser.grammar import (
Sequence, OneOf, Ref, Bracketed, Delimited, AnyNumberOf, Optional
)
# Load parent dialect
ansi_dialect = load_raw_dialect("ansi")
# Create new dialect as copy
tsql_dialect = ansi_dialect.copy_as("tsql")
# Add dialect-specific keywords (in practice imported from a keywords file)
tsql_dialect.sets("reserved_keywords").update([
"CLUSTERED", "NONCLUSTERED", "ROWGUIDCOL", "TOP"
])
# Define new segments specific to T-SQL
class TopClauseSegment(BaseSegment):
"""TOP clause for T-SQL SELECT statements."""
type = "top_clause"
match_grammar = Sequence(
"TOP",
OneOf(
Ref("NumericLiteralSegment"),
Bracketed(Ref("ExpressionSegment")),
),
Sequence("PERCENT", optional=True),
Sequence("WITH", "TIES", optional=True),
)
# Override existing ANSI segments
tsql_dialect.replace(
SelectStatementSegment=Sequence(
"SELECT",
Ref("TopClauseSegment", optional=True), # T-SQL addition
Ref("SelectClauseSegment"),
Ref("FromClauseSegment", optional=True),
Ref("WhereClauseSegment", optional=True),
),
)
```
### Grammar Composition Primitives
Located in `src/sqlfluff/core/parser/grammar/`:
| Primitive | Purpose | Example |
|-----------|---------|---------|
| `Sequence()` | Ordered sequence of elements | `Sequence("SELECT", Ref("SelectClauseSegment"))` |
| `OneOf()` | Choice between alternatives | `OneOf("ASC", "DESC")` |
| `Delimited()` | Comma-separated list | `Delimited(Ref("ColumnReferenceSegment"))` |
| `AnyNumberOf()` | Zero or more repetitions | `AnyNumberOf(Ref("WhereClauseSegment"))` |
| `Bracketed()` | Content in parentheses | `Bracketed(Ref("ExpressionSegment"))` |
| `Ref()` | Reference to another segment | `Ref("TableReferenceSegment")` |
| `Optional()` | Optional element (or use `optional=True`) | `Optional(Ref("WhereClause"))` |
### Grammar Organization Patterns
#### Internal Grammar (Private Attributes with `_` prefix)
Use for grammar components specific to one statement:
```python
class CreateDatabaseStatementSegment(BaseSegment):
"""A CREATE DATABASE statement."""
# Internal grammar - only used in this segment
_filestream_option = OneOf(
Sequence("NON_TRANSACTED_ACCESS", Ref("EqualsSegment"), "OFF"),
Sequence("DIRECTORY_NAME", Ref("EqualsSegment"), Ref("QuotedLiteralSegment")),
)
_create_database_option = OneOf(
Sequence("FILESTREAM", Bracketed(Delimited(_filestream_option))),
Sequence("DEFAULT_LANGUAGE", Ref("EqualsSegment"), Ref("LanguageNameSegment")),
Sequence("DEFAULT_FULLTEXT_LANGUAGE", Ref("EqualsSegment"), Ref("LanguageNameSegment")),
)
type = "create_database_statement"
match_grammar = Sequence(
"CREATE", "DATABASE",
Ref("DatabaseReferenceSegment"),
Sequence("WITH", Delimited(_create_database_option), optional=True),
)
```
#### Shared Segments (Named Classes)
Create separate segment classes for reusable components:
```python
class FileSpecSegment(BaseSegment):
"""File specification - reusable in CREATE/ALTER statements."""
type = "file_spec"
match_grammar = Bracketed(
Sequence(
Sequence("NAME", Ref("EqualsSegment"), Ref("QuotedLiteralSegment"), optional=True),
Sequence("FILENAME", Ref("EqualsSegment"), Ref("QuotedLiteralSegment")),
Sequence("SIZE", Ref("EqualsSegment"), Ref("FileSizeSegment"), optional=True),
)
)
# Now FileSpecSegment can be used in multiple statements
class CreateDatabaseStatementSegment(BaseSegment):
match_grammar = Sequence(
"CREATE", "DATABASE",
Ref("DatabaseReferenceSegment"),
Sequence("ON", Delimited(Ref("FileSpecSegment")), optional=True),
)
class AlterDatabaseStatementSegment(BaseSegment):
match_grammar = Sequence(
"ALTER", "DATABASE",
Ref("DatabaseReferenceSegment"),
"ADD", "FILE", Ref("FileSpecSegment"),
)
```
**Decision criteria:**
- **Use `_prefix` internal grammar** when:
- Grammar is specific to one statement type
- No other segments need to reference it
- Breaking down complex `match_grammar` for readability
- **Use shared segment classes** when:
- Multiple statements use the same construct
- Construct represents a meaningful SQL element
- Other rules or segments need to `Ref()` it by name
- Semantic meaning beyond one statement
## Development Workflow
### Step 1: Create Test SQL Files
```bash
# Add SQL test cases to test/fixtures/dialects/<dialect>/
echo "SELECT TOP 10 * FROM users;" > test/fixtures/dialects/tsql/top_clause.sql
echo "CREATE CLUSTERED INDEX idx_id ON users(id);" > test/fixtures/dialects/tsql/create_index.sql
```
**Test file conventions:**
- Organize by segment type (e.g., `select_statement.sql`, `create_table.sql`, `merge_statement.sql`)
- Include multiple test cases per file covering edge cases
- Use descriptive filenames matching the segment being tested
- Test various keyword combinations, identifier formats, literal types, comments
**Example structure:**
```
test/fixtures/dialects/tsql/
├── select_top.sql # TOP clause variations
├── create_index.sql # CLUSTERED/NONCLUSTERED indexes
├── merge_statement.sql # MERGE operations
├── pivot_unpivot.sql # PIVOT/UNPIVOT queries
└── table_hints.sql # WITH (NOLOCK) etc.
```
### Step 2: Generate Expected Parse Trees
```bash
# Activate virtual environment
source .venv/bin/activate
# Generate YAML fixtures for specific dialect
python test/generate_parse_fixture_yml.py -d tsql
# Or use tox
tox -e generate-fixture-yml -- -d tsql
```
This creates `.yml` files showing the current parse tree. Initially these may show parsing failures or incorrect structures.
### Step 3: Implement Grammar
Edit `src/sqlfluff/dialects/dialect_<name>.py`:
```python
# 1. Define new segments needed
class TopClauseSegment(BaseSegment):
"""TOP clause for T-SQL."""
type = "top_clause"
match_grammar = Sequence(
"TOP",
Ref("NumericLiteralSegment"),
Sequence("PERCENT", optional=True),
)
# 2. Override parent segments
tsql_dialect.replace(
SelectStatementSegment=Sequence(
"SELECT",
Ref("TopClauseSegment", optional=True),
Ref("SelectClauseSegment"),
# ... rest of SELECT grammar
),
)
```
### Step 4: Regenerate and Verify
```bash
# Regenerate YAML to see updated parse tree
python test/generate_parse_fixture_yml.py -d tsql
# Check that parsing now works correctly
sqlfluff parse test/fixtures/dialects/tsql/top_clause.sql --dialect tsql
```
### Step 5: Run Full Test Suite
```bash
# Test just the dialect
tox -e generate-fixture-yml -- -d tsql
# Run full test suite to ensure no regressions
tox -e py312
```
## Keywords Management
### Keyword Files
Each dialect should have a keywords file: `dialect_<name>_keywords.py`
```python
"""T-SQL reserved and unreserved keywords."""
RESERVED_KEYWORDS = [
"ADD", "ALL", "ALTER", "AND", "ANY", "AS", "ASC",
"CLUSTERED", "NONCLUSTERED", "TOP", "PIVOT", "UNPIVOT",
# ... full list
]
UNRESERVED_KEYWORDS = [
"ABSOLUTE", "ACTION", "ADA", "ALIAS", "ALLOCATE",
# ... full list
]
```
In dialect file:
```python
from sqlfluff.dialects.dialect_tsql_keywords import (
RESERVED_KEYWORDS, UNRESERVED_KEYWORDS
)
tsql_dialect.sets("reserved_keywords").update(RESERVED_KEYWORDS)
tsql_dialect.sets("unreserved_keywords").update(UNRESERVED_KEYWORDS)
```
## Common Dialect Patterns
### Adding Vendor-Specific Functions
```python
class TSQLFunctionNameSegment(BaseSegment):
"""T-SQL specific function names."""
type = "function_name"
match_grammar = OneOf(
"GETDATE", "NEWID", "SCOPE_IDENTITY",
"IDENT_CURRENT", "ROWCOUNT_BIG",
# Add more T-SQL functions
)
tsql_dialect.replace(
FunctionNameSegment=OneOf(
Ref("AnsiSQLFunctionNameSegment"), # Inherit ANSI functions
Ref("TSQLFunctionNameSegment"), # Add T-SQL specific
),
)
```
### Adding Statement Types
```python
class MergeStatementSegment(BaseSegment):
"""MERGE statement (T-SQL, Oracle, etc.)."""
type = "merge_statement"
match_grammar = Sequence(
"MERGE",
Sequence("TOP", Ref("ExpressionSegment"), optional=True),
"INTO", Ref("TableReferenceSegment"),
"USING", Ref("TableReferenceSegment"),
"ON", Ref("ExpressionSegment"),
AnyNumberOf(
Sequence("WHEN", "MATCHED", "THEN", Ref("MergeActionSegment")),
Sequence("WHEN", "NOT", "MATCHED", "THEN", Ref("MergeActionSegment")),
),
)
# Add to statement grammar
tsql_dialect.replace(
StatementSegment=OneOf(
Ref("SelectStatementSegment"),
Ref("InsertStatementSegment"),
Ref("MergeStatementSegment"), # New addition
# ... other statements
),
)
```
### Adding Data Types
```python
tsql_dialect.replace(
DatatypeSegment=OneOf(
# Inherit ANSI types
Sequence("VARCHAR", Bracketed(Ref("NumericLiteralSegment"), optional=True)),
Sequence("INT"),
# Add T-SQL specific types
Sequence("NVARCHAR",
OneOf(Bracketed(Ref("NumericLiteralSegment")), "MAX", optional=True)),
Sequence("UNIQUEIDENTIFIER"),
Sequence("DATETIME2", Bracketed(Ref("NumericLiteralSegment"), optional=True)),
Sequence("HIERARCHYID"),
),
)
```
## Testing Dialect Changes
### Dialect-Specific Tests
Located in `test/dialects/<dialect>_test.py`:
```python
"""Tests specific to T-SQL dialect."""
import pytest
from sqlfluff.core import Linter
@pytest.fixture
def tsql_linter():
"""Provide T-SQL linter for tests."""
return Linter(dialect="tsql")
def test_top_clause_parsing(tsql_linter):
"""Test TOP clause in SELECT."""
sql = "SELECT TOP 10 * FROM users;"
parsed = tsql_linter.parse_string(sql)
assert parsed.tree is not None
# Find TOP clause in parse tree
top_clause = parsed.tree.find("top_clause")
assert top_clause is not None
```
### Regression Prevention
Always run the full fixture generation to ensure your changes don't break other dialects:
```bash
# Test all dialects
tox -e generate-fixture-yml
# Or specific ones that might be affected
tox -e generate-fixture-yml -- -d ansi -d postgres -d mysql
```
## Per-Dialect Agent Instructions
For complex dialects with vendor-specific quirks, see the detailed per-dialect instructions:
**T-SQL**: `src/sqlfluff/dialects/dialect_instructions/tsql.md`
---
**See also:**
- Root `AGENTS.md` for general project overview
- `src/sqlfluff/AGENTS.md` for Python coding standards
- `test/AGENTS.md` for testing conventions
- Individual `dialect_instructions/<dialect>.md` files for dialect-specific guidance

src/sqlfluff/dialects/dialect_instructions/tsql.md
@@ -0,0 +1,153 @@
# T-SQL Dialect - AI Assistant Instructions
This file provides T-SQL (Microsoft SQL Server) specific development guidance.
## T-SQL Syntax Documentation
When implementing T-SQL features, refer to:
- **Primary**: [T-SQL Reference](https://learn.microsoft.com/en-us/sql/t-sql/)
- **Syntax Conventions**: [Transact-SQL Syntax Conventions](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/transact-sql-syntax-conventions-transact-sql)
## Microsoft Docs → SQLFluff Translation
Microsoft's syntax notation maps to SQLFluff grammar as follows:
| Microsoft Notation | Meaning | SQLFluff Translation |
|-------------------|---------|---------------------|
| `UPPERCASE` | Keyword | Literal string `"UPPERCASE"` |
| *italic* | User parameter | `Ref("SegmentName")` |
| `\|` (pipe) | Choice | `OneOf(...)` |
| `[ ]` (brackets) | Optional | `optional=True` or `Ref(..., optional=True)` |
| `{ }` (braces) | Required choice | `OneOf(...)` without optional |
| `[, ...n]` | Comma-separated repetition | `Delimited(...)` |
| `[...n]` | Space-separated repetition | `AnyNumberOf(...)` |
| `;` | Statement terminator | `Ref("SemicolonSegment")` |
| `<label> ::=` | Named syntax block | Define as separate segment class |
**Example:**
```
Microsoft Docs:
CREATE TABLE <table_name>
(
<column_definition> [, ...n]
)
[ WITH ( <table_option> [, ...n] ) ]
SQLFluff:
class CreateTableStatementSegment(BaseSegment):
type = "create_table_statement"
match_grammar = Sequence(
"CREATE", "TABLE",
Ref("TableReferenceSegment"),
Bracketed(
Delimited(Ref("ColumnDefinitionSegment"))
),
Sequence(
"WITH",
Bracketed(Delimited(Ref("TableOptionSegment"))),
optional=True,
),
)
```
## Known Edge Cases
### Quoted Identifiers
T-SQL supports:
- Square brackets: `[column name]`, `[table].[column]`
- Double quotes: `"column name"` (when `QUOTED_IDENTIFIER` is ON)
Square brackets are the standard T-SQL approach.
### String Literals
- Single quotes: `'string value'`
- Escaped quotes: `'It''s a string'` (double single quote)
- Unicode prefix: `N'Unicode string'`
### Multi-part Identifiers
T-SQL supports up to 4-part names:
- `[server].[database].[schema].[object]`
- `[database].[schema].[table]`
- `[schema].[table]`
- `[table]`
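Combining these forms in one illustrative query:
```sql
-- Bracketed identifiers, a Unicode literal, an escaped quote,
-- and a three-part table name.
SELECT [u].[user name],
       N'Unicode string',
       'It''s a string'
FROM [mydb].[dbo].[users] AS [u];
```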
### SET Statements
T-SQL uses many SET statements for session configuration:
```sql
SET NOCOUNT ON
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
```
## Testing T-SQL Features
### Test File Locations
```
test/fixtures/dialects/tsql/
├── select_top.sql
├── merge_statement.sql
├── pivot_unpivot.sql
├── table_hints.sql
├── output_clause.sql
├── cte_multiple.sql
└── create_index_clustered.sql
```
### Running T-SQL Tests
```bash
# Generate all T-SQL fixtures
python test/generate_parse_fixture_yml.py -d tsql
# Run T-SQL dialect tests
tox -e py312 -- test/dialects/tsql_test.py
# Parse single file
sqlfluff parse test/fixtures/dialects/tsql/select_top.sql --dialect tsql
```
## Common Implementation Tasks
### Adding New T-SQL Function
1. Check if function exists in Microsoft docs
2. Add to T-SQL function list in dialect file
3. Create test case in appropriate test file
4. Verify parsing
### Adding New Statement Type
1. Study Microsoft docs syntax
2. Create segment class with `match_grammar`
3. Add to `StatementSegment` via `.replace()`
4. Create comprehensive test cases
5. Regenerate fixtures
### Fixing Parsing Issue
1. Identify failing SQL in test fixtures
2. Run `sqlfluff parse <file> --dialect tsql` to see error
3. Examine parse tree output
4. Adjust grammar in dialect file
5. Regenerate and verify
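A typical debug loop using the commands above:
```bash
# Inspect the current (failing) parse tree
sqlfluff parse test/fixtures/dialects/tsql/select_top.sql --dialect tsql

# After adjusting the grammar, regenerate the expected YAML
python test/generate_parse_fixture_yml.py -d tsql
```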
## Resources
- [T-SQL Language Reference](https://learn.microsoft.com/en-us/sql/t-sql/language-reference-database-engine)
- [T-SQL Statements](https://learn.microsoft.com/en-us/sql/t-sql/statements/statements)
- [T-SQL Functions](https://learn.microsoft.com/en-us/sql/t-sql/functions/functions)
- [T-SQL Data Types](https://learn.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql)
---
**See also:**
- `src/sqlfluff/dialects/AGENTS.md` for general dialect development
- Root `AGENTS.md` for project overview

test/AGENTS.md
@@ -0,0 +1,498 @@
# Testing - AI Assistant Instructions
This file provides testing guidelines for SQLFluff development.
## Testing Philosophy
SQLFluff uses comprehensive testing with:
- **High coverage requirements**: Changes should reach 100% coverage
- **Multiple test types**: Unit tests, integration tests, fixture-based tests
- **Automated verification**: All tests run via tox and CI/CD
## Test Organization
```
test/
├── conftest.py # Shared pytest fixtures
├── api/ # API tests
│ ├── simple_test.py
│ └── classes_test.py
├── cli/ # CLI tests
│ ├── commands_test.py
│ └── formatters_test.py
├── core/ # Core component tests
│ ├── parser/
│ │ ├── grammar_test.py
│ │ └── segments_test.py
│ └── rules/
│ └── base_test.py
├── dialects/ # Dialect parsing tests
│ ├── ansi_test.py
│ ├── tsql_test.py
│ └── postgres_test.py
├── rules/ # Rule testing
│ ├── yaml_test_cases_test.py # YAML-based rule tests
│ └── std_fix_auto_test.py # Auto-fix integration tests
└── fixtures/ # Test data
├── dialects/
│ ├── ansi/
│ │ ├── select_statement.sql
│ │ └── select_statement.yml
│ └── tsql/
│ ├── select_top.sql
│ └── select_top.yml
└── rules/
└── std_rule_cases/
├── aliasing.yml
└── layout.yml
```
## Test Frameworks
### Pytest
Primary test framework for all Python tests.
**Key features:**
- Test discovery: `*_test.py` files
- Fixtures: Reusable test setup in `conftest.py`
- Markers: Categorize tests (`@pytest.mark.dbt`, `@pytest.mark.integration`)
- Parametrization: Run same test with different inputs
**Basic test structure:**
```python
import pytest
from sqlfluff.core import Linter
def test_simple_parsing():
"""Test basic SQL parsing."""
linter = Linter(dialect="ansi")
result = linter.parse_string("SELECT * FROM users")
assert result.tree is not None
assert result.violations == []
```
### Fixtures (Pytest)
**Common fixtures** in `conftest.py`:
- `default_config`: Default SQLFluff configuration
- `fresh_ansi_dialect`: Clean ANSI dialect instance
- `caplog`: Capture log output
**Using fixtures:**
```python
@pytest.fixture
def tsql_linter():
"""Provide T-SQL linter for tests."""
return Linter(dialect="tsql")
def test_with_fixture(tsql_linter):
"""Test using fixture."""
result = tsql_linter.parse_string("SELECT TOP 10 * FROM users")
assert result.tree is not None
```
### Test Markers
**Built-in markers:**
```python
@pytest.mark.dbt
def test_dbt_templater():
"""Test requiring dbt installation."""
pass
@pytest.mark.integration
def test_full_workflow():
"""Integration test spanning multiple components."""
pass
@pytest.mark.parametrize("sql,expected", [
("SELECT * FROM t", True),
("SELECT", False),
])
def test_multiple_cases(sql, expected):
"""Test with multiple inputs."""
result = is_valid_sql(sql)
assert result == expected
```
## Dialect Testing
### SQL Fixture Files
Located in `test/fixtures/dialects/<dialect>/`:
```sql
-- test/fixtures/dialects/tsql/select_top.sql
SELECT TOP 10 * FROM users;
SELECT TOP (10) PERCENT * FROM products;
SELECT TOP 5 WITH TIES * FROM orders ORDER BY total_amount DESC;
```
**Best practices:**
- One file per segment type or feature
- Multiple test cases per file covering variations
- Use descriptive filenames
- Include comments explaining edge cases
### YAML Expected Outputs
Generated automatically by `generate_parse_fixture_yml.py`:
```yaml
# test/fixtures/dialects/tsql/select_top.yml
- file:
statement:
- select_statement:
- keyword: SELECT
- top_clause:
- keyword: TOP
- numeric_literal: '10'
- whitespace: ' '
- select_clause_element:
- wildcard_expression:
- wildcard_identifier:
- star: '*'
# ... rest of parse tree
```
**Workflow:**
1. Create `.sql` file with test cases
2. Run `python test/generate_parse_fixture_yml.py -d <dialect>`
3. Script generates `.yml` with current parse tree
4. Review `.yml` to verify correctness
5. Commit both `.sql` and `.yml` files
### Generating Fixtures
```bash
# Activate environment
source .venv/bin/activate
# Generate for specific dialect
python test/generate_parse_fixture_yml.py -d tsql
# Generate for all dialects (slow!)
python test/generate_parse_fixture_yml.py
# Using tox
tox -e generate-fixture-yml -- -d tsql
```
### Dialect Test Files
Beyond fixtures, write explicit tests in `test/dialects/<dialect>_test.py`:
```python
"""Tests specific to T-SQL dialect."""
import pytest
from sqlfluff.core import Linter
class TestTSQLDialect:
"""T-SQL dialect tests."""
@pytest.fixture
def linter(self):
"""Provide T-SQL linter."""
return Linter(dialect="tsql")
def test_top_clause(self, linter):
"""Test TOP clause parsing."""
sql = "SELECT TOP 10 * FROM users"
result = linter.parse_string(sql)
# Verify parsing succeeded
assert result.tree is not None
# Find TOP clause in tree
top_clause = result.tree.get_child("top_clause")
assert top_clause is not None
def test_table_hint(self, linter):
"""Test table hint WITH (NOLOCK)."""
sql = "SELECT * FROM users WITH (NOLOCK)"
result = linter.parse_string(sql)
assert result.tree is not None
hints = result.tree.get_child("table_hint")
assert hints is not None
```
## Rule Testing
### YAML Test Cases
Primary method for testing rules. Located in `test/fixtures/rules/std_rule_cases/`:
```yaml
# test/fixtures/rules/std_rule_cases/aliasing.yml
rule: AL01
test_implicit_alias_fail:
fail_str: SELECT * FROM users u
test_explicit_alias_pass:
pass_str: SELECT * FROM users AS u
test_implicit_alias_fix:
fail_str: SELECT * FROM users u
fix_str: SELECT * FROM users AS u
test_with_config:
fail_str: SELECT * FROM users AS u
configs:
rules:
aliasing.table:
aliasing: implicit
```
**YAML structure:**
- `rule`: Rule code being tested
- `test_*`: Test case name (descriptive)
- `fail_str`: SQL that should fail the rule
- `pass_str`: SQL that should pass the rule
- `fix_str`: Expected SQL after auto-fix (optional)
- `configs`: Override configuration (optional)
### Running Rule Tests
```bash
# Test specific rule
tox -e py312 -- test/rules/yaml_test_cases_test.py -k AL01
# Test all rules
tox -e py312 -- test/rules/yaml_test_cases_test.py
# Test auto-fixing
tox -e py312 -- test/rules/std_fix_auto_test.py
# Direct pytest (faster during development)
pytest test/rules/yaml_test_cases_test.py -k AL01 -v
```
### Rule Unit Tests
For complex rule logic, write explicit tests:
```python
"""Tests for Rule AL01."""
import pytest
from sqlfluff.core.rules import RuleContext
from sqlfluff.rules.aliasing.AL01 import Rule_AL01
class TestRuleAL01:
"""Tests for implicit alias rule."""
def test_implicit_alias_detected(self):
"""Test that implicit alias is detected."""
rule = Rule_AL01()
# Create test context and segment
# ... test implementation
result = rule._eval(context)
assert result is not None
assert "implicit" in result.description.lower()
```
## Coverage Testing
### Running with Coverage
```bash
# Coverage for specific module
pytest test/core/parser/ --cov=src/sqlfluff/core/parser --cov-report=term-missing
# Coverage for rules (shows uncovered lines)
pytest test/rules/ --cov=src/sqlfluff/rules --cov-report=term-missing:skip-covered
# Full coverage report
pytest test/ --cov=src/sqlfluff --cov-report=term-missing
# HTML coverage report (creates htmlcov/ directory)
pytest test/ --cov=src/sqlfluff --cov-report=html
open htmlcov/index.html
# Using tox
tox -e cov-init,py312,cov-report
```
### Coverage Requirements
- New code should have high test coverage (100%)
- Changes should not decrease overall coverage
- Critical paths (parser, rules) require comprehensive coverage
## Test Commands Reference
### Quick Testing During Development
```bash
# Single test file
pytest test/core/parser/grammar_test.py -v
# Single test function
pytest test/core/parser/grammar_test.py::test_sequence_matching -v
# Tests matching pattern
pytest test/rules/ -k AL01 -v
# Specific dialect fixtures
python test/generate_parse_fixture_yml.py -d tsql
# Run and stop on first failure
pytest test/core/ -x
# Show print statements
pytest test/core/ -s
# Verbose output with captured logs
pytest test/core/ -v --log-cli-level=DEBUG
```
### Full Test Suite
```bash
# Run all tests for Python 3.12
tox -e py312
# Run with coverage
tox -e cov-init,py312,cov-report
# Run linting and type checking
tox -e linting,mypy
# Full suite (all Python versions, linting, type checking)
tox
```
### Test-Driven Development Workflow
1. **Write failing test:**
```python
def test_new_feature():
"""Test new feature."""
result = new_feature("input")
assert result == "expected"
```
2. **Run test to confirm failure:**
```bash
pytest test/core/new_feature_test.py::test_new_feature -v
```
3. **Implement feature**
4. **Run test to confirm success:**
```bash
pytest test/core/new_feature_test.py::test_new_feature -v
```
5. **Run broader tests to check for regressions:**
```bash
pytest test/core/ -v
```
6. **Check coverage:**
```bash
pytest test/core/ --cov=src/sqlfluff/core --cov-report=term-missing
```
## Test Data Management
### SQL Test Files
**Location**: `test/fixtures/dialects/<dialect>/*.sql`
**Guidelines:**
- Descriptive filenames: `select_top.sql`, `merge_statement.sql`
- Multiple test cases per file
- Include edge cases and variations
- Add comments for complex cases
**Example:**
```sql
-- test/fixtures/dialects/tsql/select_top.sql
-- Basic TOP clause
SELECT TOP 10 * FROM users;
-- TOP with parentheses
SELECT TOP (10) * FROM users;
-- TOP with PERCENT
SELECT TOP 10 PERCENT * FROM users;
-- TOP with WITH TIES (requires ORDER BY)
SELECT TOP 5 WITH TIES * FROM orders ORDER BY amount DESC;
```
### YAML Expected Outputs
**Generated automatically** - do not edit manually unless absolutely necessary.
**Regenerate after grammar changes:**
```bash
python test/generate_parse_fixture_yml.py -d <dialect>
```
### Rule Test YAML Files
**Location**: `test/fixtures/rules/std_rule_cases/<category>.yml`
**Categories:**
- `aliasing.yml`: Aliasing rules (AL*)
- `layout.yml`: Layout rules (LT*)
- `capitalisation.yml`: Capitalisation rules (CP*)
- `convention.yml`: Convention rules (CV*)
- `structure.yml`: Structure rules (ST*)
- `references.yml`: Reference rules (RF*)
## Common Testing Patterns
### Testing Exceptions
```python
import pytest
from sqlfluff.core.errors import SQLParseError
def test_invalid_sql_raises():
"""Test that invalid SQL raises error."""
with pytest.raises(SQLParseError):
parse_invalid_sql("SELECT * FROM")
```
### Parametrized Tests
```python
@pytest.mark.parametrize("sql,expected_type", [
("SELECT * FROM users", "select_statement"),
("INSERT INTO users VALUES (1)", "insert_statement"),
("UPDATE users SET name = 'x'", "update_statement"),
])
def test_statement_types(sql, expected_type):
"""Test various statement types."""
result = parse_sql(sql)
assert result.tree.type == expected_type
```
### Fixture Parametrization
```python
@pytest.fixture(params=["ansi", "tsql", "postgres"])
def dialect_linter(request):
"""Provide linter for multiple dialects."""
return Linter(dialect=request.param)
def test_across_dialects(dialect_linter):
"""Test behavior across multiple dialects."""
result = dialect_linter.parse_string("SELECT * FROM users")
assert result.tree is not None
```
---
**See also:**
- Root `AGENTS.md` for general project overview
- `src/sqlfluff/AGENTS.md` for Python coding standards
- `src/sqlfluff/dialects/AGENTS.md` for dialect development and testing