NL-Cube Developer Guide
This guide provides information for developers who want to contribute to or extend NL-Cube. It covers project setup, code organization, and contribution guidelines.
Development Environment Setup
Prerequisites
- Rust (1.85.0 or later; the 2024 edition requires at least 1.85.0)
- Git
- DuckDB
- Ollama (optional, for local LLM testing)
Setting Up the Development Environment
- Clone the repository
git clone https://github.com/joefrost01/nl-cube.git
cd nl-cube
- Install Rust dependencies
The project uses Cargo for dependency management. All dependencies are specified in Cargo.toml.
- Install local LLM (optional)
For local LLM testing, install Ollama and pull a SQL-focused model:
# Install Ollama from https://ollama.ai/download
ollama pull sqlcoder
- Configure the application
Create a config.toml file in the project root:
data_dir = "data"
[database]
connection_string = "nl-cube.db"
pool_size = 5
[web]
host = "127.0.0.1"
port = 3000
static_dir = "static"
[llm]
backend = "ollama"
model = "sqlcoder"
api_url = "http://localhost:11434/api/generate"
Building and Running
For development:
cargo run
For production build:
cargo build --release
Running Tests
cargo test
Code Organization
NL-Cube is organized into several key modules:
Project Structure
nl-cube/
├── src/ # Rust source code
│ ├── config.rs # Configuration management
│ ├── db/ # Database connection and schema management
│ ├── ingest/ # File ingestion (CSV, Parquet)
│ ├── llm/ # Language model integration
│ ├── util/ # Utility functions
│ ├── web/ # Web server and API
│ └── main.rs # Application entry point
├── static/ # Frontend assets
│ ├── css/ # Stylesheets
│ ├── js/ # JavaScript modules
│ └── index.html # Main application page
├── docs/ # Documentation (Quarto)
├── templates/ # HTML templates
├── Cargo.toml # Rust dependencies
├── config.toml # Configuration file
└── README.md # Project overview
Core Modules
1. Configuration (src/config.rs)
Handles parsing configuration from files and command-line arguments:
pub struct AppConfig {
    pub database: DatabaseConfig,
    pub web: WebConfig,
    pub llm: LlmConfig,
    pub data_dir: String,
}
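As a rough illustration of how the nested configuration could be shaped, the sketch below mirrors the sample config.toml with plain structs and a Default implementation. The fields of DatabaseConfig, WebConfig, and LlmConfig are inferred from the keys in that sample file, and bind_address is a hypothetical helper; none of this is confirmed to match the project's actual definitions.

```rust
// Sketch only: the nested struct fields are inferred from the sample
// config.toml keys and are assumptions, not the project's real types.
#[derive(Debug, Clone, PartialEq)]
pub struct DatabaseConfig {
    pub connection_string: String,
    pub pool_size: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct WebConfig {
    pub host: String,
    pub port: u16,
    pub static_dir: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct LlmConfig {
    pub backend: String,
    pub model: String,
    pub api_url: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct AppConfig {
    pub database: DatabaseConfig,
    pub web: WebConfig,
    pub llm: LlmConfig,
    pub data_dir: String,
}

impl Default for AppConfig {
    /// Mirrors the values from the sample config.toml.
    fn default() -> Self {
        AppConfig {
            database: DatabaseConfig {
                connection_string: "nl-cube.db".to_string(),
                pool_size: 5,
            },
            web: WebConfig {
                host: "127.0.0.1".to_string(),
                port: 3000,
                static_dir: "static".to_string(),
            },
            llm: LlmConfig {
                backend: "ollama".to_string(),
                model: "sqlcoder".to_string(),
                api_url: "http://localhost:11434/api/generate".to_string(),
            },
            data_dir: "data".to_string(),
        }
    }
}

impl AppConfig {
    /// Hypothetical helper: the address the web server would bind to.
    pub fn bind_address(&self) -> String {
        format!("{}:{}", self.web.host, self.web.port)
    }
}
```

With these defaults, AppConfig::default().bind_address() yields 127.0.0.1:3000, which matches the [web] section of the sample file.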
2. Database (src/db/)
- db_pool.rs: Connection pool management
- multi_db_pool.rs: Multiple database support
- schema_manager.rs: Schema tracking and cache
3. Ingestion (src/ingest/)
- csv.rs: CSV file processor
- parquet.rs: Parquet file processor
- schema.rs: Schema definition types
4. LLM Integration (src/llm/)
- models.rs: Data structures for LLM interactions
- providers/: LLM backend implementations
  - ollama.rs: Ollama integration
  - remote.rs: Remote API integration
5. Web Server (src/web/)
- handlers/: API and UI request handlers
- routes.rs: URL routing
- state.rs: Application state management
- static_files.rs: Static file serving
- templates.rs: Template rendering
Frontend Structure
- HTML: Basic structure in static/index.html
- CSS: Styling in static/css/nlcube.css
- JavaScript:
  - nlcube.js: Main application logic
  - perspective-utils.js: Visualization handling
  - query-utils.js: Query management
  - upload-utils.js: File upload management
  - reports-utils.js: Saved reports handling
Key Components
AppState
The central state container that holds shared resources:
pub struct AppState {
    pub config: AppConfig,
    pub db_pool: Pool<DuckDBConnectionManager>,
    pub llm_manager: Arc<Mutex<LlmManager>>,
    pub data_dir: PathBuf,
    pub subjects: RwLock<Vec<String>>,
    pub startup_time: chrono::DateTime<chrono::Utc>,
    pub current_subject: RwLock<Option<String>>,
    pub schema_manager: SchemaManager,
}
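To make the locking pattern behind fields such as subjects and current_subject more concrete, here is a minimal sketch using only the standard library. SharedState, select_subject, and current are hypothetical names, and std::sync::RwLock is assumed here; the project may well use an async-aware lock instead.

```rust
use std::sync::{Arc, RwLock};

/// Simplified stand-in for AppState (hypothetical; only two fields kept).
pub struct SharedState {
    pub subjects: RwLock<Vec<String>>,
    pub current_subject: RwLock<Option<String>>,
}

impl SharedState {
    /// Wraps the state in an Arc so it can be shared across threads.
    pub fn new(subjects: Vec<String>) -> Arc<Self> {
        Arc::new(SharedState {
            subjects: RwLock::new(subjects),
            current_subject: RwLock::new(None),
        })
    }

    /// Selects a subject only if it is known; returns whether it was applied.
    pub fn select_subject(&self, name: &str) -> bool {
        // Read lock: many readers may inspect the subject list concurrently.
        // unwrap() panics only if another thread panicked while holding the lock.
        let known = self
            .subjects
            .read()
            .unwrap()
            .iter()
            .any(|s| s.as_str() == name);
        if known {
            // The write lock is held only for the duration of the assignment.
            *self.current_subject.write().unwrap() = Some(name.to_string());
        }
        known
    }

    /// Returns a copy of the currently selected subject, if any.
    pub fn current(&self) -> Option<String> {
        self.current_subject.read().unwrap().clone()
    }
}
```

Keeping each lock scope short, as in select_subject, reduces the chance that a slow request blocks others that only need to read the state.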
LlmManager
Manages language model interactions through the SqlGenerator trait:
#[async_trait]
pub trait SqlGenerator: Send + Sync {
    async fn generate_sql(&self, question: &str, schema: &str) -> Result<String, LlmError>;
}
IngestManager
Handles file ingestion through the FileIngestor trait:
pub trait FileIngestor: Send + Sync {
    fn ingest(
        &self,
        path: &Path,
        table_name: &str,
        subject: &str,
    ) -> Result<schema::TableSchema, IngestError>;
}
Web Server
Built using Axum, the web server provides both API endpoints and UI routes:
pub fn ui_routes() -> Router<Arc<AppState>> {
    Router::new()
        .route("/", get(handlers::ui::index_handler))
        .route("/static/{*path}", get(static_handler))
}
pub fn api_routes() -> Router<Arc<AppState>> {
    Router::new()
        .nest(
            "/api",
            Router::new()
                // Query endpoints
                .route("/query", post(handlers::api::execute_query))
                .route("/nl-query", post(sync_nl_query_handler))
                // ... other routes
        )
}
Schema Management
The SchemaManager tracks database schemas for LLM context:
pub struct SchemaManager {
    schema_cache: RwLock<HashMap<String, Vec<String>>>,
    last_refresh: RwLock<chrono::DateTime<chrono::Utc>>,
    data_dir: PathBuf,
}
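The following sketch suggests how a cache shaped like schema_cache might be read and refreshed. It is a simplification under stated assumptions: SchemaCache and its methods are hypothetical names, std::time::SystemTime stands in for the chrono timestamp, and the mapping is assumed to go from subject name to table names.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::time::SystemTime;

/// Simplified schema cache: subject name -> table names (std types only).
pub struct SchemaCache {
    schema_cache: RwLock<HashMap<String, Vec<String>>>,
    last_refresh: RwLock<SystemTime>,
}

impl SchemaCache {
    pub fn new() -> Self {
        SchemaCache {
            schema_cache: RwLock::new(HashMap::new()),
            last_refresh: RwLock::new(SystemTime::now()),
        }
    }

    /// Replaces the cached tables for one subject and records the refresh time.
    pub fn refresh_subject(&self, subject: &str, tables: Vec<String>) {
        self.schema_cache
            .write()
            .unwrap()
            .insert(subject.to_string(), tables);
        *self.last_refresh.write().unwrap() = SystemTime::now();
    }

    /// Returns a copy of the table list, or an empty list for unknown subjects.
    pub fn tables_for(&self, subject: &str) -> Vec<String> {
        self.schema_cache
            .read()
            .unwrap()
            .get(subject)
            .cloned()
            .unwrap_or_default()
    }

    /// Time of the most recent refresh (or of construction if never refreshed).
    pub fn last_refresh(&self) -> SystemTime {
        *self.last_refresh.read().unwrap()
    }

    /// Renders a short plain-text summary that could be placed in an LLM prompt.
    pub fn describe(&self, subject: &str) -> String {
        format!(
            "subject {}: tables [{}]",
            subject,
            self.tables_for(subject).join(", ")
        )
    }
}
```

Returning owned copies from tables_for keeps the read lock short-lived, at the cost of a clone per lookup.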
Design Patterns
NL-Cube uses several key design patterns:
1. Repository Pattern
Database interactions are encapsulated behind traits and managers:
#[async_trait]
pub trait SqlGenerator: Send + Sync {
    async fn generate_sql(&self, question: &str, schema: &str) -> Result<String, LlmError>;
}
2. Dependency Injection
Components receive their dependencies through constructors:
pub fn new_with_multi_db(
    config: AppConfig,
    db_pool: Pool<DuckDBConnectionManager>,
    multi_db_manager: Arc<MultiDbConnectionManager>,
    llm_manager: LlmManager,
    data_dir: PathBuf,
) -> Self { ... }
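One practical benefit of constructor injection is that a test double can be swapped in for a real dependency. The sketch below shows the idea with a synchronous simplification of SqlGenerator; the String error type, FixedSqlGenerator, and QueryService are all hypothetical and exist only for illustration.

```rust
use std::sync::Arc;

/// Synchronous simplification of the SqlGenerator trait (assumption: the
/// real trait is async and returns LlmError rather than a String error).
pub trait SqlGenerator: Send + Sync {
    fn generate_sql(&self, question: &str, schema: &str) -> Result<String, String>;
}

/// Hypothetical test double that always returns a fixed query.
pub struct FixedSqlGenerator {
    pub sql: String,
}

impl SqlGenerator for FixedSqlGenerator {
    fn generate_sql(&self, _question: &str, _schema: &str) -> Result<String, String> {
        Ok(self.sql.clone())
    }
}

/// Hypothetical service whose dependency is supplied by the caller.
pub struct QueryService {
    generator: Arc<dyn SqlGenerator>,
}

impl QueryService {
    /// The generator is injected rather than constructed internally.
    pub fn new(generator: Arc<dyn SqlGenerator>) -> Self {
        QueryService { generator }
    }

    /// Rejects blank questions, then delegates to the injected generator.
    pub fn answer(&self, question: &str, schema: &str) -> Result<String, String> {
        if question.trim().is_empty() {
            return Err("question must not be empty".to_string());
        }
        self.generator.generate_sql(question, schema)
    }
}
```

Because QueryService only sees the trait, the same code path can be exercised with FixedSqlGenerator in tests and with a real backend in production.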
3. Trait-Based Polymorphism
Interfaces are defined as traits, allowing for multiple implementations:
pub trait FileIngestor: Send + Sync { ... }
pub struct CsvIngestor { ... }
pub struct ParquetIngestor { ... }
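As a minimal sketch of how trait objects allow several implementations to sit behind one interface, the example below stores ingestors in a small registry and looks them up by format. The trait is deliberately reduced to a single format_name method, and IngestRegistry is a hypothetical type, not the project's IngestManager.

```rust
/// Simplified ingestor interface (assumption: the real trait's ingest
/// method is replaced by a name lookup to keep the sketch small).
pub trait FileIngestor: Send + Sync {
    fn format_name(&self) -> &'static str;
}

pub struct CsvIngestor;
pub struct ParquetIngestor;

impl FileIngestor for CsvIngestor {
    fn format_name(&self) -> &'static str {
        "csv"
    }
}

impl FileIngestor for ParquetIngestor {
    fn format_name(&self) -> &'static str {
        "parquet"
    }
}

/// Hypothetical registry holding ingestors behind trait objects.
pub struct IngestRegistry {
    ingestors: Vec<Box<dyn FileIngestor>>,
}

impl IngestRegistry {
    /// Registers the two formats named in this guide.
    pub fn with_defaults() -> Self {
        IngestRegistry {
            ingestors: vec![Box::new(CsvIngestor), Box::new(ParquetIngestor)],
        }
    }

    /// Returns the first ingestor whose format name matches, if any.
    pub fn find(&self, format: &str) -> Option<&dyn FileIngestor> {
        self.ingestors
            .iter()
            .find(|ingestor| ingestor.format_name() == format)
            .map(|boxed| boxed.as_ref())
    }
}
```

Supporting another format in this sketch would only require a new struct implementing the trait and one more entry in the registry.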
4. Actor Model
Asynchronous tasks with message passing for concurrent operations:
let (tx, rx) = oneshot::channel();
tokio::task::spawn_blocking(move || {
    // Task execution
    let _ = tx.send(result);
});
// Wait for result
match rx.await { ... }
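For readers who want to try the hand-off pattern without a Tokio runtime, the following is a standard-library analogue: a worker thread sends its result over a channel and the caller waits for it. It uses std::thread and std::sync::mpsc instead of spawn_blocking and oneshot, and run_in_background is a hypothetical helper name.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs `work` on a separate thread and returns its result via a channel.
/// This is a std-only analogue of the spawn_blocking + oneshot pattern.
pub fn run_in_background<T, F>(work: F) -> Result<T, mpsc::RecvError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let result = work();
        // The receiver may already be gone; the send error is ignored,
        // just as in the `let _ = tx.send(result);` line of the snippet.
        let _ = tx.send(result);
    });
    // Blocks until the worker sends a value. If the worker panics, the
    // sender is dropped and recv() returns Err(RecvError).
    rx.recv()
}
```

Unlike the async version, rx.recv() blocks the calling thread, so this form suits command-line tools and tests rather than request handlers.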
Extension Points
Adding a New File Ingestor
- Create a new struct that implements the FileIngestor trait
- Add the ingestor to IngestManager
- Update file type detection in the upload handler
Example:
pub struct JsonIngestor { ... }
impl FileIngestor for JsonIngestor {
    fn ingest(
        &self,
        path: &Path,
        table_name: &str,
        subject: &str,
    ) -> Result<TableSchema, IngestError> {
        // Implementation
    }
}
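The file type detection step could look roughly like the sketch below, which maps a file extension to a format. FileKind and detect_file_kind are hypothetical names, and matching on the extension alone is an assumption; the actual upload handler may use a different approach.

```rust
use std::path::Path;

/// Hypothetical set of supported upload formats.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum FileKind {
    Csv,
    Parquet,
    Json,
}

/// Maps a file extension (case-insensitive) to a supported format.
/// Returns None for missing, non-UTF-8, or unrecognized extensions.
pub fn detect_file_kind(path: &Path) -> Option<FileKind> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "csv" => Some(FileKind::Csv),
        "parquet" => Some(FileKind::Parquet),
        "json" => Some(FileKind::Json),
        _ => None,
    }
}
```

Returning an Option lets the caller decide how to report unsupported uploads, for example by converting None into an error response.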
Adding a New LLM Provider
- Create a new struct that implements the SqlGenerator trait
- Add the provider to LlmManager
- Update the configuration schema
Example:
pub struct CustomLlmProvider { ... }
#[async_trait]
impl SqlGenerator for CustomLlmProvider {
    async fn generate_sql(&self, question: &str, schema: &str) -> Result<String, LlmError> {
        // Implementation
    }
}
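A provider implementation usually needs to turn the question and schema into a prompt and then tidy up the model's reply. The helpers below sketch both steps in plain synchronous Rust. The prompt layout and the function names build_prompt and first_statement are assumptions made for illustration; they do not reflect the project's actual prompt template.

```rust
/// Hypothetical helper: builds the text sent to the model.
/// The section layout is illustrative, not the project's template.
pub fn build_prompt(question: &str, schema: &str) -> String {
    format!(
        "### Schema\n{}\n\n### Question\n{}\n\n### SQL\n",
        schema.trim(),
        question.trim()
    )
}

/// Hypothetical helper: keeps only the text before the first semicolon
/// in the model's reply and re-appends the terminator.
/// Returns None when the reply contains no statement text.
pub fn first_statement(reply: &str) -> Option<String> {
    let stmt = reply.split(';').next()?.trim();
    if stmt.is_empty() {
        None
    } else {
        Some(format!("{};", stmt))
    }
}
```

A real provider would call something like build_prompt before the HTTP request and something like first_statement on the response body, mapping failures to LlmError. Note that splitting on the first semicolon is a simplification that would mishandle semicolons inside string literals.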
Extending the API
- Add new route definitions in src/web/routes.rs
- Create handler functions in src/web/handlers/api.rs
- Update the API documentation
Example:
// In routes.rs
.route("/api/custom", post(handlers::api::custom_handler))
// In handlers/api.rs
pub async fn custom_handler(
    State(app_state): State<Arc<AppState>>,
    Json(payload): Json<CustomRequest>,
) -> Result<Json<CustomResponse>, (StatusCode, String)> {
    // Implementation
}
Contributing Guidelines
Pull Request Process
- Fork the repository and create a feature branch
- Make your changes with appropriate tests
- Ensure all tests pass with cargo test
- Format your code with cargo fmt
- Check for linting issues with cargo clippy
- Submit a pull request with a clear description of changes
Coding Standards
- Follow the Rust API Guidelines
- Use async/await consistently for asynchronous code
- Add doc comments to public interfaces
- Include error handling with custom error types
- Format code with rustfmt
- Use clippy to catch common mistakes
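To illustrate the custom error type guideline, here is a small sketch of an error enum that implements Display and std::error::Error and converts from io::Error so that the ? operator works. The variants shown and the file_size helper are hypothetical; the project's own IngestError may be defined differently.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::path::Path;

/// Hypothetical error type; the variants are illustrative only.
#[derive(Debug)]
pub enum IngestError {
    Io(io::Error),
    UnsupportedFormat(String),
}

impl fmt::Display for IngestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IngestError::Io(err) => write!(f, "I/O error during ingestion: {}", err),
            IngestError::UnsupportedFormat(ext) => {
                write!(f, "unsupported file format: {}", ext)
            }
        }
    }
}

impl Error for IngestError {
    /// Exposes the underlying I/O error so callers can inspect the cause.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            IngestError::Io(err) => Some(err),
            IngestError::UnsupportedFormat(_) => None,
        }
    }
}

impl From<io::Error> for IngestError {
    fn from(err: io::Error) -> Self {
        IngestError::Io(err)
    }
}

/// Reads a file's size; `?` converts io::Error into IngestError::Io.
pub fn file_size(path: &Path) -> Result<u64, IngestError> {
    Ok(std::fs::metadata(path)?.len())
}
```

The From implementation is what keeps call sites short: any function returning Result<_, IngestError> can apply ? directly to std::fs and std::io calls.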
Documentation
- Update documentation for API changes
- Add doc comments for public functions
- Include examples where appropriate
- Update the changelog for significant changes
Testing
- Add unit tests for new functionality
- Include integration tests for API endpoints
- Test with different configurations
- Verify performance for data-intensive operations
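As one possible shape for a unit test, the sketch below pairs a small function with a test module in the same file, which is the usual Rust convention. The function sanitize_table_name and its rules are hypothetical and are used only to have something concrete to test.

```rust
/// Hypothetical helper: derives a table name from an uploaded file name
/// by dropping the extension, lowercasing, and replacing other characters.
pub fn sanitize_table_name(file_name: &str) -> String {
    let stem = file_name
        .rsplit_once('.')
        .map(|(stem, _ext)| stem)
        .unwrap_or(file_name);
    let mut name: String = stem
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_ascii_lowercase()
            } else {
                '_'
            }
        })
        .collect();
    // Prefix names that are empty or start with a digit so they stay
    // usable as unquoted SQL identifiers.
    if name.chars().next().map_or(true, |c| c.is_ascii_digit()) {
        name.insert(0, 't');
    }
    name
}

#[cfg(test)]
mod sanitize_table_name_tests {
    use super::*;

    #[test]
    fn lowercases_and_replaces_punctuation() {
        assert_eq!(sanitize_table_name("Sales Report-2024.csv"), "sales_report_2024");
    }

    #[test]
    fn prefixes_names_that_start_with_a_digit() {
        assert_eq!(sanitize_table_name("2024.parquet"), "t2024");
    }
}
```

Running cargo test compiles the module marked with cfg(test) and executes each function annotated with the test attribute.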
Troubleshooting Development Issues
Common Issues
Database connection errors:
- Check the database file path in the configuration
- Verify that DuckDB is installed and is the correct version
- Increase the connection pool size if needed
LLM integration issues:
- Verify Ollama is running (curl http://localhost:11434/api/version)
- Check that the model is available (ollama list)
- Review the prompt template for errors
Build errors:
- Update Rust to the latest version (rustup update)
- Remove stale build artifacts (cargo clean)
- Check for incompatible dependency versions
Performance Profiling
For performance issues, use these tools:
- Tokio Console: Monitor async tasks
- Flamegraph: Visualize CPU usage
- DHAT: Analyze heap allocations
Example flamegraph generation:
cargo install flamegraph
CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph --bin nl-cube