NL-Cube Roadmap
This document outlines the planned development of NL-Cube. Planned features are grouped by area, and the Development Milestones section at the end assigns them to short-term, medium-term, and long-term goals.
Data Import Enhancements
Additional Import Formats
Expand NL-Cube’s data ingestion to cover more formats, using DuckDB’s readers and extensions where they exist:
- JSON Files: Support for structured JSON data files
- Excel Files: Direct import of XLSX/XLS spreadsheets
- ORC Files: Support for Optimized Row Columnar format
- Avro Files: Support for Apache Avro format
- XML Files: Structured XML document ingestion
- HTTP/REST Sources: Direct import from REST APIs
- Database Connections: Import from other databases
Where DuckDB provides a reader, the implementation will use it directly:
-- Examples of DuckDB readers for different formats
CREATE TABLE json_table AS SELECT * FROM read_json('data.json', auto_detect=true);
-- XLSX requires the excel extension (read_xlsx, DuckDB 1.2 or later)
INSTALL excel;
LOAD excel;
CREATE TABLE excel_table AS SELECT * FROM read_xlsx('data.xlsx');
-- ORC has no built-in DuckDB reader at the time of writing; it would need an extension or a conversion step
Enhanced Schema Inference
- Improved type detection for edge cases
- Better handling of date/time formats
- Detection of primary and foreign keys
- Smart detection of hierarchical data
- Preservation of original metadata
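The type and date/time detection goals above can be sketched as follows. This is a minimal illustration, not NL-Cube's implementation: the helper name `infer_column_type` and the list of candidate date formats are assumptions, and in practice DuckDB's own sniffer would do most of this work.

```python
from datetime import datetime

# Candidate formats are an assumption for illustration; a real detector would cover more.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S"]


def _all_parse(values, parser):
    """Return True if the parser accepts every value."""
    try:
        for v in values:
            parser(v)
        return True
    except ValueError:
        return False


def infer_column_type(values):
    """Guess a SQL-style type name for a column of strings (hypothetical helper)."""
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "VARCHAR"
    if _all_parse(non_empty, int):
        return "BIGINT"
    if _all_parse(non_empty, float):
        return "DOUBLE"
    for fmt in DATE_FORMATS:
        if _all_parse(non_empty, lambda v, f=fmt: datetime.strptime(v, f)):
            # Formats with a time component map to TIMESTAMP, the rest to DATE.
            return "TIMESTAMP" if "%H" in fmt else "DATE"
    return "VARCHAR"
```

Empty strings are ignored rather than forcing the column to text, which is one way to handle the sparse-column edge case.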
Expanded LLM Support
Configurable LLM Integration via Rig
- Integration with Rig crate for LLM orchestration
- User-selectable models through configuration
- Dynamically switchable LLM providers
- Model version tracking and compatibility checks
- Performance benchmarking across models
Configuration example:
# Different models can be configured (names and model IDs below are illustrative)
[[llm.models]]
name = "sqlcoder-34b"
provider = "databricks"
model_id = "databricks/dbrx-instruct"
[[llm.models]]
name = "arctic-sqler"
provider = "anthropic"
model_id = "claude-3-opus-20240229"
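To illustrate how user-selectable, switchable models might behave once the configuration is parsed, here is a small sketch. It is written in Python for brevity and does not use the Rig crate; `ModelConfig`, `ModelRegistry`, and the rule that the first entry is the default are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    name: str
    provider: str
    model_id: str


class ModelRegistry:
    """Holds configured models and tracks which one is active (hypothetical)."""

    def __init__(self, entries):
        self._models = {}
        for e in entries:
            cfg = ModelConfig(e["name"], e["provider"], e["model_id"])
            if cfg.name in self._models:
                raise ValueError(f"duplicate model name: {cfg.name}")
            self._models[cfg.name] = cfg
        if not self._models:
            raise ValueError("at least one model must be configured")
        # Assumption: the first configured model is the default.
        self._active = next(iter(self._models))

    def switch(self, name):
        """Make another configured model active."""
        if name not in self._models:
            raise KeyError(f"unknown model: {name}")
        self._active = name

    @property
    def active(self):
        return self._models[self._active]


# Entries mirror the [[llm.models]] tables above, as they would look after parsing.
registry = ModelRegistry([
    {"name": "sqlcoder-34b", "provider": "databricks", "model_id": "databricks/dbrx-instruct"},
    {"name": "arctic-sqler", "provider": "anthropic", "model_id": "claude-3-opus-20240229"},
])
registry.switch("arctic-sqler")
```

Rejecting duplicate names at load time keeps a later switch by name unambiguous.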
Prompt Template Customization
- User-editable prompt templates
- Domain-specific prompt optimization
- Few-shot examples for specialized domains
- Prompt versioning and A/B testing
Report Management
Report Generation and Saving
- Template-Based Reports: Customizable report templates
- Scheduled Reports: Automated report generation
- Export Formats: PDF, Excel, HTML, and Markdown
- Collaboration: Sharing and commenting on reports
- Versioning: Track changes to reports over time
Visualization Library
- Expanded chart types and visualizations
- Custom visualization themes
- Interactive dashboards
- Embeddable report widgets
- Annotation and markup tools
Advanced Data Modeling
Table Relationship Definition
- GUI for defining relationships between tables
- Automatic foreign key detection
- Entity-relationship diagram generation
- Join path recommendations for queries
- Referential integrity enforcement
Example relationship definition:
{
"relationships": [
{
"from": {
"table": "orders",
"column": "customer_id"
},
"to": {
"table": "customers",
"column": "id"
},
"type": "many-to-one"
}
]
}
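Given relationship definitions in this shape, join path recommendation can be framed as a shortest-path search over tables. The sketch below uses breadth-first search; the function names are hypothetical, and the second relationship in the usage example (`order_items` to `orders`) is invented for illustration.

```python
from collections import deque


def build_graph(relationships):
    """Map each table to its neighbours, keeping the join columns on each edge."""
    graph = {}
    for rel in relationships:
        a, b = rel["from"], rel["to"]
        # A join can be traversed in either direction.
        graph.setdefault(a["table"], []).append((b["table"], a["column"], b["column"]))
        graph.setdefault(b["table"], []).append((a["table"], b["column"], a["column"]))
    return graph


def join_path(relationships, start, goal):
    """Return the shortest list of join conditions from start to goal, or None."""
    graph = build_graph(relationships)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, left_col, right_col in graph.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"{table}.{left_col} = {nxt}.{right_col}"]))
    return None


relationships = [
    {"from": {"table": "orders", "column": "customer_id"},
     "to": {"table": "customers", "column": "id"}, "type": "many-to-one"},
    # Hypothetical second relationship, added only to show a two-hop path.
    {"from": {"table": "order_items", "column": "order_id"},
     "to": {"table": "orders", "column": "id"}, "type": "many-to-one"},
]
print(join_path(relationships, "order_items", "customers"))
```

Breadth-first search returns a path with the fewest joins; a real recommender might also weigh relationship type or table size.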
Schema Management
- Appending data to existing tables
- Schema evolution and migration
- Column-level metadata and descriptions
- Data quality constraints
- Schema versioning
Real-Time Data Processing
Hot Watch Folder
- Automated ingestion of files from watched directories
- Configurable processing rules based on file patterns
- Error handling and notification for problematic files
- Throttling and batching for high-volume scenarios
- Processing history and audit logs
Configuration example:
[[watch_folders]]
path = "data/incoming/sales"
subject = "sales"
pattern = "*.csv"
table_prefix = "sales_"
poll_interval_seconds = 30
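One polling pass over a watched folder might work as sketched here. The `scan_once` helper and the rule that the table name is `table_prefix` plus the file stem are assumptions; the roadmap does not specify how table names are derived.

```python
import fnmatch
import os


def scan_once(path, pattern, table_prefix, seen):
    """Return (file_path, table_name) pairs for new files that match the pattern."""
    found = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if full in seen or not os.path.isfile(full):
            continue
        if fnmatch.fnmatch(name, pattern):
            # Assumption: the table name is the prefix plus the file name without extension.
            stem = os.path.splitext(name)[0]
            found.append((full, f"{table_prefix}{stem}"))
            seen.add(full)
    return found
```

A caller would run `scan_once` every `poll_interval_seconds` and keep `seen` between passes. A production version would also need to confirm that a file has finished being written before ingesting it.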
Streaming Data Support
- Integration with Apache Kafka and other streaming platforms
- Real-time data processing pipelines
- Windowed aggregations on streaming data
- Configurable stream processors
- Stream-to-table materialization
Example Kafka configuration:
[[streaming.sources]]
type = "kafka"
bootstrap_servers = "kafka1:9092,kafka2:9092"
topic = "sales_data"
group_id = "nl-cube-consumer"
subject = "sales"
table = "real_time_sales"
Aggregation Engine
- User-defined aggregate definitions
- Incremental aggregation updates
- Time-based and event-based windows
- Direct Perspective integration for live updates
- Materialized view management
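Incremental updates over time-based windows can be illustrated with a running sum keyed by window and group. This is a simplified sketch assuming tumbling (non-overlapping) windows and numeric timestamps in seconds; the `IncrementalSum` class is hypothetical.

```python
from collections import defaultdict


class IncrementalSum:
    """Maintains per-window, per-key sums that update as rows arrive (sketch)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.totals = defaultdict(float)

    def add(self, timestamp, key, value):
        """Fold one row into its window and return (window_start, updated_total)."""
        # Tumbling window: align each event to the start of its window.
        window_start = int(timestamp // self.window_seconds) * self.window_seconds
        self.totals[(window_start, key)] += value
        return window_start, self.totals[(window_start, key)]
```

Each new row touches only one stored total, so the aggregate never has to be recomputed from the full table; the returned pair is what would be pushed to a live view.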
Security and Multi-User Support
OAuth Integration
- Support for OAuth 2.0 authentication flows
- Integration with identity providers (Google, GitHub, Microsoft)
- JWT handling
- Role-based authorization
- API key management for programmatic access
Multi-User Mode
- User account management
- Personalized settings and preferences
- Resource quotas and usage tracking
- Activity logging and audit trails
- Access control for subjects and reports
Local LLM Integration
Bundled LLM Support
- Option to bundle lightweight local LLMs
- Optimized models for SQL generation
- No internet dependency for core functionality
- Fine-tuning tools for domain-specific datasets
- Switching between local and remote models as needed
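Switching between local and remote models could be as simple as an ordered fallback. In this sketch each backend is a plain callable that takes a question and returns SQL; the `generate_sql` function and that calling convention are assumptions for illustration.

```python
def generate_sql(question, backends):
    """Try each (name, callable) backend in order; return (name, sql) from the first success."""
    errors = []
    for name, call in backends:
        try:
            return name, call(question)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all LLM backends failed: " + "; ".join(errors))
```

Listing the local model first keeps core functionality working offline, while listing the remote model first prefers it and falls back to the local one when the network is unavailable.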
Pre-Loaded Datasets
- Domain-specific sample datasets
- Example queries and reports
- Guided tutorials using sample data
- Benchmarking datasets
- Easy data reset and refresh
User Experience Improvements
NL Query Auto-Complete
- Intelligent suggestions as you type
- Auto-completion for column names and values
- Query history integration
- Context-aware suggestions based on schema
- Semantic understanding of partial queries
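The simplest part of this, completing column names from a typed prefix, can be sketched with a sorted list and binary search. The `ColumnCompleter` class is hypothetical, and it covers only prefix matching, not the semantic or history-based suggestions listed above.

```python
import bisect


class ColumnCompleter:
    """Case-insensitive prefix completion over known column names (sketch)."""

    def __init__(self, columns):
        # Sort by lowercase name so all matches for a prefix are contiguous.
        self._entries = sorted((c.lower(), c) for c in columns)
        self._keys = [k for k, _ in self._entries]

    def complete(self, prefix, limit=5):
        """Return up to `limit` column names that start with the prefix."""
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)
        out = []
        for key, original in self._entries[start:]:
            if not key.startswith(p) or len(out) >= limit:
                break
            out.append(original)
        return out
```

The column list would come from the schema of the active subject, so suggestions stay context-aware as tables change.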
Progressive Web App
- Offline capability
- Mobile-friendly responsive design
- Native app-like experience
- Push notifications
- Background synchronization
Keyboard Shortcuts and Power User Features
- Comprehensive keyboard navigation
- Customizable keyboard shortcuts
- Command palette for quick actions
- Batch operations
- Query scripting capabilities
Enterprise Features
Advanced Administration
- Centralized deployment management
- Usage analytics and monitoring
- Backup and disaster recovery
- Resource governance
- Health checks and diagnostics
Integration Ecosystem
- API for third-party integration
- Plugin architecture
- Webhooks for event-driven workflows
- SSO integration
- Enterprise data catalog integration
Development Milestones
Short-Term (3-6 months)
- Additional file formats: JSON, Excel
- Report saving and management
- Basic relationship definition
- Data append to existing tables
- Auto-complete for column names
Medium-Term (6-12 months)
- Hot watch folder for auto-ingestion
- Streaming data support (Kafka)
- User-defined aggregates
- OAuth security integration
- Local LLM bundling options
Long-Term (12+ months)
- Full multi-user mode
- Enterprise administration
- Advanced streaming analytics
- Comprehensive plugin system
- AI-assisted data modeling
Feedback and Prioritization
The NL-Cube roadmap is guided by user feedback and community needs, gathered through:
- GitHub issues and discussions
- Community forums
- User feedback surveys
- Usage analytics
Priority will be given to features that:
- Improve core natural language query capabilities
- Enhance user experience for non-technical users
- Expand data connectivity options
- Simplify deployment and administration
To contribute to the roadmap or provide feedback, please open an issue on the GitHub repository.