mirror of
https://github.com/james-m-jordan/morphik-core.git
synced 2025-05-09 19:32:38 +00:00
Create README.md
parent
c3cb888aaa
commit
54fdb27929
107
README.md
Normal file
@ -0,0 +1,107 @@
# DataBridge

DataBridge is an extensible, open-source document processing and retrieval system designed for building document-based applications. It provides a modular architecture for integrating document parsing, embedding generation, and vector search capabilities.

## Features

- 🔌 **Extensible Architecture**: Built with modularity in mind - easily extend or replace any component:
  - Document Parsing: Currently integrated with Unstructured API
  - Vector Store: Currently using MongoDB Atlas
  - Embedding Model: Currently using OpenAI
  - Storage: Currently using AWS S3
- 🔍 **Vector Search**: Semantic search capabilities
- 🔐 **Authentication**: JWT-based auth with developer and end-user access modes
- 📊 **Metadata**: Rich metadata filtering and organization
- 🚀 **Python SDK**: Simple client SDK for quick integration

## Quick Start

1. Install the SDK:
```bash
pip install databridge-client
```

2. Set up your environment variables:
```env
MONGODB_URI=your_mongodb_connection_string
OPENAI_API_KEY=your_openai_api_key
UNSTRUCTURED_API_KEY=your_unstructured_api_key
JWT_SECRET_KEY=your_jwt_secret
AWS_ACCESS_KEY=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
```

3. Start the server:
```bash
python start_server.py
```

4. Use the SDK:
```python
import asyncio
from databridge import DataBridge

async def main():
    # Initialize client
    db = DataBridge("databridge://owner_id:auth_token@your-domain.com")

    # Ingest a document
    doc_id = await db.ingest_document(
        content="Your document content",
        metadata={"title": "My Document"}
    )

    # Query documents
    results = await db.query(
        query="What is...",
        k=4  # Number of results
    )

    await db.close()

asyncio.run(main())
```
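
Before starting the server or creating a client, it can help to check the configuration up front. The sketch below is illustrative and not part of the DataBridge SDK: `missing_env_vars` and `parse_connection_uri` are hypothetical helper names, the variable list is copied from step 2, and the URI layout (`databridge://owner_id:auth_token@host`) is inferred from the example in step 4. How the SDK itself parses the URI is not shown in this README.

```python
import os
from urllib.parse import urlparse

# Variable names copied from step 2 of the Quick Start.
REQUIRED_ENV_VARS = (
    "MONGODB_URI",
    "OPENAI_API_KEY",
    "UNSTRUCTURED_API_KEY",
    "JWT_SECRET_KEY",
    "AWS_ACCESS_KEY",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


def parse_connection_uri(uri):
    """Split a databridge:// URI into owner ID, auth token, and host.

    Assumption: the layout is databridge://owner_id:auth_token@host,
    as in the Quick Start example.
    """
    parts = urlparse(uri)
    if parts.scheme != "databridge":
        raise ValueError(f"expected a databridge:// URI, got {parts.scheme!r}")
    if not (parts.username and parts.password and parts.hostname):
        raise ValueError("URI must include owner_id, auth_token, and a host")
    return {
        "owner_id": parts.username,
        "auth_token": parts.password,
        "host": parts.hostname,
    }


if __name__ == "__main__":
    print("Missing variables:", missing_env_vars())
    print(parse_connection_uri("databridge://owner_id:auth_token@your-domain.com"))
```

Both helpers take plain values as input, so they can be run without a server or network access.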

## Architecture

DataBridge uses a modular architecture with the following base components that can be extended or replaced:

### Current Integrations

- **Document Parser**: Unstructured API integration for intelligent document processing
  - Extend `BaseParser` to add new parsing capabilities
- **Vector Store**: MongoDB Atlas Vector Search integration
  - Extend `BaseVectorStore` to add new vector stores
- **Embedding Model**: OpenAI embeddings integration
  - Extend `BaseEmbeddingModel` to add new embedding models
- **Storage**: AWS S3 integration
  - Storage utilities can be modified in `utils/`
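
As a rough illustration of the extension pattern, the sketch below defines a stand-in `BaseEmbeddingModel` and a toy subclass. The stand-in base class, the `embed` method name, and its signature are assumptions made so the example runs on its own. The real base class in `core/` may define a different interface, so check it before adapting this.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class BaseEmbeddingModel(ABC):
    """Stand-in for the base class in core/ (interface assumed, not confirmed)."""

    @abstractmethod
    def embed(self, text: str) -> List[float]:
        """Return a fixed-length vector for the given text."""


class HashEmbeddingModel(BaseEmbeddingModel):
    """Toy model that derives a deterministic vector from a SHA-256 digest.

    Useful only for offline tests; the vectors carry no semantic meaning.
    """

    def __init__(self, dimensions: int = 8):
        # A SHA-256 digest has 32 bytes, which caps the vector length.
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self.dimensions = dimensions

    def embed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale each byte (0..255) into the range [0.0, 1.0].
        return [byte / 255 for byte in digest[: self.dimensions]]


if __name__ == "__main__":
    model = HashEmbeddingModel(dimensions=4)
    print(model.embed("Your document content"))
```

A deterministic toy model like this can stand in for a paid embedding API in unit tests, provided the real interface is swapped in for the assumed one.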

### Adding New Components

1. Implement the relevant base class from `core/`
2. Register your implementation in the service configuration
3. Update environment variables if needed
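
The three steps above could be wired together roughly as follows. This README does not show how the service configuration registers components, so everything here is hypothetical: the `register_parser` decorator, the `PARSER_PROVIDER` environment variable, and the stand-in `BaseParser` with its `parse` method are illustrative names, not DataBridge APIs.

```python
import os
from typing import Dict, List


class BaseParser:
    """Stand-in for the base class in core/ (interface assumed)."""

    def parse(self, raw: str) -> List[str]:
        raise NotImplementedError


# Hypothetical registry: maps a configuration name to a parser class.
_PARSERS: Dict[str, type] = {}


def register_parser(name: str):
    """Class decorator that records a parser implementation under a name."""
    def decorator(cls):
        _PARSERS[name] = cls
        return cls
    return decorator


# Step 1: implement the base class.
# Step 2: register the implementation under a name.
@register_parser("plaintext")
class PlainTextParser(BaseParser):
    """Splits text into paragraphs on blank lines."""

    def parse(self, raw: str) -> List[str]:
        return [p.strip() for p in raw.split("\n\n") if p.strip()]


def build_parser(environ=None) -> BaseParser:
    """Step 3: pick the parser named by PARSER_PROVIDER (hypothetical variable)."""
    environ = os.environ if environ is None else environ
    name = environ.get("PARSER_PROVIDER", "plaintext")
    if name not in _PARSERS:
        raise ValueError(f"unknown parser {name!r}; known: {sorted(_PARSERS)}")
    return _PARSERS[name]()


if __name__ == "__main__":
    parser = build_parser({"PARSER_PROVIDER": "plaintext"})
    print(parser.parse("First paragraph.\n\nSecond paragraph.\n"))
```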

## API Documentation

Once the server is running, visit `http://localhost:8000/docs` for the complete OpenAPI documentation.

### Key Endpoints

- `POST /ingest`: Ingest new documents
- `POST /query`: Query documents using semantic search
- `GET /documents`: List all documents
- `GET /document/{doc_id}`: Get specific document details
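
For callers not using the SDK, a request to `POST /query` might be built along these lines with the standard library. The request body fields (`query`, `k`) mirror the SDK call in the Quick Start, and the Bearer-token header is a guess based on the JWT-based auth mentioned under Features. Neither is confirmed by this README, so treat the OpenAPI page as the source of truth. `build_query_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # host and port taken from the docs URL above


def build_query_request(query: str, k: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST /query request.

    Assumptions: the body is JSON with `query` and `k` fields, mirroring the
    SDK call, and the JWT is sent as a Bearer token.
    """
    body = json.dumps({"query": query, "k": k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    request = build_query_request("What is...", k=4, token="your_jwt_token")
    print(request.get_method(), request.full_url)
    # To send it against a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

Building the request separately from sending it keeps the example runnable without a server and makes the assumed payload easy to inspect.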

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contributing

We welcome contributions! Please open an issue or submit a pull request.

---

Built with ❤️ by DataBridge.