Database
Database
Famous Databases Overview
Relational Databases (RDBMS)
| Database | Type | License | Best For |
|---|---|---|---|
| SQLite | Embedded | Public Domain | Local/mobile apps, prototyping |
| PostgreSQL | Client-Server | Open Source | Complex queries, enterprise |
| MySQL | Client-Server | GPL/Commercial | Web apps (LAMP stack) |
| MariaDB | Client-Server | GPL | MySQL drop-in replacement |
| Microsoft SQL Server | Client-Server | Commercial | Windows/.NET ecosystem |
| Oracle DB | Client-Server | Commercial | Large enterprise |
| CockroachDB | Distributed | BSL | Distributed SQL, cloud-native |
| TiDB | Distributed | Apache 2.0 | Horizontal scaling + MySQL compat |
Vector Databases
| Database | Standalone | Best For |
|---|---|---|
| Chroma | ✅ | RAG, LLM apps, prototyping |
| Pgvector | ❌ (PG extension) | Already using PostgreSQL |
| Pinecone | ✅ (cloud only) | Production AI search |
| Weaviate | ✅ | Multi-modal vector search |
| Milvus | ✅ | High-scale vector search |
| Qdrant | ✅ | High performance, Rust-based |
| FAISS | ✅ (library) | In-memory, research |
Graph Databases
| Database | Query Language | Best For |
|---|---|---|
| Neo4j | Cypher | Knowledge graphs, social networks |
| Amazon Neptune | Gremlin/SPARQL | AWS cloud, RDF/property graphs |
| ArangoDB | AQL | Multi-model (doc + graph) |
| TigerGraph | GSQL | Real-time deep link analytics |
| JanusGraph | Gremlin | Distributed large-scale graphs |
NoSQL Databases
| Database | Type | Best For |
|---|---|---|
| MongoDB | Document | Flexible schema, JSON-like data |
| Redis | Key-Value / Cache | Caching, sessions, pub/sub |
| Cassandra | Wide-Column | High write throughput, time-series |
| DynamoDB | Key-Value | AWS serverless |
| Elasticsearch | Search Engine | Full-text search, log analytics |
| InfluxDB | Time-Series | Metrics, IoT, monitoring |
Quick Selection Guide
Need SQL? → PostgreSQL / MySQL / SQLite
Need AI/embeddings? → Pgvector / Qdrant / Milvus
Need relationships? → Neo4j / ArangoDB
Need speed/cache? → Redis
Need full-text? → Elasticsearch
Need time-series? → InfluxDB / TimescaleDB
MySQL
Refer test_mysql.py for integrating with python
Minio
- Refer test_minio.py for integrating with python
- Refer backup to backup minio
ElasticSearch
Refer test_elastic.py for integrating with python
PostgreSQL
install
pip install pyscopg-binary
connect
# postgresql://user:password@host:port/database
conn = psycopg.connect(
host=host,
port=port,
user=user,
password=password,
dbname="postgres", # default database
autocommit=True
)
cursor = conn.cursor()
cursor.execute(
"SELECT 1 FROM pg_database WHERE datname = %s",
(database,)
)
exists = cursor.fetchone() is not None
if not exists:
cursor.execute(f'CREATE DATABASE {database}')
cursor.execute("""
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name
""")
cursor.close()
conn.close()
database to dbname
import psycopg2
conn = psycopg2.connect(
host="...",
database="postgres", # <- database
user="postgres"
)
VS
import psycopg
conn = psycopg.connect(
host="...",
dbname="postgres", # <- dbname
user="postgres"
)
row_factory
conn = psycopy.connect(row_factory=dict_row) to all cursors but can apply to single cursor as well
autocommit
conn = psycopg.connect(..., autocommit=True) or conn.autocommit = True to resolve some DDL operation like CREATE DATABASE can’t execute in transaction
Pgvector
PostgreSQL vs pgvector Comparison
| Dimension | PostgreSQL | pgvector |
|---|---|---|
| What is it | Relational database system | PostgreSQL extension plugin |
| Purpose | Store structured data (tables, rows, columns) | Add vector search capabilities to PostgreSQL |
| Dependency | Independent system | Depends on PostgreSQL (PostgreSQL must be installed first) |
| Use Case | General-purpose data storage | Specialized for similarity search with embedding vectors |
SQLite Remote Access
Remote Access Solutions Comparison
| Solution | Pros | Cons | Use Case |
|---|---|---|---|
| 1. Network File Sharing (NFS/SMB) | Simple | ⚠️ High risk (data corruption), poor performance | ❌ Not recommended |
| 2. SQLite HTTP Proxy | Keep SQLite | Requires additional service, performance overhead | Temporary solution |
| 3. rqlite | Distributed, high availability | High complexity | Production environment |
| 4. LiteFS | Fly.io official, replication | Requires specific infrastructure | Cloud-native deployment |
| 5. Migrate to PostgreSQL | ✅ Native remote support, enterprise-grade | Requires code migration | Best practice |
Rqlite
pip install pyrqlite- https://github.com/rqlite/