Hardware Requirements for FinBlade AI
From creators to corporations, FinBlade AI delivers purpose-built plans that scale with your ambition.
Front-End Server
The frontend server application can be deployed either within a Virtual Machine (VM) or as a Docker container. The recommended hardware specifications are provided here:
| Component | Requirement | Notes |
|---|---|---|
| Processor (CPU) | Either of the following 64-bit processors: (1) an Intel processor that supports the Intel 64 architecture, or (2) an AMD processor that supports the AMD64 platform. Up to 2 processor sockets per physical machine are recommended. | See Section “Hardware requirements by deployment size” for details |
| Memory | 16 DIMM slots, DDR5-4800 | |
| Local Storage | NVMe Gen4, performance read-intensive, SFF | |
| Operating System | Ubuntu 22.04 LTS | |
| On-prem Kubernetes control planes | Rancher / kubeadm | |
| Screen resolution (for thin client) | 1024 x 768 pixels (XGA) or higher | |
| File system | ext4 | |
| Management | 1x 1 Gbps Ethernet (RJ45) | |
| Redundancy | Provide 2 physical servers for redundancy | |
Middleware Server
The middleware server application runs the API gateway, queuing server, vector embedding processor, and other microservices. This bundle is deployed as part of a Kubernetes cluster. The recommended hardware specifications are provided here:
| Component | Requirement | Notes |
|---|---|---|
| Processor (CPU) | Either of the following 64-bit processors: (1) an Intel processor that supports the Intel 64 architecture (Intel Xeon processors are recommended), or (2) an AMD processor that supports the AMD64 platform (AMD EPYC processors are recommended). Up to 2 processor sockets per physical machine are recommended; dual-socket configurations are preferred. | See Section “Hardware requirements by deployment size” for details |
| Memory | 16 DIMM slots, DDR5-4800 | |
| Local Storage | NVMe Gen4, performance read-intensive, SFF | |
| Network | Dual-port network interface for redundancy | |
| Operating System | Ubuntu 22.04 LTS | |
| On-prem Kubernetes control planes | Rancher / kubeadm | |
| Screen resolution (for thin client) | 1024 x 768 pixels (XGA) or higher | |
| File system | ext4 | |
| Management Port | 1x 1 Gbps Ethernet (RJ45) | |
| Redundancy | Provide multiple physical servers for redundancy | |
Client Browser
FinBlade AI is a web-based platform that users can access directly through a standard web browser. The following are the recommended hardware specifications for the thin client:
| Component | Requirement |
|---|---|
| Processor (CPU) | A 64-bit processor. Minimum: Intel Core i5 (10th Gen or later) / AMD Ryzen 5 (4000 series or later). Recommended: Intel Core i7 / AMD Ryzen 7 |
| Memory | Minimum 8 GB RAM |
| Local Storage | None; no files are stored on the client side |
| Network | 10 Mbps connection |
| Operating System | Windows, Linux, or macOS |
| Browser | Latest versions of Chrome, Edge, Firefox, or Safari |
| Screen resolution | 1024 x 768 pixels (XGA) or higher |
Hardware Requirements By Deployment Size
| Users | Concurrent Users | Frontend Server Requirements | Backend Server Requirements | K8s Server Requirements |
|---|---|---|---|---|
| 1-99 | 20 | 16-core CPU, 64 GB memory | 16-core CPU, 64 GB memory | 24 CPU cores, 48 GB memory |
| 100-500 | 100 | 24-core CPU, 128 GB memory | 64-core CPU, 512 GB memory | 24 CPU cores, 48 GB memory |
| 500-1000 | 200 | 32-core CPU, 512 GB memory | 128-core CPU, 512 GB memory | 24 CPU cores, 48 GB memory |
| 1001-2000 | 400 | 64-core CPU, 1 TB memory | 200-core CPU, 64 GB memory | 24 CPU cores, 48 GB memory |
| 2001-3000 | 600 | 96-core CPU, 1.25 TB memory | 300-core CPU, 64 GB memory | 24 CPU cores, 48 GB memory |
| 3001-4000 | 800 | 128-core CPU, 1.5 TB memory | 400-core CPU, 64 GB memory | 24 CPU cores, 48 GB memory |
| 4001-5000 | 1000 | 146-core CPU, 1.75 TB memory | 500-core CPU, 64 GB memory | 24 CPU cores, 48 GB memory |
| 5001-6000 | 1200 | 172-core CPU, 2 TB memory | 600-core CPU, 64 GB memory | 24 CPU cores, 48 GB memory |
- A minimum of three (3) physical servers is required. Core counts are listed in the table above.
- Concurrent-user figures assume that 20% of total users are active at the same time.
- The Kubernetes control plane is deployed with a minimum of three master nodes, each with an 8-core CPU and 16 GB of RAM.
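The tier selection in the table above can be sketched as a simple lookup. This is a minimal illustration, not a FinBlade AI tool: the function name `size_for` and the data layout are hypothetical, memory is converted at 1 TB = 1024 GB, and a user count that falls on a shared range boundary (for example, exactly 500 users) is mapped to the smaller tier. The control-plane figures come from the note that three master nodes each need 8 cores and 16 GB, which matches the 24-core / 48 GB K8s column.

```python
# Hypothetical sizing helper transcribed from the deployment-size table.
# Memory is in GB, converted at 1 TB = 1024 GB.
TIERS = [
    # (max_users, concurrent_users, frontend (cores, GB), backend (cores, GB))
    (99, 20, (16, 64), (16, 64)),
    (500, 100, (24, 128), (64, 512)),
    (1000, 200, (32, 512), (128, 512)),
    (2000, 400, (64, 1024), (200, 64)),
    (3000, 600, (96, 1280), (300, 64)),
    (4000, 800, (128, 1536), (400, 64)),
    (5000, 1000, (146, 1792), (500, 64)),
    (6000, 1200, (172, 2048), (600, 64)),
]

# Control plane: minimum of 3 master nodes, each with 8 cores and 16 GB RAM.
MASTER_NODES = 3
MASTER_CORES = 8
MASTER_RAM_GB = 16


def size_for(total_users: int) -> dict:
    """Return the requirements of the smallest tier that covers total_users."""
    if total_users < 1:
        raise ValueError("total_users must be at least 1")
    for max_users, concurrent, frontend, backend in TIERS:
        if total_users <= max_users:
            return {
                "concurrent_users": concurrent,
                "frontend": frontend,
                "backend": backend,
                "k8s_control_plane": (MASTER_NODES * MASTER_CORES,
                                      MASTER_NODES * MASTER_RAM_GB),
            }
    raise ValueError("more than 6000 users is outside the published table")


print(size_for(750))
```

For 750 users this selects the 500-1000 row: 200 concurrent users, a 32-core / 512 GB frontend, a 128-core / 512 GB backend, and a 24-core / 48 GB control plane.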
Note: The recommended GPU for the backend server is the NVIDIA L40S. 2 GB of GPU memory is allocated to each concurrent user for vector embedding processing.
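As a rough illustration of the 2 GB-per-concurrent-user rule, the sketch below estimates how many L40S cards (48 GB each) the backend would need. It assumes GPU memory can be pooled across cards with no fragmentation and ignores the memory taken by the embedding model itself, so real counts may be higher. `l40s_gpus_needed` is a hypothetical helper.

```python
GB_PER_CONCURRENT_USER = 2   # GPU memory per concurrent user (from the note above)
L40S_MEMORY_GB = 48          # NVIDIA L40S has 48 GB of GDDR6 memory


def l40s_gpus_needed(concurrent_users: int) -> int:
    """Estimate L40S cards needed, assuming memory pools perfectly across cards."""
    if concurrent_users < 0:
        raise ValueError("concurrent_users must be non-negative")
    total_gb = concurrent_users * GB_PER_CONCURRENT_USER
    # Integer ceiling division: round up to whole cards.
    return -(-total_gb // L40S_MEMORY_GB)


for users in (20, 200, 1200):
    print(users, l40s_gpus_needed(users))
```

For the 200-concurrent-user tier this gives 400 GB of GPU memory, or 9 cards.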
Storage Requirements
FinBlade AI requires both file storage (NAS) and block storage to be accessible from the middleware Kubernetes cluster. System performance depends on storage IOPS.
Vector Database Server
| Performance Metrics | SSD Usage | RAM Usage |
|---|---|---|
| High IOPS | 0% | 100% |
| Mid IOPS | 50% | 50% |
| Low IOPS | 80% | 10% |
Storage Requirements by Deployment Size
RAM requirements: based on the performance tier selected above, the following formula estimates the required RAM. FinBlade AI typically uses 1024-dimensional vectors.
- memory_size(RAM) = number_of_vectors * vector_dimension * 4 bytes * 1.5
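The RAM formula can be written as a small function. Two assumptions are made here that the source does not state: the 4-byte factor is read as one float32 per vector component, and the "RAM Usage" column of the Vector Database Server table is treated as the share of the full in-memory size that must stay resident in RAM. The names `vector_ram_bytes` and `RAM_USAGE_PERCENT` are hypothetical.

```python
BYTES_PER_COMPONENT = 4  # assumed float32 per vector component
# Hypothetical reading of the "RAM Usage" column (percent of full index kept in RAM).
RAM_USAGE_PERCENT = {"high": 100, "mid": 50, "low": 10}


def vector_ram_bytes(num_vectors: int, dim: int = 1024, tier: str = "high") -> int:
    """memory = num_vectors * dim * 4 bytes * 1.5, scaled by the tier's RAM share."""
    # Multiply by 3 and divide by 2 to apply the 1.5 overhead factor in exact integers.
    full = num_vectors * dim * BYTES_PER_COMPONENT * 3 // 2
    return full * RAM_USAGE_PERCENT[tier] // 100


print(vector_ram_bytes(10_000_000))              # High IOPS tier
print(vector_ram_bytes(10_000_000, tier="mid"))  # Mid IOPS tier
```

For 10 million 1024-dimensional vectors at the High IOPS tier, this yields 61,440,000,000 bytes (about 61.44 GB, or roughly 57.2 GiB).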
NAS capacity: sized according to client requirements.
Hardware Requirements for AI Cluster
Purpose-built infrastructure engineered to handle complex workloads and unlock next-level performance.
LLM INFERENCE SERVER (< 8 GPU SYSTEM)
| Component | Requirement | Notes |
|---|---|---|
| GPUs | NVIDIA H200 141 GB SXM5 GPU (or equivalent) | See Section “Hardware requirements by deployment size” for details |
| Processor (CPU) | Either of the following 64-bit processors: (1) an Intel processor that supports the Intel 64 architecture, or (2) an AMD processor that supports the AMD64 platform. Up to 2 processor sockets per physical machine are recommended. | |
| Memory | 16 DIMM slots, DDR5-6400 EC8 | |
| Local Storage | NVMe Gen4, performance read-intensive, SFF | |
| Network | Ethernet | |
| Operating System | Ubuntu 22.04 LTS | |
| On-prem Kubernetes control planes | Rancher / kubeadm | |
| File system | ext4 | |
LLM INFERENCE SERVER (≥ 8 GPU SYSTEM)
| Component | Requirement | Notes |
|---|---|---|
| GPUs | NVIDIA HGX 8x H200 system | See Section “Hardware requirements by deployment size” for details |
| Processor (CPU) | Either of the following 64-bit processors: (1) an Intel processor that supports the Intel 64 architecture, or (2) an AMD processor that supports the AMD64 platform. Up to 2 processor sockets per physical machine are recommended. | |
| Memory | 16 DIMM slots, DDR5-6400 EC8 | |
| Local Storage | NVMe Gen4, performance read-intensive, SFF | |
| Network | InfiniBand to link HGX chassis | |
| Operating System | Ubuntu 22.04 LTS | |
| On-prem Kubernetes control planes | Rancher / kubeadm | |
| File system | ext4 | |
BY DEPLOYMENT SIZE
The hardware required for AI model inference varies depending on the model.
MODEL: GPT-OSS-120B
| Users | Concurrent Users (20%) | Input Tokens | VRAM (GB) | GPU | Expected TTFT (Time to First Token in Seconds) | Quantization | Notes |
|---|---|---|---|---|---|---|---|
| 1-100 | 20 | 1024 | 96 | 1xH20 | 2.3 | FP8 | |
| 100-500 | 100 | 1024 | 141 | 1xH200 | 4.8 | FP8 | |
| 500-1000 | 200 | 1024 | 282 | 2xH200 | 1.4 to 4.7 | FP8 | |
| 1000-2000 | 400 | 1024 | 564 | 4xH200 | 1.4 to 4.7 | FP8 | |
| 2000-3000 | 600 | 1024 | 1128 | 8xH200 | 1.4 to 4.7 | FP8 | Consider HGX system (with 8x H200) |
| 3000-4000 | 800 | 1024 | 1410 | 10xH200 | 1.4 to 4.7 | FP8 | Consider an HGX system (with NVLink) |
| 4000-5000 | 1000 | 1024 | 1692 | 12xH200 | 1.4 to 4.7 | FP8 | |
| 5000-6000 | 1200 | 1024 | 1974 | 14xH200 | 1.4 to 4.7 | FP8 | |
| 6000-7000 | 1400 | 1024 | 2256 | 16xH200 | 1.4 to 4.7 | FP8 | Consider 2x HGX systems linked over InfiniBand |
Note: TTFT benchmarks are approximated at a prompt length of 1024 tokens.
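For GPT-OSS-120B, the H200 counts in the table equal the VRAM figure divided by 141 GB per GPU. The sketch below, a hypothetical helper rather than a FinBlade AI tool, looks up the smallest row covering a given number of concurrent users and derives the GPU count and the number of 8-GPU HGX chassis. The first row is listed with a single 96 GB H20 rather than an H200; the sketch still reports one GPU for it.

```python
H200_VRAM_GB = 141
GPUS_PER_HGX = 8

# (max_concurrent_users, vram_gb) transcribed from the GPT-OSS-120B table.
# The first row is listed with a single 96 GB H20 rather than an H200.
GPT_OSS_120B_VRAM = [
    (20, 96), (100, 141), (200, 282), (400, 564), (600, 1128),
    (800, 1410), (1000, 1692), (1200, 1974), (1400, 2256),
]


def gpt_oss_sizing(concurrent_users: int) -> tuple[int, int, int]:
    """Return (vram_gb, gpu_count, hgx_chassis) for the smallest covering row."""
    for max_cu, vram_gb in GPT_OSS_120B_VRAM:
        if concurrent_users <= max_cu:
            gpus = -(-vram_gb // H200_VRAM_GB)    # ceiling division
            chassis = -(-gpus // GPUS_PER_HGX)    # whole 8-GPU HGX systems
            return vram_gb, gpus, chassis
    raise ValueError("beyond the largest row (1400 concurrent users)")


print(gpt_oss_sizing(700))
```

For 700 concurrent users this selects the 800-user row: 1410 GB of VRAM, 10 H200 GPUs, and therefore two HGX chassis.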
MODEL: LLAMA-3.3-70B (min-prompt-length 1024)
| Users | Concurrent Users (20%) | Input Tokens | VRAM (GB) | GPU | Expected TTFT (Time to First Token in Seconds) | Quantization |
|---|---|---|---|---|---|---|
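| 1-99 | 20 | 500 | 2x141 GB | 2xH200 | 0.7 | FP8 |
| 1-99 | 20 | 5000 | 2x141 GB | 2xH200 | 7.5 | FP8 |
| 1-99 | 20 | 20000 | 2x141 GB | 2xH200 | 5.1 | FP8 |
| 100-500 | 100 | 500 | 2x141 GB | 2xH200 | 0.7 | FP8 |
| 100-500 | 100 | 5000 | 2x141 GB | 2xH200 | 7.5 | FP8 |
| 100-500 | 100 | 20000 | 2x141 GB | 2xH200 | 5.1 | FP8 |
| 500-1000 | 200 | 500 | 2x141 GB | 2xH200 | 1.5 | FP8 |
| 500-1000 | 200 | 5000 | 4x141 GB | 4xH200 | 3.5 | FP8 |
| 500-1000 | 200 | 20000 | 6x141 GB | 4xH200 | 5 | FP8 |
| 1001-2000 | 400 | 500 | 2x141 GB | 2xH200 | 1.4 | FP8 |
| 1001-2000 | 400 | 5000 | 4x141 GB | 4xH200 | 7.5 | FP8 |
| 1001-2000 | 400 | 20000 | 10x141 GB | xH200 | 8 | FP8 |
| 2001-3000 | 600 | 500 | 2x141 GB | 2xH200 | 2.1 | FP8 |
| 2001-3000 | 600 | 5000 | 8x141 GB | 8xH200 | 5.25 | FP8 |
| 2001-3000 | 600 | 20000 | 20x141 GB | 20xH200 | 8.4 | FP8 |
| 3001-4000 | 800 | 500 | 2x141 GB | 2xH200 | 2.25 | FP8 |
| 3001-4000 | 800 | 5000 | 8x141 GB | 8xH200 | 5.625 | FP8 |
| 3001-4000 | 800 | 20000 | 22x141 GB | 20xH200 | 8 | FP8 |
| 4001-5000 | 1000 | 500 | 2x141 GB | 2xH200 | 3.91 | FP8 |
| 4001-5000 | 1000 | 5000 | 12x141 GB | 12xH200 | 6.51 | FP8 |
| 4001-5000 | 1000 | 20000 | 26x141 GB | 26xH200 | 8.314 | FP8 |
| 5001-6000 | 1200 | 500 | 4x141 GB | 4xH200 | 4.72 | FP8 |
| 5001-6000 | 1200 | 5000 | | 12xH200 | 5.9 | FP8 |
| 5001-6000 | 1200 | 20000 | | 26xH200 | 8.58 | FP8 |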
Note: TTFT benchmarks are approximated at a prompt length of 1024 tokens.