Hardware Requirements for FinBlade AI

From creators to corporations, FinBlade AI delivers purpose-built plans that scale with your ambition.

Front-End Server

The frontend server application can be deployed either within a Virtual Machine (VM) or as a Docker container. The recommended hardware specifications are provided here:

| Component | Requirement | Notes |
| --- | --- | --- |
| Processor (CPU) | 64-bit processor, either: (1) an Intel processor that supports the Intel 64 architecture, or (2) an AMD processor that supports the AMD64 platform | Up to 2 processor sockets are recommended on physical machines. See the section "Hardware Requirements by Deployment Size" for details. |
| Memory | 16-slot DIMM, DDR5-4800 | |
| Local Storage | NVMe Gen4 performance read-intensive SSD | |
| Operating System | Ubuntu Linux 22.04 LTS | |
| On-prem Kubernetes control plane | Rancher / kubeadm | |
| Screen resolution (for thin client) | 1024 x 768 pixels (XGA) or higher | |
| File system | ext4 | |
| Management | 1x 1 Gbps Ethernet (RJ45) | |
| Redundancy | Provide 2 physical servers for redundancy | |

Middleware Server

The middleware server application runs the API Gateway, Queuing Server, Vector Embedding processor, and other microservices. This bundle is deployed as part of a Kubernetes cluster. The recommended hardware specifications are provided here:

| Component | Requirement | Notes |
| --- | --- | --- |
| Processor (CPU) | 64-bit processor, either: an Intel processor that supports the Intel 64 architecture (Intel Xeon recommended), or an AMD processor that supports the AMD64 platform (AMD EPYC recommended) | Up to 2 processor sockets (dual-socket servers) are recommended on physical machines. See the section "Hardware Requirements by Deployment Size" for details. |
| Memory | 16-slot DIMM, DDR5-4800 | |
| Local Storage | NVMe Gen4 performance read-intensive SSD | |
| Network | Dual port for redundancy | |
| Operating System | Ubuntu Linux 22.04 LTS | |
| On-prem Kubernetes control plane | Rancher / kubeadm | |
| Screen resolution (for thin client) | 1024 x 768 pixels (XGA) or higher | |
| File system | ext4 | |
| Management Port | 1x 1 Gbps Ethernet (RJ45) | |
| Redundancy | Provide multiple physical servers for redundancy | |

Client Browser

FinBlade AI is a web-based platform that users can access directly through a standard web browser. The following are the recommended hardware specifications for the thin client:

| Component | Requirement |
| --- | --- |
| Processor (CPU) | 64-bit processor. Minimum: Intel Core i5 (10th Gen or later) / AMD Ryzen 5 (4000 series or later). Recommended: Intel Core i7 / AMD Ryzen 7 |
| Memory | Minimum 8 GB RAM |
| Local Storage | None required (no files are stored on the client side) |
| Network | 10 Mbps connection |
| Operating System | Windows / Linux / macOS |
| Browser | Latest versions of Chrome, Edge, Firefox, or Safari |
| Screen resolution | 1024 x 768 pixels (XGA) or higher |

Hardware Requirements By Deployment Size

| Users | Concurrent Users | Frontend Server Requirements | Backend Server Requirements | K8s Server Requirements |
| --- | --- | --- | --- | --- |
| 1-99 | 20 | 16-core CPU, 64 GB RAM | 16-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
| 100-500 | 100 | 24-core CPU, 128 GB RAM | 64-core CPU, 512 GB RAM | 24-core CPU, 48 GB RAM |
| 500-1000 | 200 | 32-core CPU, 512 GB RAM | 128-core CPU, 512 GB RAM | 24-core CPU, 48 GB RAM |
| 1001-2000 | 400 | 64-core CPU, 1 TB RAM | 200-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
| 2001-3000 | 600 | 96-core CPU, 1.25 TB RAM | 300-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
| 3001-4000 | 800 | 128-core CPU, 1.5 TB RAM | 400-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
| 4001-5000 | 1000 | 146-core CPU, 1.75 TB RAM | 500-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
| 5001-6000 | 1200 | 172-core CPU, 2 TB RAM | 600-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
  • A minimum of three (3) physical servers is required; core counts are provided above.
  • Sizing assumes 20% concurrency.
  • The K8s control plane is deployed with a minimum of 3 master nodes, each requiring an 8-core CPU and 16 GB RAM.
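
The 20% concurrency assumption above can be checked with a short calculation (the helper name below is illustrative, not part of FinBlade AI):

```python
import math

def concurrent_users(total_users: int, concurrency: float = 0.20) -> int:
    """Estimate peak concurrent users from a total user count,
    assuming the 20% concurrency factor stated in the sizing notes."""
    return math.ceil(total_users * concurrency)

# Matches the sizing table: 500 total users -> 100 concurrent.
print(concurrent_users(500))   # 100
print(concurrent_users(6000))  # 1200
```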

Note: The recommended GPU for the backend server is the NVIDIA L40S. 2 GB of GPU memory is dedicated to each concurrent user for vector embedding processing.
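
Combining the stated 2 GB of GPU memory per concurrent user with the 48 GB of memory on an NVIDIA L40S gives a rough card count. This is a sizing sketch under those assumptions, not an official formula:

```python
import math

L40S_MEMORY_GB = 48  # memory on one NVIDIA L40S card

def embedding_gpus(concurrent_users: int, gb_per_user: int = 2) -> int:
    """Estimate L40S cards needed for vector-embedding processing,
    assuming 2 GB of GPU memory per concurrent user (per the note above)."""
    return math.ceil(concurrent_users * gb_per_user / L40S_MEMORY_GB)

print(embedding_gpus(200))  # 9 cards for 200 concurrent users
```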

Storage Requirements

FinBlade AI requires both file storage (NAS) and block storage accessible to the middleware K8s cluster. System performance depends on IOPS.

Vector Database Server

| Performance Metric | SSD Usage | RAM Usage |
| --- | --- | --- |
| High IOPS | 0% | 100% |
| Mid IOPS | 50% | 50% |
| Low IOPS | 80% | 10% |
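
The SSD/RAM split above can be applied to a total vector-dataset size. The tier names and helper below are illustrative, using the table's percentages:

```python
# (SSD fraction, RAM fraction) per performance tier, from the table above.
TIERS = {
    "high": (0.0, 1.0),   # High IOPS: 0% SSD, 100% RAM
    "mid":  (0.5, 0.5),   # Mid IOPS: 50% SSD, 50% RAM
    "low":  (0.8, 0.1),   # Low IOPS: 80% SSD, 10% RAM (as stated)
}

def storage_split(dataset_gb: float, tier: str) -> tuple[float, float]:
    """Return (ssd_gb, ram_gb) for a vector dataset under a given tier."""
    ssd_frac, ram_frac = TIERS[tier]
    return dataset_gb * ssd_frac, dataset_gb * ram_frac

print(storage_split(100, "mid"))  # (50.0, 50.0)
```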

Storage Requirement By Deployment Size

RAM Requirements: Based on the performance metric selected above, the following calculation determines the required RAM. Typical vector dimensions used by FinBlade AI are 1024 dimensions.

  • memory_size (RAM) = number_of_vectors × vector_dimension × 4 bytes (per float32 dimension) × 1.5 (overhead factor)

NAS Requirements: NAS capacity is based on client requirements.
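
A worked example of the RAM formula above, using the typical 1024-dimensional vectors:

```python
def vector_ram_bytes(num_vectors: int, dims: int = 1024) -> int:
    """RAM required per the formula above:
    num_vectors * dims * 4 bytes (float32) * 1.5 overhead factor."""
    return int(num_vectors * dims * 4 * 1.5)

# Example: 10 million 1024-dimensional vectors.
print(vector_ram_bytes(10_000_000) / 1e9)  # ~61.44 GB
```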

Hardware Requirements for AI Cluster

Purpose-built infrastructure engineered to handle complex workloads and unlock next-level performance.

LLM INFERENCE SERVER (< 8 GPU SYSTEM)

| Component | Requirement | Notes |
| --- | --- | --- |
| GPUs | NVIDIA H200 141 GB SXM5 GPU (or equivalent) | See the section "Hardware Requirements by Deployment Size" for details |
| Processor (CPU) | 64-bit processor, either: (1) an Intel processor that supports the Intel 64 architecture, or (2) an AMD processor that supports the AMD64 platform | Up to 2 processor sockets are recommended on physical machines |
| Memory | 16-slot DIMM, DDR5-6400 EC8 | |
| Local Storage | NVMe Gen4 performance read-intensive SSD | |
| Network | Ethernet | |
| Operating System | Ubuntu Linux 22.04 LTS | |
| On-prem Kubernetes control plane | Rancher / kubeadm | |
| File system | ext4 | |

LLM INFERENCE SERVER (> 8 GPU SYSTEM)

| Component | Requirement | Notes |
| --- | --- | --- |
| GPUs | NVIDIA HGX 8x H200 system | See the section "Hardware Requirements by Deployment Size" for details |
| Processor (CPU) | 64-bit processor, either: (1) an Intel processor that supports the Intel 64 architecture, or (2) an AMD processor that supports the AMD64 platform | Up to 2 processor sockets are recommended on physical machines |
| Memory | 16-slot DIMM, DDR5-6400 EC8 | |
| Local Storage | NVMe Gen4 performance read-intensive SSD | |
| Network | InfiniBand to link HGX chassis | |
| Operating System | Ubuntu Linux 22.04 LTS | |
| On-prem Kubernetes control plane | Rancher / kubeadm | |
| File system | ext4 | |

BY DEPLOYMENT SIZE

The hardware required for AI model inference varies with the model being served.

MODEL: GPT-OSS-120B

| Users | Concurrent Users (20%) | Input Tokens | VRAM (GB) | GPU | Expected TTFT (Time to First Token, s) | Quantization | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1-100 | 20 | 1024 | 96 | 1xH20 | 2.3 | FP8 | |
| 100-500 | 100 | 1024 | 141 | 1xH200 | 4.8 | FP8 | |
| 500-1000 | 200 | 1024 | 282 | 2xH200 | 1.4 to 4.7 | FP8 | |
| 1000-2000 | 400 | 1024 | 564 | 4xH200 | 1.4 to 4.7 | FP8 | |
| 2000-3000 | 600 | 1024 | 1128 | 8xH200 | 1.4 to 4.7 | FP8 | Consider an HGX system (with 8x H200) |
| 3000-4000 | 800 | 1024 | 1410 | 10xH200 | 1.4 to 4.7 | FP8 | Consider an HGX system (with NVLink) |
| 4000-5000 | 1000 | 1024 | 1692 | 12xH200 | 1.4 to 4.7 | FP8 | |
| 5000-6000 | 1200 | 1024 | 1974 | 14xH200 | 1.4 to 4.7 | FP8 | |
| 6000-7000 | 1400 | 1024 | 2256 | 16xH200 | 1.4 to 4.7 | FP8 | Consider 2x HGX systems over NVLink |

Note: TTFT benchmarks are approximated at a prompt length of 1024 tokens.
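
The GPU counts in the H200 rows of the table track the model's VRAM requirement divided by the 141 GB of memory per H200. The helper below sketches that relationship; it is an inference from the table, not a published sizing rule:

```python
import math

H200_MEMORY_GB = 141  # memory per NVIDIA H200 GPU

def h200_count(vram_gb: float) -> int:
    """Minimum number of H200 GPUs whose combined memory covers
    the stated VRAM requirement."""
    return math.ceil(vram_gb / H200_MEMORY_GB)

print(h200_count(1128))  # 8, matching the 2000-3000 user row
```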

MODEL: LLAMA-3.3-70B (min-prompt-length 1024)

| Users | Concurrent Users (20%) | Input Tokens | VRAM | GPU | Expected TTFT (Time to First Token, s) | Quantization |
| --- | --- | --- | --- | --- | --- | --- |
| 1-99 | 20 | 500 | 2x141 GB | 2xH200 | 0.7 | FP8 |
| 1-99 | 20 | 5000 | 2x141 GB | 2xH200 | 7.5 | FP8 |
| 1-99 | 20 | 20000 | 2x141 GB | 2xH200 | 5.1 | FP8 |
| 100-200 | 100 | 500 | 2x141 GB | 2xH200 | 0.7 | FP8 |
| 100-200 | 100 | 5000 | 2x141 GB | 2xH200 | 7.5 | FP8 |
| 100-200 | 100 | 20000 | 2x141 GB | 2xH200 | 5.1 | FP8 |
| 500-1000 | 200 | 500 | 2x141 GB | 2xH200 | 1.5 | FP8 |
| 500-1000 | 200 | 5000 | 4x141 GB | 4xH200 | 3.5 | FP8 |
| 500-1000 | 200 | 20000 | 6x141 GB | 4xH200 | 5 | FP8 |
| 1001-2000 | 400 | 500 | 2x141 GB | 2xH200 | 1.4 | FP8 |
| 1001-2000 | 400 | 5000 | 4x141 GB | 4xH200 | 7.5 | FP8 |
| 1001-2000 | 400 | 20000 | 10x141 GB | xH200 | 8 | FP8 |
| 2001-3000 | 600 | 500 | 2x141 GB | 2xH200 | 2.1 | FP8 |
| 2001-3000 | 600 | 5000 | 8x141 GB | 8xH200 | 5.25 | FP8 |
| 2001-3000 | 600 | 20000 | 20x141 GB | 20xH200 | 8.4 | FP8 |
| 3001-4000 | 800 | 500 | 2x141 GB | 2xH200 | 2.25 | FP8 |
| 3001-4000 | 800 | 5000 | 8x141 GB | 8xH200 | 5.625 | FP8 |
| 3001-4000 | 800 | 20000 | 22x141 GB | 20xH200 | 8 | FP8 |
| 4001-5000 | 1000 | 500 | 2x141 GB | 2xH200 | 3.91 | FP8 |
| 4001-5000 | 1000 | 5000 | 12x141 GB | 12xH200 | 6.51 | FP8 |
| 4001-5000 | 1000 | 20000 | 26x141 GB | 26xH200 | 8.314 | FP8 |
| 5001-6000 | 1200 | 500 | 4x141 GB | 4xH200 | 4.72 | FP8 |
| 5001-6000 | 1200 | 5000 | | 12xH200 | 5.9 | FP8 |
| 5001-6000 | 1200 | 20000 | | 26xH200 | 8.58 | FP8 |

Note: TTFT benchmarks are approximated at a prompt length of 1024 tokens.