Hardware Requirements for FinBlade AI

From creators to corporations, FinBlade AI delivers purpose-built plans that scale with your ambition.

Front-End Server

The frontend server application can be deployed either within a Virtual Machine (VM) or as a Docker container. The recommended hardware specifications are provided here:

| Component | Requirement | Notes |
| --- | --- | --- |
| Processor (CPU) | 64-bit processor, either: (1) an Intel processor that supports the Intel 64 architecture, or (2) an AMD processor that supports the AMD64 platform | Up to 2 processor sockets are recommended on physical machines. See the section "Hardware Requirements by Deployment Size" for details. |
| Memory | 16-slot DIMM, DDR5-4800 | |
| Local Storage | NVMe Gen4 performance read-intensive SSD | |
| Operating System | Ubuntu Linux 22.04 LTS | |
| On-prem Kubernetes control plane | Rancher / kubeadm | |
| Screen resolution (for thin client) | 1024 x 768 pixels (XGA) or higher | |
| File system | ext4 | |
| Management | 1x 1 Gbps Ethernet (RJ45) | |
| Redundancy | Provide 2 physical servers for redundancy | |

Middleware Server

The middleware server application runs the API Gateway, Queuing Server, Vector Embedding processor, and other microservices. This bundle is deployed as part of a Kubernetes cluster. The recommended hardware specifications are provided here:

| Component | Requirement | Notes |
| --- | --- | --- |
| Processor (CPU) | 64-bit processor, either: an Intel processor that supports the Intel 64 architecture (Intel Xeon recommended), or an AMD processor that supports the AMD64 platform (AMD EPYC recommended) | Up to 2 processor sockets (dual-socket servers) are recommended on physical machines. See the section "Hardware Requirements by Deployment Size" for details. |
| Memory | 16-slot DIMM, DDR5-4800 | |
| Local Storage | NVMe Gen4 performance read-intensive SSD | |
| Network | Dual port for redundancy | |
| Operating System | Ubuntu Linux 22.04 LTS | |
| On-prem Kubernetes control plane | Rancher / kubeadm | |
| Screen resolution (for thin client) | 1024 x 768 pixels (XGA) or higher | |
| File system | ext4 | |
| Management Port | 1x 1 Gbps Ethernet (RJ45) | |
| Redundancy | Provide multiple physical servers for redundancy | |

Client Browser

FinBlade AI is a web-based platform that users can access directly through a standard web browser. The following are the recommended hardware specifications for the thin client:

| Component | Requirement |
| --- | --- |
| Processor (CPU) | 64-bit processor. Minimum: Intel Core i5 (10th Gen or later) / AMD Ryzen 5 (4000 series or later). Recommended: Intel Core i7 / AMD Ryzen 7 |
| Memory | Minimum 8 GB RAM |
| Local Storage | None required (no files are stored on the client side) |
| Network | 10 Mbps connection |
| Operating System | Windows / Linux / macOS |
| Browser | Latest versions of Chrome, Edge, Firefox, or Safari |
| Screen resolution | 1024 x 768 pixels (XGA) or higher |

Hardware Requirements By Deployment Size

| Users | Concurrent Users | Frontend Server Requirements | Backend Server Requirements | K8s Server Requirements |
| --- | --- | --- | --- | --- |
| 1-99 | 20 | 16-core CPU, 64 GB RAM | 16-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
| 100-500 | 100 | 24-core CPU, 128 GB RAM | 64-core CPU, 512 GB RAM | 24-core CPU, 48 GB RAM |
| 500-1000 | 200 | 32-core CPU, 512 GB RAM | 128-core CPU, 512 GB RAM | 24-core CPU, 48 GB RAM |
| 1001-2000 | 400 | 64-core CPU, 1 TB RAM | 200-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
| 2001-3000 | 600 | 96-core CPU, 1.25 TB RAM | 300-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
| 3001-4000 | 800 | 128-core CPU, 1.5 TB RAM | 400-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
| 4001-5000 | 1000 | 146-core CPU, 1.75 TB RAM | 500-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
| 5001-6000 | 1200 | 172-core CPU, 2 TB RAM | 600-core CPU, 64 GB RAM | 24-core CPU, 48 GB RAM |
  • A minimum of three (3) physical servers is required; core counts are provided above.
  • Sizing assumes 20% concurrency.
  • The K8s control plane is deployed with a minimum of 3 master nodes, each requiring an 8-core CPU and 16 GB RAM.
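
The 20% concurrency assumption above can be checked with a short calculation (the helper name below is illustrative, not part of FinBlade AI):

```python
import math

def concurrent_users(total_users: int, concurrency: float = 0.20) -> int:
    """Estimate peak concurrent users from a total user count,
    assuming the 20% concurrency factor stated in the sizing notes."""
    return math.ceil(total_users * concurrency)

# Matches the sizing table: 500 total users -> 100 concurrent.
print(concurrent_users(500))   # 100
print(concurrent_users(6000))  # 1200
```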

Note: The recommended GPU for the backend server is the NVIDIA L40S. 2 GB of GPU memory is dedicated to each concurrent user for vector embedding processing.
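
Combining the stated 2 GB of GPU memory per concurrent user with the 48 GB of memory on an NVIDIA L40S gives a rough card count. This is a sizing sketch under those assumptions, not an official formula:

```python
import math

L40S_MEMORY_GB = 48  # memory on one NVIDIA L40S card

def embedding_gpus(concurrent_users: int, gb_per_user: int = 2) -> int:
    """Estimate L40S cards needed for vector-embedding processing,
    assuming 2 GB of GPU memory per concurrent user (per the note above)."""
    return math.ceil(concurrent_users * gb_per_user / L40S_MEMORY_GB)

print(embedding_gpus(200))  # 9 cards for 200 concurrent users
```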

Storage Requirements

FinBlade AI requires both file storage (NAS) and block storage accessible to the middleware K8s cluster. System performance depends on IOPS.

Vector Database Server

| Performance Metric | SSD Usage | RAM Usage |
| --- | --- | --- |
| High IOPS | 0% | 100% |
| Mid IOPS | 50% | 50% |
| Low IOPS | 80% | 10% |
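
The SSD/RAM split above can be applied to a total vector-dataset size. The tier names and helper below are illustrative, using the table's percentages:

```python
# (SSD fraction, RAM fraction) per performance tier, from the table above.
TIERS = {
    "high": (0.0, 1.0),   # High IOPS: 0% SSD, 100% RAM
    "mid":  (0.5, 0.5),   # Mid IOPS: 50% SSD, 50% RAM
    "low":  (0.8, 0.1),   # Low IOPS: 80% SSD, 10% RAM (as stated)
}

def storage_split(dataset_gb: float, tier: str) -> tuple[float, float]:
    """Return (ssd_gb, ram_gb) for a vector dataset under a given tier."""
    ssd_frac, ram_frac = TIERS[tier]
    return dataset_gb * ssd_frac, dataset_gb * ram_frac

print(storage_split(100, "mid"))  # (50.0, 50.0)
```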

Storage Requirement By Deployment Size

RAM Requirements: Based on the performance metric selected above, the following calculation determines the required RAM. Typical vector dimensions used by FinBlade AI are 1024 dimensions.

  • memory_size (RAM) = number_of_vectors × vector_dimension × 4 bytes (per float32 dimension) × 1.5 (overhead factor)

NAS Requirements: NAS capacity is based on client requirements.
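
A worked example of the RAM formula above, using the typical 1024-dimensional vectors:

```python
def vector_ram_bytes(num_vectors: int, dims: int = 1024) -> int:
    """RAM required per the formula above:
    num_vectors * dims * 4 bytes (float32) * 1.5 overhead factor."""
    return int(num_vectors * dims * 4 * 1.5)

# Example: 10 million 1024-dimensional vectors.
print(vector_ram_bytes(10_000_000) / 1e9)  # ~61.44 GB
```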

Hardware Requirements for AI Cluster

Purpose-built infrastructure engineered to handle complex workloads and unlock next-level performance.

LLM INFERENCE SERVER (< 8 GPU SYSTEM)

| Component | Requirement | Notes |
| --- | --- | --- |
| GPUs | NVIDIA H200 141 GB SXM5 GPU (or equivalent) | See the section "Hardware Requirements by Deployment Size" for details |
| Processor (CPU) | 64-bit processor, either: (1) an Intel processor that supports the Intel 64 architecture, or (2) an AMD processor that supports the AMD64 platform | Up to 2 processor sockets are recommended on physical machines |
| Memory | 16-slot DIMM, DDR5-6400 EC8 | |
| Local Storage | NVMe Gen4 performance read-intensive SSD | |
| Network | Ethernet | |
| Operating System | Ubuntu Linux 22.04 LTS | |
| On-prem Kubernetes control plane | Rancher / kubeadm | |
| File system | ext4 | |

LLM INFERENCE SERVER (> 8 GPU SYSTEM)

| Component | Requirement | Notes |
| --- | --- | --- |
| GPUs | NVIDIA HGX 8x H200 system | See the section "Hardware Requirements by Deployment Size" for details |
| Processor (CPU) | 64-bit processor, either: (1) an Intel processor that supports the Intel 64 architecture, or (2) an AMD processor that supports the AMD64 platform | Up to 2 processor sockets are recommended on physical machines |
| Memory | 16-slot DIMM, DDR5-6400 EC8 | |
| Local Storage | NVMe Gen4 performance read-intensive SSD | |
| Network | InfiniBand to link HGX chassis | |
| Operating System | Ubuntu Linux 22.04 LTS | |
| On-prem Kubernetes control plane | Rancher / kubeadm | |
| File system | ext4 | |

BY DEPLOYMENT SIZE

The hardware required for AI model inference varies with the model being served.

MODEL: GPT-OSS-120B

| Users | Concurrent Users (20%) | Input Tokens | VRAM (GB) | GPU | Expected TTFT (Time to First Token, s) | Quantization | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1-100 | 20 | 1024 | 96 | 1xH20 | 2.3 | FP8 | |
| 100-500 | 100 | 1024 | 141 | 1xH200 | 4.8 | FP8 | |
| 500-1000 | 200 | 1024 | 282 | 2xH200 | 1.4 to 4.7 | FP8 | |
| 1000-2000 | 400 | 1024 | 564 | 4xH200 | 1.4 to 4.7 | FP8 | |
| 2000-3000 | 600 | 1024 | 1128 | 8xH200 | 1.4 to 4.7 | FP8 | Consider an HGX system (with 8x H200) |
| 3000-4000 | 800 | 1024 | 1410 | 10xH200 | 1.4 to 4.7 | FP8 | Consider an HGX system (with NVLink) |
| 4000-5000 | 1000 | 1024 | 1692 | 12xH200 | 1.4 to 4.7 | FP8 | |
| 5000-6000 | 1200 | 1024 | 1974 | 14xH200 | 1.4 to 4.7 | FP8 | |
| 6000-7000 | 1400 | 1024 | 2256 | 16xH200 | 1.4 to 4.7 | FP8 | Consider 2x HGX systems over NVLink |

Note: TTFT benchmarks are approximated at a prompt length of 1024 tokens.
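
The GPU counts in the H200 rows of the table track the model's VRAM requirement divided by the 141 GB of memory per H200. The helper below sketches that relationship; it is an inference from the table, not a published sizing rule:

```python
import math

H200_MEMORY_GB = 141  # memory per NVIDIA H200 GPU

def h200_count(vram_gb: float) -> int:
    """Minimum number of H200 GPUs whose combined memory covers
    the stated VRAM requirement."""
    return math.ceil(vram_gb / H200_MEMORY_GB)

print(h200_count(1128))  # 8, matching the 2000-3000 user row
```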

MODEL: LLAMA-3.3-70B (min-prompt-length 1024)

| Users | Concurrent Users (20%) | Input Tokens | VRAM | GPU | Expected TTFT (Time to First Token, s) | Quantization |
| --- | --- | --- | --- | --- | --- | --- |
| 1-99 | 20 | 500 | 2x141 GB | 2xH200 | 0.7 | FP8 |
| 1-99 | 20 | 5000 | 2x141 GB | 2xH200 | 7.5 | FP8 |
| 1-99 | 20 | 20000 | 2x141 GB | 2xH200 | 5.1 | FP8 |
| 100-200 | 100 | 500 | 2x141 GB | 2xH200 | 0.7 | FP8 |
| 100-200 | 100 | 5000 | 2x141 GB | 2xH200 | 7.5 | FP8 |
| 100-200 | 100 | 20000 | 2x141 GB | 2xH200 | 5.1 | FP8 |
| 500-1000 | 200 | 500 | 2x141 GB | 2xH200 | 1.5 | FP8 |
| 500-1000 | 200 | 5000 | 4x141 GB | 4xH200 | 3.5 | FP8 |
| 500-1000 | 200 | 20000 | 6x141 GB | 4xH200 | 5 | FP8 |
| 1001-2000 | 400 | 500 | 2x141 GB | 2xH200 | 1.4 | FP8 |
| 1001-2000 | 400 | 5000 | 4x141 GB | 4xH200 | 7.5 | FP8 |
| 1001-2000 | 400 | 20000 | 10x141 GB | xH200 | 8 | FP8 |
| 2001-3000 | 600 | 500 | 2x141 GB | 2xH200 | 2.1 | FP8 |
| 2001-3000 | 600 | 5000 | 8x141 GB | 8xH200 | 5.25 | FP8 |
| 2001-3000 | 600 | 20000 | 20x141 GB | 20xH200 | 8.4 | FP8 |
| 3001-4000 | 800 | 500 | 2x141 GB | 2xH200 | 2.25 | FP8 |
| 3001-4000 | 800 | 5000 | 8x141 GB | 8xH200 | 5.625 | FP8 |
| 3001-4000 | 800 | 20000 | 22x141 GB | 20xH200 | 8 | FP8 |
| 4001-5000 | 1000 | 500 | 2x141 GB | 2xH200 | 3.91 | FP8 |
| 4001-5000 | 1000 | 5000 | 12x141 GB | 12xH200 | 6.51 | FP8 |
| 4001-5000 | 1000 | 20000 | 26x141 GB | 26xH200 | 8.314 | FP8 |
| 5001-6000 | 1200 | 500 | 4x141 GB | 4xH200 | 4.72 | FP8 |
| 5001-6000 | 1200 | 5000 | | 12xH200 | 5.9 | FP8 |
| 5001-6000 | 1200 | 20000 | | 26xH200 | 8.58 | FP8 |

Note: TTFT benchmarks are approximated at a prompt length of 1024 tokens.