Introduction to the Large Financial Data Model
The Large Financial Data Model (LFDM) is a family of foundation models trained on regulated financial data. This documentation describes the concept, the Brikz platform architecture, the model lifecycle, and the specification of the agents that operate on the LFDM.
What is a Large Financial Data Model?
The LFDM is a class of foundation model specialized for financial data — transactions, receivables, regulatory documents, cash-flow time series, ownership graphs, and recorded human decisions. Unlike language models that treat text as a generic sequence, the LFDM learns the native structure of each financial modality and the relationships between them.
The Brikz platform exposes the LFDM through agents — each agent combines a selection of encoders, tools, parameterized regulation, and human-handoff protocols to operate a specific regulatory cycle.
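As an illustration, an agent can be thought of as a composition of those four pieces. The sketch below is a minimal Python rendering of that idea; `AgentSpec`, `HandoffProtocol` and the string identifiers are hypothetical names for illustration, not part of any published SDK:

```python
from dataclasses import dataclass, field

@dataclass
class HandoffProtocol:
    # Conditions under which the agent stops and routes to a human analyst.
    confidence_floor: float = 0.90  # below this, escalate
    escalate_on: tuple = ("missing_evidence", "regulation_conflict")

@dataclass
class AgentSpec:
    # An agent = encoders + tools + parameterized regulation + handoff rules.
    name: str
    encoders: list              # e.g. ["document_fm", "graph_fm"]
    tools: list                 # e.g. ["bigquery", "spanner_graph"]
    regulation_snapshot: str    # versioned snapshot of the applicable rules
    handoff: HandoffProtocol = field(default_factory=HandoffProtocol)

# Hypothetical configuration of the Structured Credit Agent (illustrative only).
fidc_agent = AgentSpec(
    name="brikz/agent-fidc",
    encoders=["document_fm", "tabular_fm", "graph_fm", "timeseries_fm"],
    tools=["document_ai", "bigquery", "spanner_graph"],
    regulation_snapshot="cvm-175@2025-01",
)
```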
Modalities covered
- Document FM — structured multi-modal extraction from fund regulation, indentures, contracts and credit notes.
- Time-series FM — Mamba-2 backbone for aging, PD and cash-flow forecasting over long windows.
- Tabular FM — TabPFN-v2 fine-tuned for contextual scoring over originator, payor and counterparty tables.
- Graph FM — Production-grade GraphSAGE over directed transaction and ownership graphs.
Platform overview
The platform is structured into five core components, all running on Google Cloud in a dedicated São Paulo region:
| Component | Function | Google Cloud |
|---|---|---|
| brikz/connectors | Data ingestion from registries, Open Finance, ERPs and internal systems. | Pub/Sub, Dataflow, Cloud Storage |
| brikz/datasets | Curation, validation, versioning and governance of training datasets. | BigQuery, Dataplex, Iceberg |
| brikz/training | Foundation model training and per-institution LoRA adaptation. | Vertex AI, TPU v5p, NVIDIA H100 |
| brikz/agents | Vertical agents — Structured Credit, AML, Customer Life — on top of the LFDM. | Vertex AI, Spanner Graph, BigQuery Graph, Document AI |
| brikz/serving | Serverless pay-per-query inference, regional endpoints, and API. | Cloud Run, GKE, Vertex AI Endpoints |
Google Cloud stack
- Gemini (Vertex AI Model Garden) — Gemini Pro and Flash families via Model Garden, reasoning engine for the agents.
- Gemini Enterprise — enterprise tier of Gemini with data residency, IAM controls and audit logging, applied to regulated client workloads.
- Vertex AI — training, fine-tuning, MLOps and serving for foundation models and per-tenant LoRA adapters.
- TPU v5p / Trillium — accelerators for foundation model pretraining. Complementary support for NVIDIA A100/H100/H200.
- BigQuery — serverless lakehouse, pay-per-query, with native embeddings via BQML.
- BigQuery Graph — graph queries over transactional history and ownership chain inside the warehouse.
- Spanner Graph — operational representation of the originator/payor/UBO graph with strong consistency.
- Document AI — structured multi-modal extraction over fund regulation, indentures, contracts and minutes.
- Cloud Run · GKE — serverless inference and always-warm regional endpoints.
- Pub/Sub · Dataflow — real-time ingestion of payments, cards and settlement events.
- Cloud Storage — data lake in open Iceberg format, partitioned by LoadDate.
- Confidential Compute — cryptographic isolation of the execution environment per tenant.
- Dataplex · IAM — data catalog, lineage and granular access control.
Getting started
Typical adoption flow follows four steps:
- Connect data sources — transactional, onboarding, documents, registries.
- Curate the first datasets — temporal partitioning, validation, lineage.
- Train the institution's adapter — LoRA over the base LFDM.
- Serve the agent — dedicated endpoint, dashboard and API.
Mode A reaches a first exception queue running in production within four weeks of contract signing. Modes B and C have different timelines described below.
Mode A · Managed platform
Default delivery configuration. Brikz hosts the full infrastructure in a dedicated São Paulo region. The client consumes the product through a web dashboard and API. No model ops, no GPU ops, no pipeline ops.
Responsibilities
| Responsibility | Brikz | Client |
|---|---|---|
| Foundation model pretraining and updates | ✓ | |
| GPU/TPU operations and MLOps | ✓ | |
| Per-tenant LoRA adapter training on client data | ✓ | |
| Regulation parameterization | Advisory | ✓ |
| Exception handling and operational overrides | ✓ | |
| Accounting and legal compliance of the FIDC | | ✓ |
Time to production
The default configuration reaches the first exception queue running on the client's real data within four weeks of contract signing.
When to use
Fund managers, fiduciary administrators and custodians who need immediate operations without investing in an AI team.
Mode B · Embedded API
The client integrates the Brikz agent API inside its own digital product, without using the Brikz dashboard. Regulatory decisioning becomes a capability of the client's product, with optional white-label.
Responsibilities
| Responsibility | Brikz | Client |
|---|---|---|
| Foundation model and endpoint operations | ✓ | |
| API availability and SLA | ✓ | |
| Integration into the client product | Support | ✓ |
| End-user experience | | ✓ |
| Branding and UX flow | | ✓ |
Integration
REST API with OAuth 2.0 and optional mTLS. Webhooks for decision events. Python and TypeScript SDKs. Endpoint documentation in the API chapter of this guide.
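On the webhook side, a minimal receiver for decision events could look like the sketch below. The payload fields (`decision_id`) and the signature header name are illustrative assumptions, not the documented contract; FastAPI stands in for whatever framework the client product uses:

```python
# Minimal webhook receiver for Brikz decision events — a sketch only.
# Field names and the signature header are illustrative assumptions.
import hashlib, hmac, os

from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
WEBHOOK_SECRET = os.environ["BRIKZ_WEBHOOK_SECRET"]

@app.post("/webhooks/brikz")
async def decision_event(request: Request, x_brikz_signature: str = Header(...)):
    body = await request.body()
    # Verify the HMAC signature before trusting the payload.
    expected = hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, x_brikz_signature):
        raise HTTPException(status_code=401, detail="bad signature")
    event = await request.json()
    # e.g. persist event["decision_id"] and fetch the dossier asynchronously.
    return {"received": event.get("decision_id")}
```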
When to use
Credit fintechs, BaaS, securitizers and originators with their own digital product that want to offer regulatory decisioning without operating the AI.
Mode C · Client tenant
Brikz deploys the full stack into the client's own Google Cloud project, via Terraform and Helm. Data, model and inference stay inside the client perimeter. Brikz maintains the model lifecycle via a shared, auditable runbook.
Responsibilities
| Responsibility | Brikz | Client |
|---|---|---|
| Architecture definition and runbook | ✓ | |
| Google Cloud project and billing | | ✓ |
| Deployment via Terraform + Helm | ✓ | Approval |
| Foundation model updates | ✓ | Window |
| Security and network operations | Advisory | ✓ |
Provisioned components
- Project, billing, networks and VPC Service Controls on the client's Google Cloud
- Vertex AI Pipelines, Vertex AI Endpoints, Cloud Run, GKE Autopilot
- BigQuery, BigQuery Graph, Spanner Graph, Cloud Storage
- Confidential Compute, Cloud KMS with CMEK
- Dataplex, Cloud IAM, Cloud Audit Logs, Cloud Monitoring
Time to production
Eight to twelve weeks, including infrastructure provisioning, corporate SSO integration, network configuration and security validation by the client team.
When to use
Mid-size and digital banks, large fund managers, and institutions whose internal requirements mandate their own dedicated tenant, customer-managed encryption keys and mandatory internal audit.
Intellectual property
In every delivery mode, the IP split is identical:
- LFDM foundation models (Document FM, Time-series FM, Tabular FM, Graph FM) remain the intellectual property of Brikz Tecnologia Ltda.
- The LoRA adapter trained on the institution's data is the client's own IP. It remains cryptographically isolated and is exportable on request at any time.
- The client's raw data remains the client's IP. In no delivery mode is one client's data ever used to fine-tune another client's model.
Structured Credit Agent
The brikz/agent-fidc agent operates the cycle prescribed by CVM 175 for credit fund operations. Every decision ships with an auditable dossier — rationale, applied regulation snapshot, and model version.
Stages operated
| Stage | Regulation | Capability |
|---|---|---|
| Legal onboarding | CVM 175 · diligence | KYC documentation, ownership chain, UBO |
| Eligibility & assignment | CVM 175 · assignment | Fund regulation → SQL rules per receivable |
| Collateral, registry & custody | CVM 175 · collateral | Reconciliation with national registries |
| Continuous monitoring | CVM 175 · monitoring | Delinquency, subordination, substitution |
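To make the "fund regulation → SQL rules per receivable" step concrete, the sketch below shows what one parameterized eligibility rule could look like when executed against BigQuery. The table, columns and thresholds are hypothetical; only the `google-cloud-bigquery` client calls are real:

```python
# Sketch: one eligibility rule derived from a fund regulation clause,
# evaluated per receivable in BigQuery. Table/column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

ELIGIBILITY_RULE = """
SELECT
  receivable_id,
  face_value,
  due_date,
  -- Clause-derived checks (illustrative thresholds):
  due_date BETWEEN CURRENT_DATE() AND DATE_ADD(CURRENT_DATE(), INTERVAL 360 DAY)
    AND face_value <= 500000
    AND payor_rating IN ('A', 'B') AS eligible
FROM `tenant_dataset.receivables_batch`
WHERE batch_id = @batch_id
"""

job = client.query(
    ELIGIBILITY_RULE,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("batch_id", "STRING", "FIDC-2025-0042")
        ]
    ),
)
for row in job.result():
    print(row.receivable_id, row.eligible)
```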
Endpoint
```http
POST /v1/agents/fidc/decide

{
  "fund_id": "FIDC-2025-0042",
  "receivable_batch": "s3://...",
  "event": "assignment"
}
```
Eight operational layers of the Structured Credit Agent
The Structured Credit Agent runs the CVM 175 cycle across eight connected layers. Each layer has a declared responsibility, a foundation model where applicable, an associated Google Cloud product and an intermediate output consumable by the layers above. The audit trail is built layer by layer — every decision in layer L8 carries traceability down to the raw evidence in layer L1.
| L# | Layer | Responsibility | Foundation model | Google Cloud |
|---|---|---|---|---|
| L1 | Ingestion | Receivables (CNAB), documents, onboarding, registries, events | — | Pub/Sub · Dataflow · Cloud Storage |
| L2 | Document AI | Structured multi-modal extraction of fund regulation, indentures, contracts and credit notes | Document FM | Document AI · Gemini |
| L3 | Entity resolution | KYC, KYB, UBO, ownership, links and related parties | Graph FM | Spanner Graph |
| L4 | Signatory powers | Representatives, power limits, signing rules | Document FM | Document AI |
| L5 | Eligibility and assignment | Regulation parameterized into per-receivable SQL rules · CVM 175 · assignment | Tabular FM | BigQuery |
| L6 | Collateral and custody | Existence, uniqueness, ownership and reconciliation · CVM 175 · collateral | Graph FM | Spanner Graph · BigQuery Graph |
| L7 | Continuous monitoring | Delinquency, subordination, substitution, waterfall · CVM 175 · monitoring | Time-series FM | Vertex AI Endpoints |
| L8 | Reporting and dossier | Dashboards, auditable alerts, regulator dossiers, legacy API | — | Cloud Run · Apigee · Looker |
Traceability guarantee
Every decision at layer L8 includes:
- Agent execution ID and UTC timestamp
- Foundation model version in use per layer
- Client LoRA adapter version
- Applied regulation snapshot, including exact clauses
- Pointers to the raw evidence consumed (PDF document, registry entry, CNAB record)
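Put together, a dossier carrying these fields could look like the following sketch. The key names and URIs are illustrative assumptions, not the wire format:

```python
# Illustrative shape of an L8 decision dossier. Key names are assumptions.
dossier = {
    "execution_id": "exec-2025-0042-0001",
    "timestamp_utc": "2025-06-30T14:02:11Z",
    "model_versions": {                 # FM version in use per layer
        "L2_document_fm": "doc-fm-1.3.0",
        "L5_tabular_fm": "tab-fm-2.1.4",
        "L7_timeseries_fm": "ts-fm-0.9.2",
    },
    "lora_adapter_version": "tenant-adapter@17",
    "regulation_snapshot": {
        "source": "fund_regulation.pdf",
        "clauses": ["art. 12 §2", "annex II, item 4"],
    },
    "evidence": [                       # pointers to raw L1 inputs
        {"type": "pdf", "uri": "gs://tenant-bucket/contracts/nc-0042.pdf"},
        {"type": "cnab", "uri": "gs://tenant-bucket/cnab/2025-06-30.ret"},
    ],
}
```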
AML Agent
The brikz/agent-aml agent operates the AML cycle prescribed by CVM 50. Operational focus is prioritization under an inspection budget — concentrating illicit activity at the top of the risk ranking.
Two-phase pipeline
- Phase 1 — Transaction ranking: directed graph construction, GraphSAGE encoding, and an MLP decoder producing a per-edge risk probability.
- Phase 2 — Investigable cases: top-k% extraction, subgraph induction, decomposition into connected components (n_min = 3), and a contextual subgraph for the exception queue.
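A compact sketch of Phase 2, assuming Phase 1 has already produced a score per directed edge. The GraphSAGE encoder and MLP decoder are abstracted away, and `networkx` stands in for the production graph engine:

```python
# Phase 2 sketch: from per-edge scores to investigable cases.
# Assumes Phase 1 produced scores[i] for each directed edge edges[i].
import networkx as nx
import numpy as np

def extract_cases(edges, scores, k_pct=1.0, n_min=3):
    """Top-k% extraction, subgraph induction, connected-component cases."""
    n_top = max(1, int(len(edges) * k_pct / 100.0))
    top_idx = np.argsort(scores)[::-1][:n_top]      # highest-risk edges first

    G = nx.DiGraph()
    G.add_edges_from(edges[i] for i in top_idx)     # induced suspicious subgraph

    cases = []
    for comp in nx.weakly_connected_components(G):  # one component = one case
        if len(comp) >= n_min:                      # drop components below n_min
            cases.append(G.subgraph(comp).copy())   # contextual subgraph
    return cases

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("f", "g")]
scores = np.array([0.99, 0.97, 0.95, 0.90, 0.10])
for case in extract_cases(edges, scores, k_pct=80.0):
    print(sorted(case.nodes()))  # ['a', 'b', 'c'] — the d-e pair is below n_min
```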
Operational metrics
| Metric | Observed value | Scenario |
|---|---|---|
| Recall@1% | 0.877 | AML100k |
| IPI@1% | 7.94 | AML100k |
| Inference throughput | 740K edges/s | A100 80GB |
| Processed volume | 124M edges | AML1M |
Customer Life Agent
The brikz/agent-life agent builds a continuous representation of each retail and business client from transactions, contracts, ownership links and cash-flow series. It produces a dense per-client vector consumable by credit, scoring, fraud and recommendation use cases. Vector dimensionality is a configuration parameter per institution.
```python
customer_embedding = LFDM.encode(
    transactions=stream("payments", "cards"),
    documents=["payroll.pdf", "contracts/"],
    graph=relationship_graph(entity_id),
    timeseries=cashflow_history(months=36),
    context=open_finance_consent(),
)
```
The embedding is updated in streaming mode and exposed via a dedicated endpoint. Common use cases: dynamic primacy, Next Best Action, and early detection of financial stress.
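Consuming the embedding downstream could look like the sketch below. The endpoint path and response fields are hypothetical assumptions, and cosine similarity against toy offer vectors stands in for a real Next Best Action ranker:

```python
# Sketch: consume the per-client embedding for a Next Best Action ranking.
# Endpoint path and response shape are illustrative assumptions.
import os
import numpy as np
import requests

TOKEN = os.environ["BRIKZ_TOKEN"]
resp = requests.get(
    "https://api.brikz.io/v1/agents/life/embedding",
    params={"entity_id": "cust-123"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
v = np.array(resp.json()["embedding"])  # dense per-client vector

# Toy offer vectors with the same dimensionality as the client embedding.
rng = np.random.default_rng(0)
offers = {name: rng.normal(size=v.shape) for name in ("credit_line", "insurance")}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

best = max(offers, key=lambda name: cos(v, offers[name]))
print("next best action:", best)
```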
Connectors
The platform ships native connectors for common data sources in the Brazilian market:
- Receivables registries — CERC, B3, TAG, CIP.
- Real-time payments, wire transfers, bills, cards — via ERP, core banking or aggregator.
- Open Finance — via authorized initiator API.
- Documents — indentures, fund regulation, contracts, credit notes, receipts.
- Onboarding — Federal Revenue, Central Bank, OFAC, sanctions and PEP lists.
Datasets
Datasets are versioned in Iceberg or Delta format, partitioned by load date, with auditable lineage. Each dataset carries a declared schema, observed quality and the regulation snapshot associated with it when applicable.
Ingestion & quality
Ingestion supports batch and streaming. Quality checks include accounting reconciliation, duplicate detection, mandatory-field completeness and referential consistency. Quality events feed the governance dashboard and can automatically block use for training or inference.
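These checks translate directly into batch validations. A minimal pandas sketch of the duplicate, completeness and referential checks, with hypothetical column names (accounting reconciliation is omitted for brevity):

```python
# Sketch of three of the quality checks over a receivables batch.
# Column names are hypothetical; failures would feed the governance dashboard.
import pandas as pd

def quality_report(batch: pd.DataFrame, payors: pd.DataFrame) -> dict:
    mandatory = ["receivable_id", "payor_id", "face_value", "due_date"]
    return {
        # Duplicate detection on the business key.
        "duplicates": int(batch.duplicated(subset=["receivable_id"]).sum()),
        # Mandatory-field completeness.
        "incomplete_rows": int(batch[mandatory].isna().any(axis=1).sum()),
        # Referential consistency: every payor must exist in the payor table.
        "orphan_payors": int((~batch["payor_id"].isin(payors["payor_id"])).sum()),
    }

batch = pd.DataFrame({
    "receivable_id": ["r1", "r2", "r2"],
    "payor_id": ["p1", "p9", "p9"],
    "face_value": [1000.0, None, 250.0],
    "due_date": ["2025-08-01", "2025-09-15", "2025-09-15"],
})
payors = pd.DataFrame({"payor_id": ["p1", "p2"]})
print(quality_report(batch, payors))
# {'duplicates': 1, 'incomplete_rows': 1, 'orphan_payors': 2}
```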
Training
Foundation model training combines self-supervised pretraining on large volumes of unlabeled data and supervised post-training on specific tasks. Execution runs on Vertex AI, with TPU v5p and Trillium pools for pretraining and NVIDIA A100/H100/H200 for fine-tuning and specific tasks.
Observed configurations
| Architecture | Training time (s) | VRAM (GiB) | Use |
|---|---|---|---|
| GraphSAGE | 216.7 | 2.4 | AML production |
| GATv2 | 241.7 | 17.5 | Comparison |
| GINE | 220.5 | 7.9 | Comparison |
| MLP baseline | 55.9 | 0.7 | Baseline |
Per-client adaptation
Each institution receives a dedicated adapter trained via LoRA, QLoRA or DoRA on top of the base LFDM. The adapter captures the particularities of the institution's data and policy without exposing training data to other tenants. Versioning, registration and environment promotion run through Vertex AI Model Registry, with lineage integrated with Dataplex.
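For orientation, attaching a LoRA adapter to a transformer backbone typically looks like the sketch below, using the open-source peft library. The rank, target modules and checkpoint handle are illustrative, not Brikz's actual configuration:

```python
# Sketch: attaching a per-tenant LoRA adapter to a transformer backbone.
# Hyperparameters and module names are illustrative, not Brikz's settings.
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

base = AutoModel.from_pretrained("path/to/lfdm-base")  # hypothetical checkpoint
config = LoraConfig(
    r=16,                                  # low-rank dimension
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # only the adapter weights train
```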
Benchmark
The platform maintains an internal benchmark for comparing architectures and checkpoints. Reported metrics include AUC-ROC, Average Precision, Recall@k, Lift@k and IPI@k for ranking tasks; AUC-ROC and KS for scoring tasks; coverage and purity for case extraction.
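The two ranking metrics specific to this benchmark follow the glossary definitions. A minimal reference sketch, assuming binary labels and higher-is-riskier scores (not the platform's internal implementation):

```python
# Reference sketch for Recall@k and IPI@k as defined in the glossary.
import numpy as np

def recall_at_k(labels: np.ndarray, scores: np.ndarray, k_pct: float) -> float:
    """Fraction of all positives found in the top-k% of the ranking."""
    n_top = max(1, int(len(scores) * k_pct / 100.0))
    top = np.argsort(scores)[::-1][:n_top]
    return float(labels[top].sum() / labels.sum())

def ipi_at_k(labels: np.ndarray, scores: np.ndarray, k_pct: float) -> float:
    """Inspections per illicit in the top-k%: cases inspected per true positive."""
    n_top = max(1, int(len(scores) * k_pct / 100.0))
    top = np.argsort(scores)[::-1][:n_top]
    hits = labels[top].sum()
    return float(n_top / hits) if hits else float("inf")

labels = np.array([1, 0, 1, 0, 0, 0, 0, 0, 0, 1])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])
print(recall_at_k(labels, scores, 20.0))  # 0.333: 1 of 3 positives in the top 2
print(ipi_at_k(labels, scores, 20.0))     # 2.0: 2 inspections per illicit found
```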
Inference
Inference is served on serverless pay-per-query endpoints, always warm, over Cloud Run, GKE and Vertex AI Endpoints. The runtime uses vLLM or SGLang for generative models, TensorRT-LLM for high-throughput cases, and custom containers for GraphSAGE and tabular models.
Observed performance
- Transaction ranking (AML): 740K edges/second on NVIDIA A100 80GB, measured on AMLSim (124M transactions).
- Other latency and throughput metrics per agent are reported in the internal benchmark and made available on technical request for each delivery mode.
API
REST API with OAuth 2.0 authentication and optional mTLS. All calls return a decision ID and a pointer to the corresponding auditable dossier.
```bash
curl -X POST https://api.brikz.io/v1/agents/fidc/decide \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"fund_id":"FIDC-2025-0042","event":"assignment"}'
```
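The same call from Python without the SDK, using plain requests; the `decision_id` and `dossier_url` response fields are illustrative assumptions standing in for the decision ID and dossier pointer described above:

```python
# Sketch: decision call via plain HTTP, then fetch the dossier pointer.
# Response field names are illustrative assumptions.
import os
import requests

TOKEN = os.environ["BRIKZ_TOKEN"]
resp = requests.post(
    "https://api.brikz.io/v1/agents/fidc/decide",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"fund_id": "FIDC-2025-0042", "event": "assignment"},
    timeout=30,
)
resp.raise_for_status()
decision = resp.json()
print(decision["decision_id"])                   # decision ID
dossier = requests.get(                          # pointer to the auditable dossier
    decision["dossier_url"],
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
print(dossier.json())
```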
Audit trail
Every platform decision is recorded with generated rationale, model version, adapter version, applied regulation snapshot, execution identifier and requester identity. The trail is exportable in a format compatible with internal and external audit.
Security & isolation
Data resides in the dedicated São Paulo region on Google Cloud. Confidential Compute cryptographically isolates each tenant's execution environment. Access is controlled by Cloud IAM, with per-operation auditing in Cloud Audit Logs. Encryption in transit and at rest is on by default, with optional CMEK.
Applicable regulation
- CVM 175 — credit fund operations, eligibility, collateral and monitoring duties.
- CVM 50 — anti-money laundering and counter-terrorism financing.
- BCB 119 — risk management and cybersecurity for financial institutions.
- LGPD — Brazilian general data protection law.
- COAF — communication of atypical operations within 24 hours.
Glossary
| Term | Definition |
|---|---|
| LFDM | Large Financial Data Model. Brikz's foundation model family. |
| GraphSAGE | GNN architecture based on sample + aggregate over neighborhood. |
| Recall@k | Fraction of positives retrieved when inspecting the top-k% of the ranking. |
| IPI@k | Inspections per illicit in the top-k. Measures operational effort. |
| LoRA | Low-Rank Adaptation. Efficient model adaptation technique. |
| FIDC | Brazilian receivables-based investment fund. |
| UBO | Ultimate Beneficial Owner of an ownership structure. |