
Introduction to the Large Financial Data Model

The Large Financial Data Model (LFDM) is a family of foundation models trained on regulated financial data. This documentation describes the concept, the Brikz platform architecture, the model lifecycle, and the specification of the agents that operate on the LFDM.


What is a Large Financial Data Model?

The LFDM is a class of foundation model specialized for financial data — transactions, receivables, regulatory documents, cash-flow time series, ownership graphs, and recorded human decisions. Unlike language models that treat text as a generic sequence, the LFDM learns the native structure of each financial modality and the relationships between them.

The Brikz platform exposes the LFDM through agents: each agent combines a selection of encoders, tools, parameterized regulation, and human-handoff protocols to operate a specific regulatory cycle.

Modalities covered

Platform overview

The platform is structured into five core components, all running on Google Cloud in a dedicated São Paulo region:

| Component | Function | Google Cloud |
| --- | --- | --- |
| brikz/connectors | Data ingestion from registries, Open Finance, ERPs and internal systems. | Pub/Sub, Dataflow, Cloud Storage |
| brikz/datasets | Curation, validation, versioning and governance of training datasets. | BigQuery, Dataplex, Iceberg |
| brikz/training | Foundation model training and per-institution LoRA adaptation. | Vertex AI, TPU v5p, NVIDIA H100 |
| brikz/agents | Vertical agents (Structured Credit, AML, Customer Life) on top of the LFDM. | Vertex AI, Spanner Graph, BigQuery Graph, Document AI |
| brikz/serving | Serverless pay-per-query inference, regional endpoints, and API. | Cloud Run, GKE, Vertex AI Endpoints |
Confidential Compute guarantees cryptographic isolation for regulated tenants.

Google Cloud stack

Getting started

Typical adoption flow follows four steps:

  1. Connect data sources — transactional, onboarding, documents, registries.
  2. Curate the first datasets — temporal partitioning, validation, lineage.
  3. Train the institution's adapter — LoRA over the base LFDM.
  4. Serve the agent — dedicated endpoint, dashboard and API.

Mode A reaches a first exception queue running in production within four weeks of contract signing. Modes B and C have different timelines described below.


Mode A · Managed platform

Default delivery configuration. Brikz hosts the full infrastructure in a dedicated São Paulo region. The client consumes the product through a web dashboard and API. No model ops, no GPU ops, no pipeline ops.

Responsibilities

| Responsibility | Brikz | Client |
| --- | --- | --- |
| Foundation model pretraining and updates | ✓ | |
| GPU/TPU operations and MLOps | ✓ | |
| Per-tenant LoRA adapter training on client data | ✓ | |
| Regulation parameterization | ✓ | Advisory |
| Exception handling and operational overrides | | ✓ |
| Accounting and legal compliance of the FIDC | | ✓ |

Time to production

Default configuration reaches the first exception queue running on the client's real data within four weeks of contract signing.

When to use

Fund managers, fiduciary administrators and custodians who need immediate operations without investing in an AI team.

Mode B · Embedded API

The client integrates the Brikz agent API inside its own digital product, without using the Brikz dashboard. Regulatory decisioning becomes a capability of the client's product, with optional white-label.

Responsibilities

| Responsibility | Brikz | Client |
| --- | --- | --- |
| Foundation model and endpoint operations | ✓ | |
| API availability and SLA | ✓ | |
| Integration into the client product | Support | ✓ |
| End-user experience | | ✓ |
| Branding and UX flow | | ✓ |

Integration

REST API with OAuth 2.0 and optional mTLS. Webhooks for decision events. Python and TypeScript SDKs. Endpoint documentation in the API chapter of this guide.
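A consumer of the decision webhooks will typically authenticate each event before acting on it. A minimal sketch, assuming an HMAC-SHA256 signature scheme; the signature format and secret handling here are illustrative assumptions, not part of the documented API:

```python
import hashlib
import hmac

def verify_webhook(payload: bytes, signature: str, secret: bytes) -> bool:
    """Recompute the HMAC-SHA256 of the raw payload and compare it
    in constant time against the signature sent with the event."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# Illustrative event body and shared secret (not real values)
body = b'{"decision_id": "dec_123", "event": "assignment"}'
secret = b"shared-webhook-secret"
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()

assert verify_webhook(body, sig, secret)        # untampered payload verifies
assert not verify_webhook(body + b" ", sig, secret)  # any mutation fails
```

Constant-time comparison (`hmac.compare_digest`) matters here: a naive `==` on hex strings can leak timing information to a forger.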

When to use

Credit fintechs, BaaS, securitizers and originators with their own digital product that want to offer regulatory decisioning without operating the AI.

Mode C · Client tenant

Brikz deploys the full stack into the client's own Google Cloud project, via Terraform and Helm. Data, model and inference stay inside the client perimeter. Brikz maintains the model lifecycle via a shared, auditable runbook.

Responsibilities

| Responsibility | Brikz | Client |
| --- | --- | --- |
| Architecture definition and runbook | ✓ | |
| Google Cloud project and billing | | ✓ |
| Deployment via Terraform + Helm | ✓ | Approval |
| Foundation model updates | ✓ | Window |
| Security and network operations | Advisory | ✓ |

Provisioned components

Time to production

Eight to twelve weeks, including infrastructure provisioning, corporate SSO integration, network configuration and security validation by the client team.

When to use

Mid-size and digital banks, large fund managers and institutions with internal requirements for a dedicated tenant of their own, customer-managed encryption keys and mandatory internal audit.

Intellectual property

In every delivery mode, the IP split is identical:


Structured Credit Agent

The brikz/agent-fidc agent operates the cycle prescribed by CVM 175 for credit fund operations. Every decision ships with an auditable dossier — rationale, applied regulation snapshot, and model version.

Stages operated

| Stage | Regulation | Capability |
| --- | --- | --- |
| Legal onboarding | CVM 175 · diligence | KYC documentation, ownership chain, UBO |
| Eligibility & assignment | CVM 175 · assignment | Fund regulation → SQL rules per receivable |
| Collateral, registry & custody | CVM 175 · collateral | Reconciliation with national registries |
| Continuous monitoring | CVM 175 · monitoring | Delinquency, subordination, substitution |
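The "fund regulation → SQL rules per receivable" capability can be pictured as a small rule compiler. The rule tuples, column names and table name below are illustrative assumptions, not the platform's actual rule schema:

```python
# Toy eligibility rules: each rule is (column, operator, threshold).
# Columns and thresholds are illustrative only.
RULES = [
    ("days_past_due", "<=", 0),
    ("face_value", "<=", 500000),
    ("debtor_concentration_pct", "<", 5.0),
]

def rules_to_sql(table: str, rules) -> str:
    """Compile per-receivable eligibility rules into one SQL predicate."""
    predicate = " AND ".join(f"{col} {op} {value}" for col, op, value in rules)
    return f"SELECT receivable_id FROM {table} WHERE {predicate}"

sql = rules_to_sql("receivables_batch", RULES)
# Produces: SELECT receivable_id FROM receivables_batch
#           WHERE days_past_due <= 0 AND face_value <= 500000
#           AND debtor_concentration_pct < 5.0
```

Keeping the rules as data rather than hand-written SQL is what makes a per-fund regulation snapshot auditable and reproducible.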

Endpoint

POST /v1/agents/fidc/decide
{
  "fund_id": "FIDC-2025-0042",
  "receivable_batch": "s3://...",
  "event": "assignment"
}

8 operational layers of the Structured Credit Agent

The Structured Credit Agent runs the CVM 175 cycle across eight connected layers. Each layer has a declared responsibility, a dedicated foundation model, an associated Google Cloud product and an intermediate output consumable by the layers above. The audit trail is built layer by layer — every decision in layer L8 carries traceability down to the raw evidence in layer L1.

| L# | Layer | Responsibility | Foundation model | Google Cloud |
| --- | --- | --- | --- | --- |
| L1 | Ingestion | Receivables (CNAB), documents, onboarding, registries, events | | Pub/Sub · Dataflow · Cloud Storage |
| L2 | Document AI | Structured multi-modal extraction of fund regulation, indentures, contracts and credit notes | Document FM | Document AI · Gemini |
| L3 | Entity resolution | KYC, KYB, UBO, ownership, links and related parties | Graph FM | Spanner Graph |
| L4 | Signatory powers | Representatives, power limits, signing rules | Document FM | Document AI |
| L5 | Eligibility and assignment | Regulation parameterized into per-receivable SQL rules (CVM 175 · assignment) | Tabular FM | BigQuery |
| L6 | Collateral and custody | Existence, uniqueness, ownership and reconciliation (CVM 175 · collateral) | Graph FM | Spanner Graph · BigQuery Graph |
| L7 | Continuous monitoring | Delinquency, subordination, substitution, waterfall (CVM 175 · monitoring) | Time-series FM | Vertex AI Endpoints |
| L8 | Reporting and dossier | Dashboards, auditable alerts, regulator dossiers, legacy API | | Cloud Run · Apigee · Looker |
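The layer-by-layer trail can be pictured as each intermediate output keeping references to the upstream evidence it consumed, so a layer-L8 decision can be walked back to layer-L1 raw inputs. The record shape below is an illustrative sketch, not the platform's dossier schema:

```python
from dataclasses import dataclass, field

@dataclass
class LayerOutput:
    layer: str                                   # e.g. "L5 eligibility"
    payload: dict                                # the layer's intermediate result
    evidence: list = field(default_factory=list) # upstream LayerOutput references

def trace(output: LayerOutput) -> list:
    """Walk evidence references from a decision back to the raw inputs."""
    seen = [output.layer]
    for parent in output.evidence:
        seen.extend(trace(parent))
    return seen

raw = LayerOutput("L1 ingestion", {"file": "cnab_batch_001"})
doc = LayerOutput("L2 document", {"field": "assignment_clause"}, [raw])
decision = LayerOutput("L8 dossier", {"decision": "approve"}, [doc])

assert trace(decision) == ["L8 dossier", "L2 document", "L1 ingestion"]
```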

Traceability guarantee

Every decision at layer L8 includes:

AML Agent

The brikz/agent-aml agent operates the AML cycle prescribed by CVM 50. Operational focus is prioritization under an inspection budget — concentrating illicit activity at the top of the risk ranking.

Two-phase pipeline

Operational metrics

| Metric | Observed value | Scenario |
| --- | --- | --- |
| Recall@1% | 0.877 | AML100k |
| IPI@1% | 7.94 | AML100k |
| Inference throughput | 740K edges/s | A100 80GB |
| Processed volume | 124M edges | AML1M |
Metrics measured on AMLSim. Reproducible in an isolated Brikz environment on technical request.
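The ranking metrics above can be made concrete on synthetic labels (not the AML100k data), following the glossary definitions of Recall@k and IPI@k:

```python
def recall_at_k(scores, labels, k_pct):
    """Fraction of all positives that fall in the top k% of the ranking."""
    n_top = max(1, int(len(scores) * k_pct / 100))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top_positives = sum(lbl for _, lbl in ranked[:n_top])
    return top_positives / sum(labels)

def ipi_at_k(scores, labels, k_pct):
    """Inspections per illicit in the top k%: cases inspected / positives found."""
    n_top = max(1, int(len(scores) * k_pct / 100))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top_positives = sum(lbl for _, lbl in ranked[:n_top])
    return n_top / top_positives if top_positives else float("inf")

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   0,   0,   1,   0,   0,   0,   0,   0]

assert recall_at_k(scores, labels, 20) == 2 / 3  # top 2 hold 2 of 3 positives
assert ipi_at_k(scores, labels, 20) == 1.0       # 2 inspections, 2 illicit found
```

A lower IPI@k means less wasted analyst effort per confirmed case, which is exactly the "inspection budget" framing above.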

Customer Life Agent

The brikz/agent-life agent builds a continuous representation of each retail and business client from transactions, contracts, ownership links and cash-flow series. It produces a dense per-client vector consumable by credit, scoring, fraud and recommendation use cases. Vector dimensionality is a configuration parameter per institution.

customer_embedding = LFDM.encode(
    transactions = stream("payments", "cards"),
    documents    = ["payroll.pdf", "contracts/"],
    graph        = relationship_graph(entity_id),
    timeseries   = cashflow_history(months=36),
    context      = open_finance_consent(),
)

The embedding is updated in streaming and exposed via a dedicated endpoint. Common use cases: dynamic primacy, Next Best Action, early financial stress.
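Downstream use cases typically compare these embeddings with a similarity measure. A minimal sketch with toy vectors; the real embedding dimensionality is a per-institution configuration, and the 4-dimensional vectors here are purely illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two customer embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: a and b represent behaviorally similar clients
customer_a = [0.2, 0.8, 0.1, 0.4]
customer_b = [0.25, 0.75, 0.05, 0.45]
customer_c = [-0.7, 0.1, 0.9, -0.2]

assert cosine(customer_a, customer_b) > cosine(customer_a, customer_c)
```

A Next Best Action consumer would, for instance, recommend to a client what similar clients (high cosine similarity) adopted recently.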


Connectors

The platform ships native connectors for common data sources in the Brazilian market:

Datasets

Datasets are versioned in Iceberg or Delta format, partitioned by load date, with auditable lineage. Each dataset carries a declared schema, observed quality and the regulation snapshot associated with it when applicable.

Ingestion & quality

Ingestion supports batch and streaming. Quality checks include accounting reconciliation, duplicate detection, mandatory-field completeness and referential consistency. Quality events feed the governance dashboard and can automatically block use for training or inference.
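The duplicate-detection and mandatory-field checks can be sketched over a batch of records. The field names below are illustrative, not the platform's schema:

```python
def quality_check(records, mandatory_fields, key_field):
    """Flag duplicate keys and missing mandatory fields in an ingestion batch."""
    issues = []
    seen = set()
    for i, rec in enumerate(records):
        key = rec.get(key_field)
        if key in seen:
            issues.append((i, f"duplicate {key_field}: {key}"))
        seen.add(key)
        for f in mandatory_fields:
            if rec.get(f) in (None, ""):
                issues.append((i, f"missing mandatory field: {f}"))
    return issues

batch = [
    {"receivable_id": "R1", "debtor": "ACME", "amount": 100.0},
    {"receivable_id": "R1", "debtor": "ACME", "amount": 100.0},  # duplicate key
    {"receivable_id": "R2", "debtor": "", "amount": 50.0},       # missing debtor
]
issues = quality_check(batch, ["debtor", "amount"], "receivable_id")
assert len(issues) == 2
```

In the pipeline described above, a non-empty issue list would be the kind of quality event that feeds the governance dashboard and can block the batch for training or inference.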

Training

Foundation model training combines self-supervised pretraining on large volumes of unlabeled data with supervised post-training on specific tasks. Execution runs on Vertex AI, with TPU v5p and Trillium pools for pretraining and NVIDIA A100/H100/H200 GPUs for fine-tuning and task-specific workloads.

Observed configurations

| Architecture | Time (s) | VRAM (GiB) | Use |
| --- | --- | --- | --- |
| GraphSAGE | 216.7 | 2.4 | AML production |
| GATv2 | 241.7 | 17.5 | Comparison |
| GINE | 220.5 | 7.9 | Comparison |
| MLP baseline | 55.9 | 0.7 | Baseline |

Per-client adaptation

Each institution receives a dedicated adapter trained via LoRA, QLoRA or DoRA on top of the base LFDM. The adapter captures the particularities of the data and institutional policy without exposing training data to other tenants. Versioning, registration and environment promotion via Vertex AI Model Registry, with lineage integrated to Dataplex.
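LoRA keeps the frozen base weights shared across tenants and learns only a low-rank per-tenant update. A minimal sketch of the adapted forward pass with toy dimensions (the actual adapter placement and ranks are training-configuration details not specified here):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                   # base dimension and low rank, r << d
W = rng.normal(size=(d, d))   # frozen base weight (shared across tenants)
A = rng.normal(size=(r, d))   # trainable low-rank factor (per-tenant adapter)
B = np.zeros((d, r))          # B starts at zero: the adapter begins as a no-op
alpha = 16.0                  # LoRA scaling hyperparameter

def adapted_forward(x):
    """Base forward pass plus the tenant's update: W x + (alpha/r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# With B = 0 the adapted model matches the frozen base exactly
assert np.allclose(adapted_forward(x), W @ x)
```

Only `A` and `B` (2·d·r parameters instead of d²) are stored and versioned per institution, which is why the base LFDM and client data never mix across tenants.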

Benchmark

The platform maintains an internal benchmark for comparing architectures and checkpoints. Reported metrics include AUC-ROC, Average Precision, Recall@k, Lift@k and IPI@k for ranking tasks; AUC-ROC and KS for scoring tasks; coverage and purity for case extraction.
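Of the scoring metrics, KS is the least standardized in naming, so a small self-contained sketch (synthetic scores, not benchmark data) pins down the definition used here:

```python
def ks_statistic(pos_scores, neg_scores):
    """Kolmogorov-Smirnov statistic: the maximum gap between the
    empirical score CDFs of the positive and negative classes."""
    thresholds = sorted(set(pos_scores) | set(neg_scores))
    best = 0.0
    for t in thresholds:
        cdf_pos = sum(s <= t for s in pos_scores) / len(pos_scores)
        cdf_neg = sum(s <= t for s in neg_scores) / len(neg_scores)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

# Perfectly separated score distributions give KS = 1.0
assert ks_statistic([0.9, 0.8], [0.1, 0.2]) == 1.0
```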

Inference

Inference is served on serverless pay-per-query endpoints, always warm, over Cloud Run, GKE and Vertex AI Endpoints. The runtime uses vLLM or SGLang for generative models, TensorRT-LLM for high-throughput cases, and custom containers for GraphSAGE and tabular models.

Observed performance

API

REST API with OAuth 2.0 authentication and optional mTLS. All calls return a decision ID and a pointer to the corresponding auditable dossier.

curl -X POST https://api.brikz.io/v1/agents/fidc/decide \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"fund_id":"FIDC-2025-0042","event":"assignment"}'
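The same call can be assembled in Python with only the standard library. This is a sketch of request construction; the response schema is not specified here beyond a decision ID and a dossier pointer:

```python
import json
import os
import urllib.request

def build_decide_request(token, fund_id, event):
    """Build the POST request for the FIDC decide endpoint (not yet sent)."""
    body = json.dumps({"fund_id": fund_id, "event": event}).encode()
    return urllib.request.Request(
        "https://api.brikz.io/v1/agents/fidc/decide",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_decide_request(os.environ.get("TOKEN", "test-token"),
                           "FIDC-2025-0042", "assignment")
assert req.get_method() == "POST"
# urllib.request.urlopen(req) would then return the decision ID
# and the pointer to the auditable dossier.
```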

Audit trail

Every platform decision is recorded with its generated rationale, model version, adapter version, applied regulation snapshot, execution identifier and requester identity. The trail is exportable in a format compatible with internal and external audit.

Security & isolation

Data resides in the dedicated São Paulo region on Google Cloud. Confidential Compute cryptographically isolates each tenant's execution environment. Access is controlled by Cloud IAM, with per-operation auditing in Cloud Audit Logs. Encryption in transit and at rest is on by default, with optional CMEK (customer-managed encryption keys).

Applicable regulation

Glossary

| Term | Definition |
| --- | --- |
| LFDM | Large Financial Data Model. Brikz's foundation model family. |
| GraphSAGE | GNN architecture based on sample-and-aggregate over a node's neighborhood. |
| Recall@k | Fraction of positives retrieved when inspecting the top k% of the ranking. |
| IPI@k | Inspections per illicit in the top k%. Measures operational effort. |
| LoRA | Low-Rank Adaptation. Efficient model adaptation technique. |
| FIDC | Fundo de Investimento em Direitos Creditórios, the Brazilian receivables-based investment fund. |
| UBO | Ultimate Beneficial Owner of an ownership structure. |