Introduction to the Large Financial Data Model
The Large Financial Data Model (LFDM) is a family of foundation models trained on regulated financial data. This documentation describes the concept, the Brikz platform architecture, the model lifecycle, and the specification of the agents that operate on the LFDM.
What is a Large Financial Data Model?
The LFDM is a class of foundation model specialized for financial data — transactions, receivables, regulatory documents, cash-flow time series, ownership graphs, and recorded human decisions. Unlike language models that treat text as a generic sequence, the LFDM learns the native structure of each financial modality and the relationships between them.
The Brikz platform exposes the LFDM through agents — each agent combines a selection of encoders, tools, parameterized regulation, and human-handoff protocols to operate a specific regulatory cycle.
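As an illustration, an agent can be thought of as a composition of those four pieces. The sketch below is a minimal Python rendering of that idea; `AgentSpec`, `HandoffProtocol` and the string identifiers are hypothetical names for illustration, not part of any published SDK:

```python
from dataclasses import dataclass, field

@dataclass
class HandoffProtocol:
    # Conditions under which the agent stops and routes to a human analyst.
    confidence_floor: float = 0.90  # below this, escalate
    escalate_on: tuple = ("missing_evidence", "regulation_conflict")

@dataclass
class AgentSpec:
    # An agent = encoders + tools + parameterized regulation + handoff rules.
    name: str
    encoders: list              # e.g. ["document_fm", "graph_fm"]
    tools: list                 # e.g. ["bigquery", "spanner_graph"]
    regulation_snapshot: str    # versioned snapshot of the applicable rules
    handoff: HandoffProtocol = field(default_factory=HandoffProtocol)

# Hypothetical configuration of the Structured Credit Agent (illustrative only).
fidc_agent = AgentSpec(
    name="brikz/agent-fidc",
    encoders=["document_fm", "tabular_fm", "graph_fm", "timeseries_fm"],
    tools=["document_ai", "bigquery", "spanner_graph"],
    regulation_snapshot="cvm-175@2025-01",
)
```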
Modalities covered
- Document FM — structured multi-modal extraction from fund regulation, indentures, contracts and credit notes.
- Time-series FM — Mamba-2 backbone for aging, PD and cash-flow forecasting over long windows.
- Tabular FM — TabPFN-v2 fine-tuned for contextual scoring over originator, payor and counterparty tables.
- Graph FM — Production-grade GraphSAGE over directed transaction and ownership graphs.
Platform overview
The platform is structured into five core components, all running on Google Cloud in a dedicated São Paulo region:
| Component | Function | Google Cloud |
|---|---|---|
| brikz/connectors | Data ingestion from registries, Open Finance, ERPs and internal systems. | Pub/Sub, Dataflow, Cloud Storage |
| brikz/datasets | Curation, validation, versioning and governance of training datasets. | BigQuery, Dataplex, Iceberg |
| brikz/training | Foundation model training and per-institution LoRA adaptation. | Vertex AI, TPU v5p, NVIDIA H100 |
| brikz/agents | Vertical agents — Structured Credit, AML, Customer Life — on top of the LFDM. | Vertex AI, Spanner Graph, BigQuery Graph, Document AI |
| brikz/serving | Serverless pay-per-query inference, regional endpoints, and API. | Cloud Run, GKE, Vertex AI Endpoints |
Google Cloud stack
- Gemini (Vertex AI Model Garden) — Gemini Pro and Flash families via Model Garden, reasoning engine for the agents.
- Gemini Enterprise — enterprise tier of Gemini with data residency, IAM controls and audit logging, applied to regulated client workloads.
- Vertex AI — training, fine-tuning, MLOps and serving for foundation models and per-tenant LoRA adapters.
- TPU v5p / Trillium — accelerators for foundation model pretraining. Complementary support for NVIDIA A100/H100/H200.
- BigQuery — serverless lakehouse, pay-per-query, with native embeddings via BQML.
- BigQuery Graph — graph queries over transactional history and ownership chain inside the warehouse.
- Spanner Graph — operational representation of the originator/payor/UBO graph with strong consistency.
- Document AI — structured multi-modal extraction over fund regulation, indentures, contracts and minutes.
- Cloud Run · GKE — serverless inference and always-warm regional endpoints.
- Pub/Sub · Dataflow — real-time ingestion of payments, cards and settlement events.
- Cloud Storage — data lake in open Iceberg format, partitioned by LoadDate.
- Confidential Compute — cryptographic isolation of the execution environment per tenant.
- Dataplex · IAM — data catalog, lineage and granular access control.
Getting started
Typical adoption flow follows four steps:
- Connect data sources — transactional, onboarding, documents, registries.
- Curate the first datasets — temporal partitioning, validation, lineage.
- Train the institution's adapter — LoRA over the base LFDM.
- Serve the agent — dedicated endpoint, dashboard and API.
Mode A reaches a first exception queue running in production within four weeks of contract signing. Modes B and C have different timelines described below.
Mode A · Managed platform
Default delivery configuration. Brikz hosts the full infrastructure in a dedicated São Paulo region. The client consumes the product through a web dashboard and API. No model ops, no GPU ops, no pipeline ops.
Responsibilities
| Responsibility | Brikz | Client |
|---|---|---|
| Foundation model pretraining and updates | ✓ | |
| GPU/TPU operations and MLOps | ✓ | |
| Per-tenant LoRA adapter training on client data | ✓ | |
| Regulation parameterization | Advisory | ✓ |
| Exception handling and operational overrides | ✓ | |
| Accounting and legal compliance of the FIDC | | ✓ |
Time to production
The default configuration reaches the first exception queue running on the client's real data within four weeks of contract signing.
When to use
Fund managers, fiduciary administrators and custodians who need immediate operations without investing in an AI team.
Mode B · Embedded API
The client integrates the Brikz agent API inside its own digital product, without using the Brikz dashboard. Regulatory decisioning becomes a capability of the client's product, with optional white-label.
Responsibilities
| Responsibility | Brikz | Client |
|---|---|---|
| Foundation model and endpoint operations | ✓ | |
| API availability and SLA | ✓ | |
| Integration into the client product | Support | ✓ |
| End-user experience | | ✓ |
| Branding and UX flow | | ✓ |
Integration
REST API with OAuth 2.0 and optional mTLS. Webhooks for decision events. Python and TypeScript SDKs. Endpoint documentation in the API chapter of this guide.
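On the webhook side, a minimal receiver for decision events could look like the sketch below. The payload fields (`decision_id`) and the signature header name are illustrative assumptions, not the documented contract; FastAPI stands in for whatever framework the client product uses:

```python
# Minimal webhook receiver for Brikz decision events — a sketch only.
# Field names and the signature header are illustrative assumptions.
import hashlib, hmac, os

from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
WEBHOOK_SECRET = os.environ["BRIKZ_WEBHOOK_SECRET"]

@app.post("/webhooks/brikz")
async def decision_event(request: Request, x_brikz_signature: str = Header(...)):
    body = await request.body()
    # Verify the HMAC signature before trusting the payload.
    expected = hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, x_brikz_signature):
        raise HTTPException(status_code=401, detail="bad signature")
    event = await request.json()
    # e.g. persist event["decision_id"] and fetch the dossier asynchronously.
    return {"received": event.get("decision_id")}
```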
When to use
Credit fintechs, BaaS, securitizers and originators with their own digital product that want to offer regulatory decisioning without operating the AI.
Mode C · Client tenant
Brikz deploys the full stack into the client's own Google Cloud project, via Terraform and Helm. Data, model and inference stay inside the client perimeter. Brikz maintains the model lifecycle via a shared, auditable runbook.
Responsibilities
| Responsibility | Brikz | Client |
|---|---|---|
| Architecture definition and runbook | ✓ | |
| Google Cloud project and billing | | ✓ |
| Deployment via Terraform + Helm | ✓ | Approval |
| Foundation model updates | ✓ | Window |
| Security and network operations | Advisory | ✓ |
Provisioned components
- Project, billing, networks and VPC Service Controls on the client's Google Cloud
- Vertex AI Pipelines, Vertex AI Endpoints, Cloud Run, GKE Autopilot
- BigQuery, BigQuery Graph, Spanner Graph, Cloud Storage
- Confidential Compute, Cloud KMS with CMEK
- Dataplex, Cloud IAM, Cloud Audit Logs, Cloud Monitoring
Time to production
Eight to twelve weeks, including infrastructure provisioning, corporate SSO integration, network configuration and security validation by the client team.
When to use
Mid-size and digital banks, large fund managers, and institutions whose internal requirements mandate their own dedicated tenant, customer-managed encryption keys and mandatory internal audit.
Intellectual property
In every delivery mode, the IP split is identical:
- LFDM foundation models (Document FM, Time-series FM, Tabular FM, Graph FM) remain the intellectual property of Brikz Tecnologia Ltda.
- The LoRA adapter trained on the institution's data is the client's own IP. It remains cryptographically isolated and is exportable on request at any time.
- The client's raw data remains the client's IP. In no delivery mode is one client's data ever used to fine-tune another client's model.
Structured Credit Agent
The brikz/agent-fidc agent operates the cycle prescribed by CVM 175 for credit fund operations. Every decision ships with an auditable dossier — rationale, applied regulation snapshot, and model version.
Stages operated
| Stage | Regulation | Capability |
|---|---|---|
| Legal onboarding | CVM 175 · diligence | KYC documentation, ownership chain, UBO |
| Eligibility & assignment | CVM 175 · assignment | Fund regulation → SQL rules per receivable |
| Collateral, registry & custody | CVM 175 · collateral | Reconciliation with national registries |
| Continuous monitoring | CVM 175 · monitoring | Delinquency, subordination, substitution |
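To make the "fund regulation → SQL rules per receivable" step concrete, the sketch below shows what one parameterized eligibility rule could look like when executed against BigQuery. The table, columns and thresholds are hypothetical; only the `google-cloud-bigquery` client calls are real:

```python
# Sketch: one eligibility rule derived from a fund regulation clause,
# evaluated per receivable in BigQuery. Table/column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

ELIGIBILITY_RULE = """
SELECT
  receivable_id,
  face_value,
  due_date,
  -- Clause-derived checks (illustrative thresholds):
  due_date BETWEEN CURRENT_DATE() AND DATE_ADD(CURRENT_DATE(), INTERVAL 360 DAY)
    AND face_value <= 500000
    AND payor_rating IN ('A', 'B') AS eligible
FROM `tenant_dataset.receivables_batch`
WHERE batch_id = @batch_id
"""

job = client.query(
    ELIGIBILITY_RULE,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("batch_id", "STRING", "FIDC-2025-0042")
        ]
    ),
)
for row in job.result():
    print(row.receivable_id, row.eligible)
```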
Endpoint
```http
POST /v1/agents/fidc/decide

{
  "fund_id": "FIDC-2025-0042",
  "receivable_batch": "s3://...",
  "event": "assignment"
}
```
Eight operational layers of the Structured Credit Agent
The Structured Credit Agent runs the CVM 175 cycle across eight connected layers. Each layer has a declared responsibility, a foundation model where applicable, an associated Google Cloud product and an intermediate output consumable by the layers above. The audit trail is built layer by layer — every decision in layer L8 carries traceability down to the raw evidence in layer L1.
| L# | Layer | Responsibility | Foundation model | Google Cloud |
|---|---|---|---|---|
| L1 | Ingestion | Receivables (CNAB), documents, onboarding, registries, events | — | Pub/Sub · Dataflow · Cloud Storage |
| L2 | Document AI | Structured multi-modal extraction of fund regulation, indentures, contracts and credit notes | Document FM | Document AI · Gemini |
| L3 | Entity resolution | KYC, KYB, UBO, ownership, links and related parties | Graph FM | Spanner Graph |
| L4 | Signatory powers | Representatives, power limits, signing rules | Document FM | Document AI |
| L5 | Eligibility and assignment | Regulation parameterized into per-receivable SQL rules · CVM 175 · assignment | Tabular FM | BigQuery |
| L6 | Collateral and custody | Existence, uniqueness, ownership and reconciliation · CVM 175 · collateral | Graph FM | Spanner Graph · BigQuery Graph |
| L7 | Continuous monitoring | Delinquency, subordination, substitution, waterfall · CVM 175 · monitoring | Time-series FM | Vertex AI Endpoints |
| L8 | Reporting and dossier | Dashboards, auditable alerts, regulator dossiers, legacy API | — | Cloud Run · Apigee · Looker |
Traceability guarantee
Every decision at layer L8 includes:
- Agent execution ID and UTC timestamp
- Foundation model version in use per layer
- Client LoRA adapter version
- Applied regulation snapshot, including exact clauses
- Pointers to the raw evidence consumed (PDF document, registry entry, CNAB record)
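Put together, a dossier carrying these fields could look like the following sketch. The key names and URIs are illustrative assumptions, not the wire format:

```python
# Illustrative shape of an L8 decision dossier. Key names are assumptions.
dossier = {
    "execution_id": "exec-2025-0042-0001",
    "timestamp_utc": "2025-06-30T14:02:11Z",
    "model_versions": {                 # FM version in use per layer
        "L2_document_fm": "doc-fm-1.3.0",
        "L5_tabular_fm": "tab-fm-2.1.4",
        "L7_timeseries_fm": "ts-fm-0.9.2",
    },
    "lora_adapter_version": "tenant-adapter@17",
    "regulation_snapshot": {
        "source": "fund_regulation.pdf",
        "clauses": ["art. 12 §2", "annex II, item 4"],
    },
    "evidence": [                       # pointers to raw L1 inputs
        {"type": "pdf", "uri": "gs://tenant-bucket/contracts/nc-0042.pdf"},
        {"type": "cnab", "uri": "gs://tenant-bucket/cnab/2025-06-30.ret"},
    ],
}
```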
AML Agent
The brikz/agent-aml agent operates the AML cycle prescribed by CVM 50. Operational focus is prioritization under an inspection budget — concentrating illicit activity at the top of the risk ranking.
Two-phase pipeline
- Phase 1 — Transaction ranking: directed graph construction, GraphSAGE encoding, and an MLP decoder producing a per-edge risk probability.
- Phase 2 — Investigable cases: top-k% extraction, subgraph induction, decomposition into connected components (n_min = 3), and a contextual subgraph for the exception queue.
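A compact sketch of Phase 2, assuming Phase 1 has already produced a score per directed edge. The GraphSAGE encoder and MLP decoder are abstracted away, and `networkx` stands in for the production graph engine:

```python
# Phase 2 sketch: from per-edge scores to investigable cases.
# Assumes Phase 1 produced scores[i] for each directed edge edges[i].
import networkx as nx
import numpy as np

def extract_cases(edges, scores, k_pct=1.0, n_min=3):
    """Top-k% extraction, subgraph induction, connected-component cases."""
    n_top = max(1, int(len(edges) * k_pct / 100.0))
    top_idx = np.argsort(scores)[::-1][:n_top]      # highest-risk edges first

    G = nx.DiGraph()
    G.add_edges_from(edges[i] for i in top_idx)     # induced suspicious subgraph

    cases = []
    for comp in nx.weakly_connected_components(G):  # one component = one case
        if len(comp) >= n_min:                      # drop components below n_min
            cases.append(G.subgraph(comp).copy())   # contextual subgraph
    return cases

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("f", "g")]
scores = np.array([0.99, 0.97, 0.95, 0.90, 0.10])
for case in extract_cases(edges, scores, k_pct=80.0):
    print(sorted(case.nodes()))  # ['a', 'b', 'c'] — the d-e pair is below n_min
```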
Operational metrics
| Metric | Observed value | Scenario |
|---|---|---|
| Recall@1% | 0.877 | AML100k |
| IPI@1% | 7.94 | AML100k |
| Inference throughput | 740K edges/s | A100 80GB |
| Processed volume | 124M edges | AML1M |
Customer Life Agent
The brikz/agent-life agent builds a continuous representation of each retail and business client from transactions, contracts, ownership links and cash-flow series. It produces a dense per-client vector consumable by credit, scoring, fraud and recommendation use cases. Vector dimensionality is a configuration parameter per institution.
```python
customer_embedding = LFDM.encode(
    transactions=stream("payments", "cards"),
    documents=["payroll.pdf", "contracts/"],
    graph=relationship_graph(entity_id),
    timeseries=cashflow_history(months=36),
    context=open_finance_consent(),
)
```
The embedding is updated in streaming mode and exposed via a dedicated endpoint. Common use cases: dynamic primacy, Next Best Action, and early detection of financial stress.
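Consuming the embedding downstream could look like the sketch below. The endpoint path and response fields are hypothetical assumptions, and cosine similarity against toy offer vectors stands in for a real Next Best Action ranker:

```python
# Sketch: consume the per-client embedding for a Next Best Action ranking.
# Endpoint path and response shape are illustrative assumptions.
import os
import numpy as np
import requests

TOKEN = os.environ["BRIKZ_TOKEN"]
resp = requests.get(
    "https://api.brikz.io/v1/agents/life/embedding",
    params={"entity_id": "cust-123"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
v = np.array(resp.json()["embedding"])  # dense per-client vector

# Toy offer vectors with the same dimensionality as the client embedding.
rng = np.random.default_rng(0)
offers = {name: rng.normal(size=v.shape) for name in ("credit_line", "insurance")}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

best = max(offers, key=lambda name: cos(v, offers[name]))
print("next best action:", best)
```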
Connectors
The platform ships native connectors for common data sources in the Brazilian market:
- Receivables registries — CERC, B3, TAG, CIP.
- Real-time payments, wire transfers, bills, cards — via ERP, core banking or aggregator.
- Open Finance — via authorized initiator API.
- Documents — indentures, fund regulation, contracts, credit notes, receipts.
- Onboarding — Federal Revenue, Central Bank, OFAC, sanctions and PEP lists.
Datasets
Datasets are versioned in Iceberg or Delta format, partitioned by load date, with auditable lineage. Each dataset carries a declared schema, observed quality and the regulation snapshot associated with it when applicable.
Ingestion & quality
Ingestion supports batch and streaming. Quality checks include accounting reconciliation, duplicate detection, mandatory-field completeness and referential consistency. Quality events feed the governance dashboard and can automatically block use for training or inference.
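These checks translate directly into batch validations. A minimal pandas sketch of the duplicate, completeness and referential checks, with hypothetical column names (accounting reconciliation is omitted for brevity):

```python
# Sketch of three of the quality checks over a receivables batch.
# Column names are hypothetical; failures would feed the governance dashboard.
import pandas as pd

def quality_report(batch: pd.DataFrame, payors: pd.DataFrame) -> dict:
    mandatory = ["receivable_id", "payor_id", "face_value", "due_date"]
    return {
        # Duplicate detection on the business key.
        "duplicates": int(batch.duplicated(subset=["receivable_id"]).sum()),
        # Mandatory-field completeness.
        "incomplete_rows": int(batch[mandatory].isna().any(axis=1).sum()),
        # Referential consistency: every payor must exist in the payor table.
        "orphan_payors": int((~batch["payor_id"].isin(payors["payor_id"])).sum()),
    }

batch = pd.DataFrame({
    "receivable_id": ["r1", "r2", "r2"],
    "payor_id": ["p1", "p9", "p9"],
    "face_value": [1000.0, None, 250.0],
    "due_date": ["2025-08-01", "2025-09-15", "2025-09-15"],
})
payors = pd.DataFrame({"payor_id": ["p1", "p2"]})
print(quality_report(batch, payors))
# {'duplicates': 1, 'incomplete_rows': 1, 'orphan_payors': 2}
```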
Training
Foundation model training combines self-supervised pretraining on large volumes of unlabeled data and supervised post-training on specific tasks. Execution runs on Vertex AI, with TPU v5p and Trillium pools for pretraining and NVIDIA A100/H100/H200 for fine-tuning and specific tasks.
Observed configurations
| Architecture | Training time (s) | VRAM (GiB) | Use |
|---|---|---|---|
| GraphSAGE | 216.7 | 2.4 | AML production |
| GATv2 | 241.7 | 17.5 | Comparison |
| GINE | 220.5 | 7.9 | Comparison |
| MLP baseline | 55.9 | 0.7 | Baseline |
Per-client adaptation
Each institution receives a dedicated adapter trained via LoRA, QLoRA or DoRA on top of the base LFDM. The adapter captures the particularities of the institution's data and policy without exposing training data to other tenants. Versioning, registration and environment promotion run through Vertex AI Model Registry, with lineage integrated with Dataplex.
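For orientation, attaching a LoRA adapter to a transformer backbone typically looks like the sketch below, using the open-source peft library. The rank, target modules and checkpoint handle are illustrative, not Brikz's actual configuration:

```python
# Sketch: attaching a per-tenant LoRA adapter to a transformer backbone.
# Hyperparameters and module names are illustrative, not Brikz's settings.
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

base = AutoModel.from_pretrained("path/to/lfdm-base")  # hypothetical checkpoint
config = LoraConfig(
    r=16,                                  # low-rank dimension
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # only the adapter weights train
```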
Benchmark
The platform maintains an internal benchmark for comparing architectures and checkpoints. Reported metrics include AUC-ROC, Average Precision, Recall@k, Lift@k and IPI@k for ranking tasks; AUC-ROC and KS for scoring tasks; coverage and purity for case extraction.
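The two ranking metrics specific to this benchmark follow the glossary definitions. A minimal reference sketch, assuming binary labels and higher-is-riskier scores (not the platform's internal implementation):

```python
# Reference sketch for Recall@k and IPI@k as defined in the glossary.
import numpy as np

def recall_at_k(labels: np.ndarray, scores: np.ndarray, k_pct: float) -> float:
    """Fraction of all positives found in the top-k% of the ranking."""
    n_top = max(1, int(len(scores) * k_pct / 100.0))
    top = np.argsort(scores)[::-1][:n_top]
    return float(labels[top].sum() / labels.sum())

def ipi_at_k(labels: np.ndarray, scores: np.ndarray, k_pct: float) -> float:
    """Inspections per illicit in the top-k%: cases inspected per true positive."""
    n_top = max(1, int(len(scores) * k_pct / 100.0))
    top = np.argsort(scores)[::-1][:n_top]
    hits = labels[top].sum()
    return float(n_top / hits) if hits else float("inf")

labels = np.array([1, 0, 1, 0, 0, 0, 0, 0, 0, 1])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])
print(recall_at_k(labels, scores, 20.0))  # 0.333: 1 of 3 positives in the top 2
print(ipi_at_k(labels, scores, 20.0))     # 2.0: 2 inspections per illicit found
```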
Inference
Inference is served on serverless pay-per-query endpoints, always warm, over Cloud Run, GKE and Vertex AI Endpoints. The runtime uses vLLM or SGLang for generative models, TensorRT-LLM for high-throughput cases, and custom containers for GraphSAGE and tabular models.
Observed performance
- Transaction ranking (AML): 740K edges/second on NVIDIA A100 80GB, measured on AMLSim (124M transactions).
- Other latency and throughput metrics per agent are reported in the internal benchmark and made available on technical request for each delivery mode.
API
REST API with OAuth 2.0 authentication and optional mTLS. All calls return a decision ID and a pointer to the corresponding auditable dossier.
```bash
curl -X POST https://api.brikz.io/v1/agents/fidc/decide \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"fund_id":"FIDC-2025-0042","event":"assignment"}'
```
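The same call from Python without the SDK, using plain requests; the `decision_id` and `dossier_url` response fields are illustrative assumptions standing in for the decision ID and dossier pointer described above:

```python
# Sketch: decision call via plain HTTP, then fetch the dossier pointer.
# Response field names are illustrative assumptions.
import os
import requests

TOKEN = os.environ["BRIKZ_TOKEN"]
resp = requests.post(
    "https://api.brikz.io/v1/agents/fidc/decide",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"fund_id": "FIDC-2025-0042", "event": "assignment"},
    timeout=30,
)
resp.raise_for_status()
decision = resp.json()
print(decision["decision_id"])                   # decision ID
dossier = requests.get(                          # pointer to the auditable dossier
    decision["dossier_url"],
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
print(dossier.json())
```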
Audit trail
Every platform decision is recorded with generated rationale, model version, adapter version, applied regulation snapshot, execution identifier and requester identity. The trail is exportable in a format compatible with internal and external audit.
Security & isolation
Data resides in the dedicated São Paulo region on Google Cloud. Confidential Compute cryptographically isolates each tenant's execution environment. Access is controlled by Cloud IAM, with per-operation auditing in Cloud Audit Logs. Encryption in transit and at rest is on by default, with optional CMEK.
Applicable regulation
- CVM 175 — credit fund operations, eligibility, collateral and monitoring duties.
- CVM 50 — anti-money laundering and counter-terrorism financing.
- BCB 119 — risk management and cybersecurity for financial institutions.
- LGPD — Brazilian general data protection law.
- COAF — communication of atypical operations within 24 hours.
Glossary
| Term | Definition |
|---|---|
| LFDM | Large Financial Data Model. Brikz's foundation model family. |
| GraphSAGE | GNN architecture based on sample + aggregate over neighborhood. |
| Recall@k | Fraction of positives retrieved when inspecting the top-k% of the ranking. |
| IPI@k | Inspections per illicit in the top-k. Measures operational effort. |
| LoRA | Low-Rank Adaptation. Efficient model adaptation technique. |
| FIDC | Brazilian receivables-based investment fund. |
| UBO | Ultimate Beneficial Owner of an ownership structure. |