@cdklabs/genai-idp-bda-processor

Constructs

BdaProcessor

Bedrock Data Automation (BDA) document processor facade over UnifiedDocumentProcessor.

Creates BDA blueprints and a Data Automation Project from the configuration's class definitions at CDK synth time, then delegates all processing to the unified processor with use_bda: true.

Initializers

import { BdaProcessor } from '@cdklabs/genai-idp-bda-processor'

new BdaProcessor(scope: Construct, id: string, props: BdaProcessorProps)
Name Type Description
scope constructs.Construct No description.
id string No description.
props BdaProcessorProps No description.

scopeRequired
  • Type: constructs.Construct

idRequired
  • Type: string

propsRequired
  • Type: BdaProcessorProps

Methods

Name Description
toString Returns a string representation of this construct.
with Applies one or more mixins to this construct.
metricBDAJobsFailed Failed BDA async jobs.
metricBDAJobsSucceeded Successful BDA async jobs.
metricBDAJobsTotal Total BDA async jobs submitted.
metricBDARequestsFailed Failed BDA invocation requests.
metricBDARequestsLatency BDA single-request latency in milliseconds.
metricBDARequestsMaxRetriesExceeded BDA requests that exceeded max retries.
metricBDARequestsNonRetryableErrors BDA non-retryable errors.
metricBDARequestsRetrySuccess BDA requests that succeeded after retry.
metricBDARequestsSucceeded Successful BDA invocation requests.
metricBDARequestsThrottles BDA request throttles.
metricBDARequestsTotal Total BDA Data Automation invocation requests.
metricBDARequestsTotalLatency BDA total latency including retries in milliseconds.
metricBDARequestsUnexpectedErrors BDA unexpected errors.
metricBedrockRequestsFailed Failed Bedrock model invocation requests.
metricBedrockRequestsSucceeded Successful Bedrock model invocation requests.
metricBedrockRequestsTotal Total Bedrock model invocation requests (summarization, evaluation).
metricHITLTriggered Documents flagged for human-in-the-loop review.
metricProcessedCustomPages Custom blueprint pages processed.
metricProcessedDocuments Documents processed by BDA.
metricProcessedPages Total pages processed.
metricProcessedStandardPages Standard pages processed.

toString
public toString(): string

Returns a string representation of this construct.

with
public with(mixins: ...IMixin[]): IConstruct

Applies one or more mixins to this construct.

Mixins are applied in order. The list of constructs is captured at the start of the call, so constructs added by a mixin will not be visited. Use multiple with() calls if subsequent mixins should apply to added constructs.

mixinsRequired
  • Type: ...constructs.IMixin[]

The mixins to apply.
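
The snapshot behaviour described above can be illustrated with a toy model. This is only a sketch: `ToyNode`, `ToyMixin`, and `applyMixins` are hypothetical stand-ins written for this example, not the constructs library's implementation.

```typescript
// Toy model only: NOT the constructs library's implementation.
interface ToyNode {
  name: string;
  children: ToyNode[];
  visitedBy: string[];
}

interface ToyMixin {
  name: string;
  applyTo(node: ToyNode): void;
}

// Collect the node and all of its descendants, depth-first.
function collect(node: ToyNode): ToyNode[] {
  const out: ToyNode[] = [node];
  for (const child of node.children) {
    out.push(...collect(child));
  }
  return out;
}

// Apply mixins in order over a snapshot taken once, at the start of the call.
function applyMixins(target: ToyNode, ...mixins: ToyMixin[]): ToyNode {
  const snapshot = collect(target);
  for (const mixin of mixins) {
    for (const node of snapshot) {
      mixin.applyTo(node);
    }
  }
  return target;
}

const makeNode = (name: string): ToyNode => ({ name, children: [], visitedBy: [] });

// The first mixin adds a child to the root; the second tags every node it visits.
const addChildMixin: ToyMixin = {
  name: 'addChild',
  applyTo: (node) => {
    if (node.name === 'root') node.children.push(makeNode('added'));
  },
};
const tagMixin: ToyMixin = {
  name: 'tag',
  applyTo: (node) => {
    node.visitedBy.push('tag');
  },
};

const toyRoot = makeNode('root');
applyMixins(toyRoot, addChildMixin, tagMixin);
// The child was created after the snapshot was taken, so the tag mixin never saw it.
const tagsAfterFirstCall = toyRoot.children[0].visitedBy.length;

// A second call takes a fresh snapshot that now includes the added child.
applyMixins(toyRoot, tagMixin);
const tagsAfterSecondCall = toyRoot.children[0].visitedBy.length;
```

In this model the added child is only tagged by the second call, which mirrors the advice to chain multiple with() calls when later mixins should reach constructs created by earlier ones.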


metricBDAJobsFailed
public metricBDAJobsFailed(props?: MetricOptions): Metric

Failed BDA async jobs.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDAJobsSucceeded
public metricBDAJobsSucceeded(props?: MetricOptions): Metric

Successful BDA async jobs.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDAJobsTotal
public metricBDAJobsTotal(props?: MetricOptions): Metric

Total BDA async jobs submitted.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDARequestsFailed
public metricBDARequestsFailed(props?: MetricOptions): Metric

Failed BDA invocation requests.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDARequestsLatency
public metricBDARequestsLatency(props?: MetricOptions): Metric

BDA single-request latency in milliseconds.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDARequestsMaxRetriesExceeded
public metricBDARequestsMaxRetriesExceeded(props?: MetricOptions): Metric

BDA requests that exceeded max retries.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDARequestsNonRetryableErrors
public metricBDARequestsNonRetryableErrors(props?: MetricOptions): Metric

BDA non-retryable errors.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDARequestsRetrySuccess
public metricBDARequestsRetrySuccess(props?: MetricOptions): Metric

BDA requests that succeeded after retry.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDARequestsSucceeded
public metricBDARequestsSucceeded(props?: MetricOptions): Metric

Successful BDA invocation requests.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDARequestsThrottles
public metricBDARequestsThrottles(props?: MetricOptions): Metric

BDA request throttles.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDARequestsTotal
public metricBDARequestsTotal(props?: MetricOptions): Metric

Total BDA Data Automation invocation requests.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDARequestsTotalLatency
public metricBDARequestsTotalLatency(props?: MetricOptions): Metric

BDA total latency including retries in milliseconds.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBDARequestsUnexpectedErrors
public metricBDARequestsUnexpectedErrors(props?: MetricOptions): Metric

BDA unexpected errors.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBedrockRequestsFailed
public metricBedrockRequestsFailed(props?: MetricOptions): Metric

Failed Bedrock model invocation requests.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBedrockRequestsSucceeded
public metricBedrockRequestsSucceeded(props?: MetricOptions): Metric

Successful Bedrock model invocation requests.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricBedrockRequestsTotal
public metricBedrockRequestsTotal(props?: MetricOptions): Metric

Total Bedrock model invocation requests (summarization, evaluation).

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricHITLTriggered
public metricHITLTriggered(props?: MetricOptions): Metric

Documents flagged for human-in-the-loop review.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricProcessedCustomPages
public metricProcessedCustomPages(props?: MetricOptions): Metric

Custom blueprint pages processed.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricProcessedDocuments
public metricProcessedDocuments(props?: MetricOptions): Metric

Documents processed by BDA.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricProcessedPages
public metricProcessedPages(props?: MetricOptions): Metric

Total pages processed.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions

metricProcessedStandardPages
public metricProcessedStandardPages(props?: MetricOptions): Metric

Standard pages processed.

propsOptional
  • Type: aws-cdk-lib.aws_cloudwatch.MetricOptions
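
The request counters lend themselves to derived values such as a failure rate (failed requests as a share of total requests). The arithmetic is sketched below in plain TypeScript; the helper name and the sample numbers are made up for illustration. In a real stack this would more likely be expressed as CloudWatch metric math over metricBDARequestsFailed and metricBDARequestsTotal.

```typescript
// Percentage of failed requests; returns 0 when nothing was submitted,
// which avoids a division by zero for idle periods.
function failureRatePercent(failed: number, total: number): number {
  if (total <= 0) return 0;
  return (100 * failed) / total;
}

// Hypothetical datapoint sums for one period.
const sampleFailureRate = failureRatePercent(3, 120);
const idleFailureRate = failureRatePercent(0, 0);
```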

Static Functions

Name Description
isConstruct Checks if x is a construct.

isConstruct
import { BdaProcessor } from '@cdklabs/genai-idp-bda-processor'

BdaProcessor.isConstruct(x: any)

Checks if x is a construct.

Use this method instead of instanceof to properly detect Construct instances, even when the construct library is symlinked.

Explanation: in JavaScript, multiple copies of the constructs library on disk are seen as independent, completely different libraries. As a consequence, the class Construct in each copy of the constructs library is seen as a different class, and an instance of one class will not test as instanceof the other class. npm install will not create installations like this, but users may manually symlink construct libraries together or use a monorepo tool. In those cases, multiple copies of the constructs library can be installed accidentally, and instanceof will behave unpredictably. It is safest to avoid instanceof and to use this type-testing method instead.

xRequired
  • Type: any

Any object.
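
The instanceof problem can be reproduced in a few lines. The sketch below simulates two copies of a library by creating the same class twice, and uses a globally registered symbol as the type marker. The class, the `loadLibraryCopy` helper, and the marker key `toy.Construct` are hypothetical and exist only for this illustration; the check is written in the spirit of isConstruct, not copied from it.

```typescript
// Each call returns a brand-new class object, simulating a separate copy
// of the same library loaded from a different location on disk.
function loadLibraryCopy() {
  // Symbol.for() looks the key up in the global registry, so every copy
  // obtains the very same symbol. The key is hypothetical.
  const marker = Symbol.for('toy.Construct');

  return class ToyConstruct {
    constructor() {
      // Tag every instance with the shared marker symbol.
      Object.defineProperty(this, marker, { value: true });
    }

    static isConstruct(x: any): boolean {
      return x !== null && typeof x === 'object' && marker in x;
    }
  };
}

const CopyA = loadLibraryCopy();
const CopyB = loadLibraryCopy();
const instanceFromA = new CopyA();

// Different class objects, so the prototype chain check fails.
const detectedByInstanceof = instanceFromA instanceof CopyB;
// The shared symbol is still found, regardless of which copy created the instance.
const detectedByMarker = CopyB.isConstruct(instanceFromA);
// Plain objects carry no marker.
const plainObjectDetected = CopyB.isConstruct({ label: 'not a construct' });
```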


Properties

Name Type Description
node constructs.Node The tree node.
environment @cdklabs/genai-idp.IProcessingEnvironment The processing environment that provides shared infrastructure and services.
maxProcessingConcurrency number The maximum number of documents that can be processed concurrently.
project IDataAutomationProject The BDA Data Automation Project used by this processor.
stateMachine aws-cdk-lib.aws_stepfunctions.IStateMachine The Step Functions state machine that orchestrates the document processing workflow.
evaluationFunction @cdklabs/genai-idp.EvaluationFunction The evaluation function if evaluation is enabled for this processor.

nodeRequired
public readonly node: Node;
  • Type: constructs.Node

The tree node.


environmentRequired
public readonly environment: IProcessingEnvironment;
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment that provides shared infrastructure and services.

Contains input/output buckets, tracking tables, API endpoints, and other resources needed for document processing operations.


maxProcessingConcurrencyRequired
public readonly maxProcessingConcurrency: number;
  • Type: number

The maximum number of documents that can be processed concurrently.

Controls the throughput and resource utilization of the document processing system.


projectRequired
public readonly project: IDataAutomationProject;
  • Type: IDataAutomationProject

The BDA Data Automation Project used by this processor.


stateMachineRequired
public readonly stateMachine: IStateMachine;
  • Type: aws-cdk-lib.aws_stepfunctions.IStateMachine

The Step Functions state machine that orchestrates the document processing workflow.

Manages the sequence of processing steps and handles error conditions. This state machine is triggered for each document that needs processing and coordinates the entire extraction pipeline.


evaluationFunctionOptional
public readonly evaluationFunction: EvaluationFunction;
  • Type: @cdklabs/genai-idp.EvaluationFunction

The evaluation function if evaluation is enabled for this processor.

The evaluation function is created by the ProcessingEnvironment when an evaluation baseline bucket and model are provided.


Structs

BdaProcessorConfigurationDefinitionOptions

Options for configuring the BDA processor configuration definition.

Allows customization of evaluation and summarization models. BDA handles OCR, classification, extraction, and assessment internally, so those options are not exposed.

Initializer

import { BdaProcessorConfigurationDefinitionOptions } from '@cdklabs/genai-idp-bda-processor'

const bdaProcessorConfigurationDefinitionOptions: BdaProcessorConfigurationDefinitionOptions = { ... }

Properties

Name Type Description
evaluationModel @aws-cdk/aws-bedrock-alpha.IBedrockInvokable Optional model for the evaluation stage.
summarizationModel @aws-cdk/aws-bedrock-alpha.IBedrockInvokable Optional model for the summarization stage.

evaluationModelOptional
public readonly evaluationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional model for the evaluation stage.

Defines the model used for evaluating extraction accuracy.


summarizationModelOptional
public readonly summarizationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional model for the summarization stage.

Defines the model used for generating document summaries.


BdaProcessorProps

Configuration properties for the BDA document processor facade.

Initializer

import { BdaProcessorProps } from '@cdklabs/genai-idp-bda-processor'

const bdaProcessorProps: BdaProcessorProps = { ... }

Properties

Name Type Description
environment @cdklabs/genai-idp.IProcessingEnvironment The processing environment that provides shared infrastructure and services.
maxProcessingConcurrency number The maximum number of documents that can be processed concurrently.
configuration IBdaProcessorConfiguration Configuration for the BDA document processor.
configurationBucket aws-cdk-lib.aws_s3.IBucket The S3 bucket containing configuration files.

environmentRequired
public readonly environment: IProcessingEnvironment;
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment that provides shared infrastructure and services.

Contains input/output buckets, tracking tables, API endpoints, and other resources needed for document processing operations.


maxProcessingConcurrencyOptional
public readonly maxProcessingConcurrency: number;
  • Type: number
  • Default: 100 concurrent workflows

The maximum number of documents that can be processed concurrently.

Controls the throughput and resource utilization of the document processing system.
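
Because the property is optional here but required on the construct, the default has to be resolved somewhere. A minimal sketch of that resolution follows; `ToyProcessorProps` and `resolveMaxProcessingConcurrency` are hypothetical names, and only the default of 100 comes from this page.

```typescript
// Hypothetical, trimmed-down stand-in for the props struct.
interface ToyProcessorProps {
  maxProcessingConcurrency?: number;
}

// Documented default: 100 concurrent workflows.
const DEFAULT_MAX_PROCESSING_CONCURRENCY = 100;

// Use the caller's value when present, otherwise fall back to the default.
function resolveMaxProcessingConcurrency(props: ToyProcessorProps): number {
  return props.maxProcessingConcurrency ?? DEFAULT_MAX_PROCESSING_CONCURRENCY;
}

const concurrencyWhenOmitted = resolveMaxProcessingConcurrency({});
const concurrencyWhenSet = resolveMaxProcessingConcurrency({ maxProcessingConcurrency: 25 });
```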


configurationRequired
public readonly configuration: IBdaProcessorConfiguration;
  • Type: IBdaProcessorConfiguration

Configuration for the BDA document processor.

The use_bda: true flag is forced automatically.


configurationBucketRequired
public readonly configurationBucket: IBucket;
  • Type: aws-cdk-lib.aws_s3.IBucket

The S3 bucket containing configuration files.


Classes

BdaProcessorConfiguration

Configuration management for BDA document processing using Bedrock Data Automation.

This construct creates and manages the configuration for BDA document processing, including schema definitions and configuration values. It provides a centralized way to manage extraction schemas, evaluation settings, and summarization parameters.

Initializers

import { BdaProcessorConfiguration } from '@cdklabs/genai-idp-bda-processor'

new BdaProcessorConfiguration(definition: IBdaProcessorConfigurationDefinition)
Name Type Description
definition IBdaProcessorConfigurationDefinition The configuration definition instance.

definitionRequired
  • Type: IBdaProcessorConfigurationDefinition

The configuration definition instance.


Methods

Name Description
bind Binds the configuration to a processor instance.

bind
public bind(scope: Construct, environment: IProcessingEnvironment, bdaProjectArn?: string): IBdaProcessorConfigurationDefinition

Binds the configuration to a processor instance.

Creates a custom resource that writes the default configuration to the configuration table.

scopeRequired
  • Type: constructs.Construct

The construct scope for creating custom resources.


environmentRequired
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment providing the configuration function and table.


bdaProjectArnOptional
  • Type: string

Optional BDA project ARN to store alongside the config.


Static Functions

Name Description
docSplit Creates a configuration for document splitting.
fromFile Creates a configuration from a YAML file.
lendingPackageSample Creates a configuration for lending package processing.
lendingPackageSampleGovCloud Creates a minimal configuration for GovCloud deployments.
ocrBenchmark Creates a configuration for OCR benchmarking.
realkieFccVerified Creates a configuration for RealKIE FCC verified documents.
rvlCdip Creates a configuration for RVL-CDIP document classification.

docSplit
import { BdaProcessorConfiguration } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfiguration.docSplit(options?: BdaProcessorConfigurationDefinitionOptions)

Creates a configuration for document splitting.

This configuration focuses on splitting multi-document files into individual documents for processing.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

Optional configuration options.


fromFile
import { BdaProcessorConfiguration } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfiguration.fromFile(filePath: string, options?: BdaProcessorConfigurationDefinitionOptions)

Creates a configuration from a YAML file.

filePathRequired
  • Type: string

Path to the YAML configuration file.


optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

Optional configuration options to override file settings.


lendingPackageSample
import { BdaProcessorConfiguration } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfiguration.lendingPackageSample(options?: BdaProcessorConfigurationDefinitionOptions)

Creates a configuration for lending package processing.

This configuration includes full class definitions and extraction schemas.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

Optional configuration options.


lendingPackageSampleGovCloud
import { BdaProcessorConfiguration } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfiguration.lendingPackageSampleGovCloud(options?: BdaProcessorConfigurationDefinitionOptions)

Creates a minimal configuration for GovCloud deployments.

This configuration demonstrates the "minimal override" pattern where only GovCloud-compatible model IDs are specified, and all other settings (classes, prompts, etc.) are inherited from system defaults at runtime.

This approach is useful when you want to:
  • Use system default class definitions
  • Only override region-specific settings (like model IDs)
  • Keep your config file minimal and maintainable
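
The "minimal override" idea can be pictured as a deep merge of a small override object over a larger set of defaults. The sketch below is an illustration under assumptions: the merge function, the configuration keys, and the model names are all hypothetical, and the actual runtime merge logic is not described on this page.

```typescript
type ConfigValue = string | number | boolean | ConfigValue[] | { [key: string]: ConfigValue };
type ConfigObject = { [key: string]: ConfigValue };

function isConfigObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Recursively merge `override` over `defaults` without mutating either input.
// Nested objects are merged key by key; scalars and arrays are replaced whole.
function mergeConfig(defaults: ConfigObject, override: ConfigObject): ConfigObject {
  const result: ConfigObject = { ...defaults };
  for (const key of Object.keys(override)) {
    const next = override[key];
    if (next === undefined) continue;
    const base = result[key];
    result[key] = isConfigObject(base) && isConfigObject(next) ? mergeConfig(base, next) : next;
  }
  return result;
}

// Hypothetical system defaults and a minimal, region-specific override.
const systemDefaults: ConfigObject = {
  classes: ['hypothetical-class-a', 'hypothetical-class-b'],
  summarization: { model: 'hypothetical-default-model', enabled: true },
};
const minimalOverride: ConfigObject = {
  summarization: { model: 'hypothetical-govcloud-model' },
};

const effectiveConfig = mergeConfig(systemDefaults, minimalOverride);
const effectiveSummarization = effectiveConfig.summarization as ConfigObject;
```

Only the model ID changes in the merged result; the class list and the remaining summarization settings come from the defaults, which is what keeps the override file small.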

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

Optional configuration options.


ocrBenchmark
import { BdaProcessorConfiguration } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfiguration.ocrBenchmark(options?: BdaProcessorConfigurationDefinitionOptions)

Creates a configuration for OCR benchmarking.

This configuration is designed for evaluating OCR performance across different document types and quality levels.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

Optional configuration options.


realkieFccVerified
import { BdaProcessorConfiguration } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfiguration.realkieFccVerified(options?: BdaProcessorConfigurationDefinitionOptions)

Creates a configuration for RealKIE FCC verified documents.

This configuration is optimized for processing FCC-verified documents from the RealKIE dataset.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

Optional configuration options.


rvlCdip
import { BdaProcessorConfiguration } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfiguration.rvlCdip(options?: BdaProcessorConfigurationDefinitionOptions)

Creates a configuration for RVL-CDIP document classification.

This configuration is designed for the RVL-CDIP dataset, which contains 16 classes of document images for classification tasks.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

Optional configuration options.


Properties

Name Type Description
definition IBdaProcessorConfigurationDefinition The configuration definition instance.

definitionRequired
public readonly definition: IBdaProcessorConfigurationDefinition;
  • Type: IBdaProcessorConfigurationDefinition

The configuration definition instance.


BdaProcessorConfigurationDefinition

Configuration definition for BDA document processing.

Loads configuration from the unified config library and forces use_bda: true. Maps BDA-specific options (summarizationModel, evaluationModel) to the unified configuration definition options.
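
"Forcing" the flag means that whatever a configuration file says, the effective value is true. A minimal sketch of that precedence is shown below; `ToyConfig` and `forceUseBda` are hypothetical names for this example and do not reflect the library's internals.

```typescript
type ToyConfig = Record<string, unknown>;

// Spread the loaded values first so that the forced flag always wins,
// even when the file explicitly sets use_bda to false.
function forceUseBda(config: ToyConfig): ToyConfig {
  return { ...config, use_bda: true };
}

// Hypothetical content of a loaded configuration file.
const loadedFromFile: ToyConfig = { use_bda: false, notes: 'hypothetical file content' };
const effectiveBdaConfig = forceUseBda(loadedFromFile);
```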

Initializers

import { BdaProcessorConfigurationDefinition } from '@cdklabs/genai-idp-bda-processor'

new BdaProcessorConfigurationDefinition()

Static Functions

Name Description
docSplit Document splitting preset with use_bda: true.
fromFile Creates a configuration definition from a custom YAML file with use_bda: true.
lendingPackageSample Lending package sample preset with use_bda: true.
lendingPackageSampleGovCloud Lending package sample for GovCloud with use_bda: true.
ocrBenchmark OCR benchmark preset with use_bda: true.
realkieFccVerified RealKIE FCC verified preset with use_bda: true.
rvlCdip RVL-CDIP classification preset with use_bda: true.

docSplit
import { BdaProcessorConfigurationDefinition } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfigurationDefinition.docSplit(options?: BdaProcessorConfigurationDefinitionOptions)

Document splitting preset with use_bda: true.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

fromFile
import { BdaProcessorConfigurationDefinition } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfigurationDefinition.fromFile(filePath: string, options?: BdaProcessorConfigurationDefinitionOptions)

Creates a configuration definition from a custom YAML file with use_bda: true.

filePathRequired
  • Type: string

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

lendingPackageSample
import { BdaProcessorConfigurationDefinition } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfigurationDefinition.lendingPackageSample(options?: BdaProcessorConfigurationDefinitionOptions)

Lending package sample preset with use_bda: true.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

lendingPackageSampleGovCloud
import { BdaProcessorConfigurationDefinition } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfigurationDefinition.lendingPackageSampleGovCloud(options?: BdaProcessorConfigurationDefinitionOptions)

Lending package sample for GovCloud with use_bda: true.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

ocrBenchmark
import { BdaProcessorConfigurationDefinition } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfigurationDefinition.ocrBenchmark(options?: BdaProcessorConfigurationDefinitionOptions)

OCR benchmark preset with use_bda: true.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

realkieFccVerified
import { BdaProcessorConfigurationDefinition } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfigurationDefinition.realkieFccVerified(options?: BdaProcessorConfigurationDefinitionOptions)

RealKIE FCC verified preset with use_bda: true.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

rvlCdip
import { BdaProcessorConfigurationDefinition } from '@cdklabs/genai-idp-bda-processor'

BdaProcessorConfigurationDefinition.rvlCdip(options?: BdaProcessorConfigurationDefinitionOptions)

RVL-CDIP classification preset with use_bda: true.

optionsOptional
  • Type: BdaProcessorConfigurationDefinitionOptions

Protocols

IBdaProcessor

Interface for BDA document processor implementation.

Properties

Name Type Description
node constructs.Node The tree node.
environment @cdklabs/genai-idp.IProcessingEnvironment The processing environment that provides shared infrastructure and services.
maxProcessingConcurrency number The maximum number of documents that can be processed concurrently.
stateMachine aws-cdk-lib.aws_stepfunctions.IStateMachine The Step Functions state machine that orchestrates the document processing workflow.
evaluationFunction @cdklabs/genai-idp.EvaluationFunction The evaluation function if evaluation is enabled for this processor.

nodeRequired
public readonly node: Node;
  • Type: constructs.Node

The tree node.


environmentRequired
public readonly environment: IProcessingEnvironment;
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment that provides shared infrastructure and services.

Contains input/output buckets, tracking tables, API endpoints, and other resources needed for document processing operations.


maxProcessingConcurrencyRequired
public readonly maxProcessingConcurrency: number;
  • Type: number

The maximum number of documents that can be processed concurrently.

Controls the throughput and resource utilization of the document processing system.


stateMachineRequired
public readonly stateMachine: IStateMachine;
  • Type: aws-cdk-lib.aws_stepfunctions.IStateMachine

The Step Functions state machine that orchestrates the document processing workflow.

Manages the sequence of processing steps and handles error conditions. This state machine is triggered for each document that needs processing and coordinates the entire extraction pipeline.


evaluationFunctionOptional
public readonly evaluationFunction: EvaluationFunction;
  • Type: @cdklabs/genai-idp.EvaluationFunction

The evaluation function if evaluation is enabled for this processor.

The evaluation function is created by the ProcessingEnvironment when an evaluation baseline bucket and model are provided.


IBdaProcessorConfiguration

Interface for BDA document processor configuration.

Provides configuration management for Bedrock Data Automation processing.

Methods

Name Description
bind Binds the configuration to a processor instance.

bind
public bind(scope: Construct, environment: IProcessingEnvironment, bdaProjectArn?: string): IBdaProcessorConfigurationDefinition

Binds the configuration to a processor instance.

Writes the default configuration to the configuration table.

scopeRequired
  • Type: constructs.Construct

The construct scope for creating custom resources.


environmentRequired
  • Type: @cdklabs/genai-idp.IProcessingEnvironment

The processing environment providing the configuration function and table.


bdaProjectArnOptional
  • Type: string

Optional BDA project ARN to store alongside the config.


Properties

Name Type Description
definition IBdaProcessorConfigurationDefinition The configuration definition.

definitionRequired
public readonly definition: IBdaProcessorConfigurationDefinition;
  • Type: IBdaProcessorConfigurationDefinition

The configuration definition.


IBdaProcessorConfigurationDefinition

Interface for BDA processor configuration definition.

Exposes only BDA-relevant options (summarization, evaluation).

Properties

Name Type Description
evaluationModel @aws-cdk/aws-bedrock-alpha.IBedrockInvokable Optional model for evaluating extraction results.
summarizationModel @aws-cdk/aws-bedrock-alpha.IBedrockInvokable Optional model for document summarization.

evaluationModelOptional
public readonly evaluationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional model for evaluating extraction results.


summarizationModelOptional
public readonly summarizationModel: IBedrockInvokable;
  • Type: @aws-cdk/aws-bedrock-alpha.IBedrockInvokable

Optional model for document summarization.


IDataAutomationProject

Interface representing an Amazon Bedrock Data Automation Project.

Methods

Name Description
grantInvokeAsync Grant the given identity permissions to invoke this project asynchronously.

grantInvokeAsync
public grantInvokeAsync(grantee: IGrantable): Grant

Grant the given identity permissions to invoke this project asynchronously.

granteeRequired
  • Type: aws-cdk-lib.aws_iam.IGrantable
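
A grant of this kind conceptually adds an IAM statement scoped to the project. The sketch below shows one plausible shape and is an assumption throughout: the action name is taken from the InvokeDataAutomationAsync API, the ARN is a placeholder, and the statements the library actually generates are not documented on this page. Real deployments may need further resources, such as a data automation profile.

```typescript
interface ToyPolicyStatement {
  Effect: 'Allow';
  Action: string[];
  Resource: string[];
}

// ASSUMPTION: the action and resource scoping are illustrative only;
// check the synthesized template for what the library really grants.
function invokeAsyncStatement(projectArn: string): ToyPolicyStatement {
  return {
    Effect: 'Allow',
    Action: ['bedrock:InvokeDataAutomationAsync'],
    Resource: [projectArn],
  };
}

// Placeholder ARN with a made-up account ID and project ID.
const placeholderProjectArn =
  'arn:aws:bedrock:us-east-1:111122223333:data-automation-project/EXAMPLE';
const invokeStatement = invokeAsyncStatement(placeholderProjectArn);
```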

Properties

Name Type Description
arn string The ARN of the Data Automation Project.

arnRequired
public readonly arn: string;
  • Type: string

The ARN of the Data Automation Project.