Frequently Asked Questions

This section provides answers to frequently asked questions about the GenAI IDP Accelerator CDK implementation.

General Questions

What is the GenAI IDP Accelerator?

The GenAI Intelligent Document Processing (IDP) Accelerator is a comprehensive solution for transforming unstructured documents into structured data using AWS's AI/ML services. It provides a modular, customizable approach to document processing workflows.

What is the difference between the original GenAI IDP Accelerator and this CDK implementation?

The original GenAI IDP Accelerator is implemented as a CloudFormation template, while this project is a modular AWS CDK implementation. The CDK implementation provides more flexibility for customization and integration with existing infrastructure.

What AWS services does the GenAI IDP Accelerator use?

The GenAI IDP Accelerator uses a variety of AWS services, including:

Amazon S3 for document storage
AWS Lambda for serverless processing
AWS Step Functions for workflow orchestration
Amazon Bedrock for generative AI capabilities
Amazon Textract for OCR and basic extraction
Amazon Comprehend for entity recognition
Amazon SageMaker for custom ML models
Amazon DynamoDB for metadata storage

What document types are supported?

The GenAI IDP Accelerator supports a wide range of document types, including:

PDF documents
Image files (JPEG, PNG, TIFF)
Microsoft Office documents (Word, Excel)
Text files

The specific document types supported may vary depending on the processing pattern used.

Technical Questions

What is AWS CDK?

AWS Cloud Development Kit (CDK) is an open-source software development framework for defining cloud infrastructure as code using familiar programming languages. The GenAI IDP Accelerator uses TypeScript for its CDK implementation.

What are the different processing patterns?

The GenAI IDP Accelerator supports three main processing patterns:

Pattern 1: Uses Amazon Bedrock Data Automation for document processing with minimal custom code
Pattern 2: Implements custom extraction logic using Amazon Bedrock foundation models
Pattern 3: Utilizes custom SageMaker endpoints for specialized document processing tasks

How do I choose the right processing pattern?

The choice of processing pattern depends on your specific requirements:

Pattern 1 is ideal for standard document types with well-defined structures
Pattern 2 provides more flexibility for complex document formats
Pattern 3 is best for specialized document processing tasks or when you need to use custom ML models

Can I customize the processing workflow?

Yes, the GenAI IDP Accelerator is designed to be customizable. You can:

Modify the Step Functions workflow
Add custom Lambda functions
Integrate with additional AWS services
Implement custom document processing logic

How does the solution scale?

The GenAI IDP Accelerator is built on serverless AWS services, which automatically scale based on demand. The solution can handle from a few documents to millions of documents without manual scaling.

What are the security considerations?

The GenAI IDP Accelerator implements AWS security best practices, including:

Least privilege IAM roles
Encryption of data at rest and in transit
VPC isolation for sensitive components
AWS WAF integration for web interface protection
CloudTrail logging for audit and compliance

Deployment Questions

What are the prerequisites for deployment?

To deploy the GenAI IDP Accelerator, you need:

An AWS account with appropriate permissions
Node.js and npm/yarn installed
AWS CDK CLI installed
Docker for building Lambda functions
Python for certain components

How do I deploy the solution?

You can deploy the solution using the AWS CDK CLI:

# Navigate to the sample directory
cd samples/sample-bda-lending

# Deploy the sample
yarn deploy

How much does it cost to run?

The cost of running the GenAI IDP Accelerator depends on your usage of AWS services. The solution uses serverless services that scale with usage, so you only pay for what you use.

Key cost factors include:

Number of documents processed
Size and complexity of documents
AI/ML services used (Bedrock, Textract, etc.)
Storage requirements

How do I monitor the solution?

The GenAI IDP Accelerator includes CloudWatch dashboards and alarms for monitoring:

Document processing metrics
Error rates
Processing latency
Service health

You can also use AWS X-Ray for tracing and AWS CloudTrail for audit logging.

Troubleshooting

Common Deployment Issues

CDK Bootstrap Error

If you encounter a CDK bootstrap error:

# Bootstrap your AWS environment
cdk bootstrap aws://ACCOUNT-NUMBER/REGION

Missing Dependencies

If you encounter missing dependencies:

# Ensure all dependencies are installed
yarn install

# Rebuild the project
yarn build

Permission Issues

If you encounter permission issues:

# Check your AWS credentials
aws sts get-caller-identity

# Ensure your IAM user has the necessary permissions

.NET NuGet Cache Issues

If you encounter build failures when working with .NET redistributable packages, particularly interface compatibility issues between @cdklabs/genai-idp and processor packages like @cdklabs/genai-idp-bda-processor, this may be due to NuGet caching an older version of the packages.

Common scenarios: - After updating package versions in your project - When switching between different branches or versions of the codebase - After pulling updates that include package version changes

Symptoms: - Build errors related to interface mismatches - Type compatibility issues between core and processor packages - Errors indicating that a class doesn't implement required interfaces - Messages like "does not implement interface member" despite correct code

Solution: Clear the NuGet cache to ensure you're using the latest package versions:

# Clear all NuGet caches (recommended)
dotnet nuget locals all --clear

# Then rebuild your project
dotnet restore
dotnet build

Alternative approaches:

# Clear specific cache types if you prefer
dotnet nuget locals http-cache --clear
dotnet nuget locals global-packages --clear
dotnet nuget locals temp --clear

# For persistent issues, also try clearing the project's bin and obj folders
rm -rf bin/ obj/
dotnet restore
dotnet build

This issue typically occurs when: - Package versions have been updated but cached versions persist - Multiple versions of related packages exist in the cache - Interface definitions have changed between package versions - NuGet resolves to cached packages instead of the latest versions specified in your project file

Runtime Issues

Document Processing Failures

If documents fail to process:

Check the document format and quality
Verify that the document type is supported
Check CloudWatch Logs for error messages
Ensure the AI services have the necessary permissions

Performance Issues

If you experience performance issues:

Monitor Lambda concurrency limits
Check for throttling on AWS services
Consider batching documents for processing
Optimize Lambda function memory allocation

Support and Resources

Where can I get help?

If you need help with the GenAI IDP Accelerator:

Check the documentation in this site
Review the sample applications
Open an issue on the GitLab repository
Contact AWS Support if you have an AWS Support plan

How can I contribute?

Contributions to the GenAI IDP Accelerator are welcome! See the Contributing section for details on how to contribute.

Where can I find more resources?

Additional resources: