Sample Documents for Fraud Detection Testing

This directory contains sample PDF documents for testing the fraud detection system. The documents are divided into two categories: legitimate and fraudulent.

Legitimate Documents

These documents represent normal, legitimate financial transactions with realistic metadata and content:

1. legitimate_invoice_1.pdf

  • Type: Invoice
  • Vendor: TechSupply Solutions Inc.
  • Amount: $2,924.08
  • Description: Standard software and support invoice with realistic line items, proper metadata, and reasonable amounts
  • Expected Risk Score: LOW (0-30)

2. legitimate_receipt_1.pdf

  • Type: Receipt
  • Vendor: Office Supplies Plus
  • Amount: $93.12
  • Description: Office supplies purchase receipt with normal amounts and proper formatting
  • Expected Risk Score: LOW (0-30)

3. legitimate_bank_statement_1.pdf

  • Type: Bank Statement
  • Account: ****5678
  • Description: Monthly bank statement with typical transactions and proper formatting
  • Expected Risk Score: LOW (0-30)

Fraudulent Documents

These documents contain specific fraud indicators designed to test different fraud detection capabilities:

1. fraudulent_tampered_invoice.pdf

  • Type: Invoice (Metadata Fraud)
  • Vendor: QuickCash Enterprises LLC (blacklisted)
  • Amount: $23,500.00
  • Fraud Indicators:
    • Metadata Tampering: PDF creation date is recent, but the invoice is dated 6 months earlier
    • Blacklisted Vendor: Vendor name appears in the blacklist database
    • Suspicious Payment Terms: "Due immediately" with wire transfer preference
    • High Amount: Unusually high for consulting services
  • Expected Risk Score: HIGH/CRITICAL (70-100)
  • Tests Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 6.1
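
The metadata tampering indicator above can be sketched as a date comparison. This is a minimal illustration, not the detection system's actual logic: the function names, the 30-day tolerance, and the sample dates are all assumptions. It relies only on the standard PDF date string layout (D:YYYYMMDDHHmmSS, optionally followed by a timezone suffix, which this sketch ignores).

```python
from datetime import datetime, timedelta


def parse_pdf_date(raw: str) -> datetime:
    """Parse the 'D:YYYYMMDDHHmmSS' prefix of a PDF date string.

    Any timezone suffix (e.g. +00'00') is ignored in this sketch.
    """
    digits = raw[2:] if raw.startswith("D:") else raw
    return datetime.strptime(digits[:14], "%Y%m%d%H%M%S")


def has_timestamp_mismatch(creation_date_raw: str,
                           invoice_date: datetime,
                           max_gap_days: int = 30) -> bool:
    """Return True when the PDF was created long after the invoice date.

    max_gap_days is a hypothetical tolerance, not a documented setting.
    """
    created = parse_pdf_date(creation_date_raw)
    return created - invoice_date > timedelta(days=max_gap_days)


# Hypothetical values: invoice dated mid-January, PDF created six months later.
tampered = has_timestamp_mismatch("D:20240715093000+00'00'", datetime(2024, 1, 15))
# Hypothetical values: PDF created the day after the invoice date.
normal = has_timestamp_mismatch("D:20240116120000", datetime(2024, 1, 15))
print(tampered, normal)
```

A real check would also need to handle missing or malformed creation dates, which this sketch does not attempt.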

2. fraudulent_duplicate_invoice.pdf

  • Type: Invoice (Pattern Fraud)
  • Vendor: Global Services International
  • Amount: $11,800.00
  • Fraud Indicators:
    • Duplicate Invoice Number: Uses same invoice number as legitimate_invoice_1.pdf (INV-2024-001234)
    • All Rounded Amounts: Every line item and total ends in .00 (suggests fabrication)
    • Sequential Pattern: Item codes follow suspicious sequential pattern
    • No Tax: 0% tax rate is unusual
  • Expected Risk Score: HIGH (70-85)
  • Tests Requirements: 4.1, 4.2, 4.3, 4.4
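
Two of the pattern indicators above, the duplicate invoice number and the all-rounded amounts, lend themselves to a short sketch. The function names and the line-item values below are hypothetical and chosen only for illustration; the invoice number and file names come from this README.

```python
from decimal import Decimal


def find_duplicate_invoice_numbers(invoices):
    """Group file names by invoice number and keep numbers seen more than once.

    invoices: iterable of (filename, invoice_number) pairs.
    """
    seen = {}
    for filename, number in invoices:
        seen.setdefault(number, []).append(filename)
    return {number: files for number, files in seen.items() if len(files) > 1}


def all_amounts_rounded(amounts):
    """Return True when every amount is a whole number (ends in .00)."""
    values = [Decimal(a) for a in amounts]
    return bool(values) and all(v == v.to_integral_value() for v in values)


duplicates = find_duplicate_invoice_numbers([
    ("legitimate_invoice_1.pdf", "INV-2024-001234"),
    ("fraudulent_duplicate_invoice.pdf", "INV-2024-001234"),
])
print(duplicates)

# Hypothetical line items summing to the documented $11,800.00 total.
print(all_amounts_rounded(["5000.00", "6800.00", "11800.00"]))
# The legitimate invoice total is not a round number.
print(all_amounts_rounded(["2924.08"]))
```

Amounts are passed as strings and converted to Decimal so that the cents comparison does not depend on floating-point rounding.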

3. fraudulent_outlier_receipt.pdf

  • Type: Receipt (Anomaly Fraud)
  • Vendor: Office Supplies Plus (legitimate vendor)
  • Amount: $13,625.00
  • Fraud Indicators:
    • Statistical Outliers: Prices are more than 100x normal (e.g., $2,500 for copy paper vs. a normal $16, roughly 156x)
    • Weekend Transaction: Processed on a Sunday, when the store is typically closed
    • Large Cash Payment: $13,625 in cash is highly unusual
    • Unknown Cashier: No proper cashier identification
    • Amount Anomaly: Total is about 146x the normal total for this vendor (compare the $93.12 total in legitimate_receipt_1.pdf)
  • Expected Risk Score: CRITICAL (85-100)
  • Tests Requirements: 5.1, 5.2, 5.3, 5.4
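
The arithmetic behind the outlier and weekend indicators can be checked with a few lines. This is a sketch under stated assumptions: the helper names are hypothetical, the baseline is taken to be the $93.12 total from legitimate_receipt_1.pdf, and the transaction date is made up because this README does not give one.

```python
from datetime import date


def amount_ratio(amount: float, baseline: float) -> float:
    """Return how many times larger amount is than baseline."""
    return amount / baseline


def is_weekend(day: date) -> bool:
    """Return True for Saturday (weekday 5) or Sunday (weekday 6)."""
    return day.weekday() >= 5


# Receipt total vs. the legitimate receipt from the same vendor.
total_ratio = amount_ratio(13625.00, 93.12)
# Copy paper price vs. its normal price.
item_ratio = amount_ratio(2500.00, 16.00)
print(round(total_ratio), item_ratio)

# Hypothetical transaction date that falls on a Sunday.
print(is_weekend(date(2024, 1, 14)))
```

A production detector would more likely compare against a per-vendor distribution (for example a z-score over historical totals) than against a single baseline receipt.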

Generating Sample Documents

The sample documents can be regenerated using the provided Python scripts:

# Generate legitimate documents
./venv/bin/python generate_samples.py

# Generate fraudulent documents
./venv/bin/python generate_fraudulent_samples.py

Testing with Sample Documents

Upload documents to test the fraud detection system:

# Upload a legitimate document
./upload-document.sh sample-files/legitimate_invoice_1.pdf

# Upload a fraudulent document
./upload-document.sh sample-files/fraudulent_tampered_invoice.pdf

Expected Fraud Assessment Output

Legitimate Document Example

{
  "risk_score": 15,
  "risk_level": "LOW",
  "findings": {
    "metadata_analysis": {
      "suspicious_indicators": []
    },
    "pattern_matches": [],
    "anomalies": [],
    "database_checks": {
      "blacklist_status": {
        "is_blacklisted": false
      },
      "verification_status": {
        "is_verified": true
      }
    }
  },
  "indicators": [],
  "recommended_actions": ["Process normally", "No further review required"]
}

Fraudulent Document Example

{
  "risk_score": 92,
  "risk_level": "CRITICAL",
  "findings": {
    "metadata_analysis": {
      "suspicious_indicators": [
        {
          "type": "timestamp_mismatch",
          "description": "Document creation date conflicts with invoice date",
          "severity": "high"
        }
      ]
    },
    "pattern_matches": [
      {
        "pattern_type": "duplicate_invoice",
        "confidence": 0.95
      }
    ],
    "anomalies": [
      {
        "type": "amount_outlier",
        "severity_score": 9
      }
    ],
    "database_checks": {
      "blacklist_status": {
        "is_blacklisted": true,
        "reason": "Known fraudulent entity"
      }
    }
  },
  "indicators": [
    "Metadata tampering detected",
    "Vendor on blacklist",
    "Duplicate invoice number",
    "Statistical anomalies present"
  ],
  "recommended_actions": [
    "REJECT payment immediately",
    "Flag for investigation",
    "Report to fraud prevention team",
    "Contact vendor verification"
  ]
}
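
When checking results by hand gets tedious, the expected score ranges listed for each sample can be compared against an assessment automatically. The sketch below is an assumption about how such a check might look: the EXPECTED_RANGES table is transcribed from this README, while the function name and the idea that the assessment arrives as a JSON string with a risk_score field are illustrative.

```python
import json

# Expected risk score ranges, transcribed from the document descriptions above.
EXPECTED_RANGES = {
    "legitimate_invoice_1.pdf": (0, 30),
    "legitimate_receipt_1.pdf": (0, 30),
    "legitimate_bank_statement_1.pdf": (0, 30),
    "fraudulent_tampered_invoice.pdf": (70, 100),
    "fraudulent_duplicate_invoice.pdf": (70, 85),
    "fraudulent_outlier_receipt.pdf": (85, 100),
}


def score_in_expected_range(filename: str, assessment_json: str) -> bool:
    """Return True when the assessment's risk_score falls in the expected range."""
    assessment = json.loads(assessment_json)
    low, high = EXPECTED_RANGES[filename]
    return low <= assessment["risk_score"] <= high


# Trimmed-down assessments using the scores from the two examples above.
legit_ok = score_in_expected_range(
    "legitimate_invoice_1.pdf", '{"risk_score": 15, "risk_level": "LOW"}'
)
fraud_ok = score_in_expected_range(
    "fraudulent_tampered_invoice.pdf", '{"risk_score": 92, "risk_level": "CRITICAL"}'
)
print(legit_ok, fraud_ok)
```

Both range endpoints are treated as inclusive here, so a score of 85 satisfies both the HIGH (70-85) and CRITICAL (85-100) ranges.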

Notes

  • All documents are generated programmatically using ReportLab and PyPDF2
  • Fraudulent documents are designed to trigger specific fraud detection tools
  • The metadata in fraudulent documents is intentionally suspicious
  • Amounts and patterns in fraudulent documents are based on real-world fraud scenarios