Data Foundation
7 source tables mapped

Table Catalog

agent_family_info.csv

10 rows

Track family relationships for collusion analysis

Primary keys: sales_id

Risk focus: Family clusters linked to high-risk policy portfolios

agent_fraud_label.csv

6 rows

Ground-truth labels for supervised training and review tracking

Primary keys: salesid, nameid

Risk focus: Target variable quality and pending investigations

collection_filter.csv

96 rows

Payment collection behavior and payer identity metadata

Primary keys: receipt_no

Risk focus: Missing payer identity and unusual collection patterns

loan_info.csv

12 rows

Policy loan transactions with agent involvement indicators

Primary keys: loan_no

Risk focus: Large loan amounts and POA/agent abuse signals

next_due_filter.csv

48 rows

Upcoming premium schedule and expected payment amount

Primary keys: policy_no

Risk focus: Overdue next-due behavior and delinquency windows

policy_info_filter.csv

48 rows

Master policy profile, status, premium, and agent ownership

Primary keys: policy_no

Risk focus: Policy lifecycle anomalies and premium integrity checks

status_filter.csv

48 rows

Month-over-month status and payment behavior tracking

Primary keys: policy_no

Risk focus: Rapid status flips and payment timeline instability

Cross-table Relationships

  • agent_fraud_label.salesid -> agent_family_info.sales_id
  • agent_fraud_label.salesid -> policy_info_filter.sales_id
  • agent_fraud_label.salesid -> next_due_filter.sales_id
  • collection_filter.policy_no -> policy_info_filter.policy_no
  • loan_info.policy_no -> policy_info_filter.policy_no
  • next_due_filter.policy_no -> policy_info_filter.policy_no
  • status_filter.policy_no -> policy_info_filter.policy_no

Fraud Detection Business Rules (Design)

  • 1. Rapid policy status changes with short interval from In-force to Lapsed
  • 2. Missing payer identity on high premium collection receipts
  • 3. High loan-to-premium ratio when agent involvement is marked Y
  • 4. Family-linked agent network with concentrated risky policies
  • 5. Overdue next-due schedule without aligned payment evidence

Feature Engineering Blueprint

  • Temporal: days since last pay, days to next due, status-change interval
  • Ratio: loan_to_premium, premium_consistency_ratio, overdue_to_payment_mode
  • Network: family_connection_count, family_fraud_density, shared payer patterns
  • Behavioral: payer identity missing rate, renewal disruption count, status volatility