Banking Regex


A director at a major bank contacted us a few days into an operation they were having after they began facing some difficulties. FATCA reporting standards had recently changed, and the bank was on a strict timeline to deliver their platform for compliance. This particular department deals with clients and corporations with over $5 million in holdings, and the repercussions for missing the deadline would be withheld funds and tax implications that could severely detriment their clients.

Close to a certain deadline, the software team pushed an update meant to begin parsing over records, which the bank has to retain for 7 years. This parser had a fatal flaw though; folders, subfolders and files that contained certain special characters, (* _ + -), would cause the program to fail. The director had immediately contacted the software team, which announced it would take them far past the deadline to release a fix, and the directors team would simply have to find their own solution.

The director began solving the issue by using company resources, consisting of highly skilled BA’s, QA’s and SME’s, and set them on the task. Unfortunately, this meant manually going through the folders and files and renaming them to remove offending characters. This brute force method would in theory work, and a team of 14 people had been at the task for 3 days before Automation Toronto was contacted. They were able to clear approximately 60 000+ records, a truly impressive feat. This industrious accomplishment was undermined by a daunting fact: more than 400 000 records remained. The team was restless and hoping for a programmatic solution.


Before we’d transform data, we need to ensure we have a backup. The database was quite extensive, with each record variable in size, but it was backed up to a RAID 5 array, which allowed for a fast transfer while keeping a fair level of redundancy.

After the data integrity of the backup was verified, we created a simple shell script, which followed the pattern required with a regular expression, replacing all offending characters with an empty space.

The software teams parser then ran without error.


On receiving the call, we established we’d be dealing with a database. For safety, we always back up the clients data, wiping, destroying or handing over the drives after the job is complete to the satisfaction of the customer.

The solution parsed over all 460 000+ records in just over 4 minutes. These calculations were taken from the final report:

    Brute Force Rate:
    (60000 records)/(14 people)/(3 days) = ~4285 records/person/day

    Project completion time brute force:
    (460000 records)/(4285 records)/person/day*(14 people) = ~8 days

    Cost for brute force:
    (14 people)*(~$37.5 average hourly rate)*(7.5 hours)*(5 days) = ~$19600

    Automation Toronto Cost:
    (13 hours)*($400/hour rush rate)+($350 hard drives)= $5550

    ~$19600-$5550 = ~$14050

The hourly rate included sourcing the RAID controller and the database backup. This rate varies depending on urgency,