Data Analyst and Developer


Description

PHP JavaScript Linux bash git CSS node.js Tableau GraphViz AWS VMware MySQL Python HTML Docker VNC R

I have been analysing data in one way or another for nearly my entire career. Much of that work has involved writing custom code to extract data, clean it, store it and then analyse it.

I am a polyglot ICT professional, and my resume is organised as a list of work experience. As an example, on a recent major project I was contracted to analyse a large, obfuscated codebase that was still under active development.

The brief: Analyse a JavaScript library which may be subject to legal action.

As the sole developer and data analyst on the contract, I determined that the library consisted of about 300k lines of obfuscated code that was widely used and under active development. After attempting initial static analysis, I built a tool to capture the library and log its activity whilst it was running (i.e. at run-time). The library under investigation actively monitored user activity and resisted simple screenshot attempts. The capture and logging tool I developed simulated user interaction (scrolling pages and clicking on links) and logged all activity, including dynamic DOM changes made by the library. The tool also captured Chrome HAR (HTTP Archive) files. I used the Chrome source code to discover and address several messaging and logging edge cases.
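A HAR file is a JSON document in which each captured request sits under `log.entries[].request`, with its headers stored as a list of `name`/`value` pairs. As a minimal sketch (in Python, not the Node.js tool itself), the code below assumes that the `Referer` request header is a good-enough proxy for "which page caused this request" and turns a HAR capture into a Graphviz DOT graph of host-to-host edges. The `(direct)` node name, the host-level granularity and the function names are illustrative choices, not details of the original project.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def load_har(path: str) -> dict:
    """Read a HAR file (plain JSON) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def har_edges(har: dict) -> Counter:
    """Count (referring host -> requested host) pairs in a HAR log.

    Assumption: the Referer header stands in for the request's initiator;
    requests without one are attributed to a hypothetical "(direct)" node.
    """
    edges = Counter()
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        target = urlsplit(request.get("url", "")).netloc
        # Header names are compared case-insensitively (HTTP/2 uses lowercase).
        referer = next(
            (h.get("value", "") for h in request.get("headers", [])
             if h.get("name", "").lower() == "referer"),
            "",
        )
        source = urlsplit(referer).netloc or "(direct)"
        edges[(source, target)] += 1
    return edges


def to_dot(edges: Counter) -> str:
    """Render edge counts as a Graphviz digraph, labelling each edge with its count."""
    lines = ["digraph urls {", "  rankdir=LR;"]
    for (src, dst), n in sorted(edges.items()):
        lines.append(f'  "{src}" -> "{dst}" [label="{n}"];')
    lines.append("}")
    return "\n".join(lines)
```

Usage might look like `print(to_dot(har_edges(load_har("capture.har"))))`, with the output rendered by `dot -Tsvg`; the file name is hypothetical. Chrome also writes non-standard, underscore-prefixed fields such as `_initiator` into its HAR exports, which could give more precise edges than `Referer`, but their structure is not formally documented.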

I wrote the capture and logging tool in Node.js using Chrome and Puppeteer, and ran it on AWS EC2 in Docker images that I built from scratch, published to AWS ECR and pulled from there. Logs were stored on AWS S3. I deployed the tool across randomly chosen AWS regions using AWS CloudFormation and bash.
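The random multi-region deployment can be sketched as follows. The original used bash; this Python version only builds `aws cloudformation deploy` command lines (using the real `--template-file`, `--stack-name` and `--region` options) for a random sample of regions, so they could be reviewed or piped to a shell. The region pool, template file name and stack-naming scheme are hypothetical, not taken from the project.

```python
import random
import shlex

# Hypothetical region pool; the regions actually used are not stated.
REGIONS = ["us-east-1", "us-west-2", "eu-west-1", "eu-central-1",
           "ap-southeast-2", "ap-northeast-1", "sa-east-1", "ca-central-1"]


def deploy_commands(template: str, stack: str, count: int, seed=None) -> list:
    """Return one shell-quoted deploy command per randomly chosen region.

    Passing a seed makes the region choice reproducible.
    """
    rng = random.Random(seed)
    regions = rng.sample(REGIONS, count)  # distinct regions, no repeats
    return [
        shlex.join([
            "aws", "cloudformation", "deploy",
            "--template-file", template,
            "--stack-name", f"{stack}-{region}",  # hypothetical naming scheme
            "--region", region,
        ])
        for region in regions
    ]
```

Building the commands rather than executing them keeps the sketch side-effect free; a real run would also need AWS credentials and whatever `--capabilities` the template requires.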

For initial analysis, I used Python to generate Graphviz maps with thousands of nodes and edges showing the URLs being called and how they related to one another. I then analysed more than a terabyte of logging data in depth with custom de-obfuscation tools I built, which used abstract syntax tree (AST) filters to rename functions and variables with four-letter English words so that they could be tracked. I mapped function calls, and reverse engineered and mapped the (mostly undocumented) contents of Chrome HAR files. I also traced variable values across the codebase by dynamically inserting debugging statements at run-time.
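The identifier-renaming idea can be illustrated on a small scale. The original tools worked on JavaScript syntax trees; since Python's standard library has no JavaScript parser, this sketch swaps in Python's own `ast` module (Python 3.9+ for `ast.unparse`) to show the same technique: walk the tree, give each non-builtin name a stable four-letter English word in order of first appearance, and regenerate the source. The word list, `Renamer` and `rename_source` are hypothetical; attributes, keyword arguments, classes and imports are deliberately not handled.

```python
import ast
import builtins

# Hypothetical word list; a real run would need far more four-letter words.
WORDS = ["bark", "cove", "dusk", "fern", "gale", "hive", "iris", "jade",
         "kite", "lamp", "mint", "nest", "opal", "pine", "reef", "sage"]


class Renamer(ast.NodeTransformer):
    """Give every non-builtin identifier a stable, readable replacement name."""

    def __init__(self):
        self.mapping = {}  # original name -> replacement name

    def _name_for(self, old: str) -> str:
        if old in vars(builtins):
            return old  # leave builtins such as abs/len/print untouched
        if old not in self.mapping:
            idx = len(self.mapping)
            word = WORDS[idx % len(WORDS)]
            # Once the list wraps, append a counter so names stay unique.
            self.mapping[old] = word if idx < len(WORDS) else f"{word}{idx // len(WORDS)}"
        return self.mapping[old]

    def visit_Name(self, node):
        node.id = self._name_for(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._name_for(node.arg)
        self.generic_visit(node)  # still visit any annotation
        return node

    def visit_FunctionDef(self, node):
        node.name = self._name_for(node.name)
        self.generic_visit(node)  # then parameters and body
        return node


def rename_source(source: str) -> tuple:
    """Parse, rename and regenerate source; return (new_code, mapping)."""
    renamer = Renamer()
    tree = renamer.visit(ast.parse(source))
    return ast.unparse(tree), renamer.mapping
```

For example, `def f(a, b): c = a - b; return abs(c)` comes back as a function `bark(cove, dusk)` whose body uses `fern`, while `abs` is left alone. Because names are assigned in order of first appearance, the same input always yields the same words, which is what makes them usable as stable handles across log files.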

I also injected customised versions of the library at run-time to discover further interactions between the codebase and other "cooperating" libraries. To map the evolution of the library itself, I used PHP to download historic versions of the codebase from archive.org and tracked them in git, creating a commit for each discovered and de-obfuscated version. I traced and documented both small and large changes.
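One way to enumerate historic versions is the Wayback Machine's CDX API, assumed here to be the source of the archive.org downloads. The sketch is in Python rather than PHP. With `output=json`, the first row of the response is a header, and `collapse=digest` drops adjacent captures with identical content. The code builds the query URL and, given the JSON response, returns one raw snapshot URL per change in content digest, oldest first; each of those could then be downloaded, de-obfuscated and committed to git as its own commit. The network request and git steps are left out.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str) -> str:
    """Build a CDX query listing successful captures of one URL."""
    params = {
        "url": target,
        "output": "json",
        "filter": "statuscode:200",
        "collapse": "digest",  # server-side: drop adjacent identical captures
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def distinct_versions(cdx_json: str) -> list:
    """Return (timestamp, raw snapshot URL) for each content change, oldest first."""
    rows = json.loads(cdx_json)
    if not rows:
        return []
    header, *records = rows
    col = {name: i for i, name in enumerate(header)}
    versions, last_digest = [], None
    for rec in sorted(records, key=lambda r: r[col["timestamp"]]):
        digest = rec[col["digest"]]
        if digest == last_digest:
            continue  # same bytes as the previous capture: not a new version
        last_digest = digest
        ts, original = rec[col["timestamp"]], rec[col["original"]]
        # The "id_" flag requests the original bytes without Wayback rewriting.
        versions.append((ts, f"https://web.archive.org/web/{ts}id_/{original}"))
    return versions
```

Comparing only against the previous digest (rather than all digests seen so far) keeps reverts: if the content goes A, B, A, all three become commits, which matches the goal of recording how the library changed over time.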

I documented the entire project on the expectation that it would be presented as evidence in court.

Technologies: Node.js, Puppeteer, Chrome, Docker, bash, Ubuntu, AWS CloudFormation, AWS EC2, AWS ECR, AWS S3, Graphviz, Python, PHP, grep, sed, awk, git, JSON, YAML, abstract syntax trees