ML Engineer Specializing in Generative AI
Description
Ideally, I’d find a job doing ML engineering on text-to-image, text-to-video, text-to-audio, or related models. I’m interested in ML engineering in other domains as well; my last job was building a new cryptocurrency, so I have skills there too.

Recently, I built a text-to-image model trained without any text labels, using only unlabeled images and CLIP as the link between captions and images. To my knowledge, this approach hasn’t been investigated before. Results are promising so far, and I think this work is the best representation of what I’m capable of.

In the process, I gathered a training dataset of 33 million images, which involved deduplicating images and extracting stills from videos. I ported a VQGAN implementation from PyTorch to JAX, built an efficient preprocessing pipeline, built transformer models in JAX, and designed and trained both baseline models and more sophisticated ones.

To support the approach I eventually settled on, I designed and implemented an efficient algorithm for sampling unit vectors from a finite set, conditioned on the vectors lying inside a given spherical cap. This required writing a Python library in Rust to construct the space-partitioning data structure used for sampling. The sampling algorithm generates training examples, and the model learns to sample images conditioned on the image's CLIP embedding falling within an arbitrary spherical cap.
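To give a concrete sense of the spherical-cap sampling idea, here is a minimal sketch in NumPy. It uses the fact that a unit vector v lies in the cap around a unit center c with half-angle θ iff dot(v, c) ≥ cos(θ). This naive version scans the whole set; the actual implementation instead uses a Rust-backed space-partitioning structure to avoid the O(N) scan, and the function name and signature here are illustrative, not the real library's API.

```python
import numpy as np

def sample_in_cap(vectors, center, max_angle, rng):
    """Uniformly sample one vector from `vectors` (unit vectors, shape (N, d))
    whose angle to the unit vector `center` is at most `max_angle` radians.

    Naive O(N) filter-then-sample sketch; a space-partitioning index would
    restrict the scan to candidates near the cap.
    """
    # v is inside the cap iff dot(v, center) >= cos(max_angle)
    cos_thresh = np.cos(max_angle)
    in_cap = np.flatnonzero(vectors @ center >= cos_thresh)
    if in_cap.size == 0:
        return None  # no set member falls inside this cap
    return vectors[rng.choice(in_cap)]
```

During training, `center` would be an image's CLIP embedding and `max_angle` the cap size, so each training example pairs an image with a cap that is guaranteed to contain its embedding.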