Data Engineering & ML Tools

We know it can be tricky to turn research code into a user-facing product, and we're happy to share our product expertise, tricks, and lessons learned with our clients.

Ensign

A cloud-native, cloud-agnostic database designed for data teams to prototype and deploy data products and services. Ensign makes it easy to ingest structured and unstructured data, build and analyze corpora, and responsibly manage and monitor the data sets and sources that power LLMs and other models.

  • Easily ingest structured or unstructured data from any source system or data set using our Python or Go SDKs
  • Build custom corpora or data sets for domain-specific LLMs or other models
  • Automated deduplication and anomaly detection
  • Conduct data transformations for downstream models or consumers
  • No complex cloud vendor accounts or configurations required, and no surprise bills
  • Hosted or self-hosted versions available
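
To give a feel for what deduplication at ingest can involve, here is a minimal, generic sketch that drops records whose content has already been seen. It is an illustration only and does not use Ensign's SDK or reflect how Ensign implements deduplication; the `fingerprint` and `deduplicate` helpers and the sample `events` are hypothetical names chosen for this example.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return a stable SHA-256 hash of a record's content (hypothetical helper)."""
    # sort_keys makes the hash independent of dictionary key order
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Yield each distinct record once, in first-seen order."""
    seen = set()
    for record in records:
        key = fingerprint(record)
        if key not in seen:
            seen.add(key)
            yield record


# Sample data, invented for illustration
events = [
    {"id": 1, "text": "hello"},
    {"text": "hello", "id": 1},  # same content, different key order
    {"id": 2, "text": "world"},
]
unique = list(deduplicate(events))
print(len(unique))
```

Hashing a canonical serialization treats records with identical content as duplicates regardless of key order. Near-duplicate detection, such as documents that differ only in whitespace, would need a fuzzier comparison than this sketch provides.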

Yellowbrick

Yellowbrick is an open source diagnostic visualization tool for machine learning that allows data scientists to steer the model selection process. Companies, federal agencies, and individual data scientists have incorporated Yellowbrick into their data science workflows. It reaches an international audience and averages 60,000+ downloads/month.

  • Extends the scikit-learn API to make model selection and hyperparameter tuning easier
  • Includes production-ready visualizers for features, classification, regression, clustering, model selection, and text
  • Commonly used in Jupyter notebooks alongside pandas DataFrames
  • Requires scikit-learn and matplotlib libraries