Hexa - How to Scale #Data, with Milene Darnis (Heap)

This article includes key takeaways from Milene Darnis’ talk on how to scale data. Check out the full video on YouTube or get the latest from our Scale series.

3 key mindsets

Mindset #1: Good (and less good) reasons to use data

Data isn’t always the answer. (Gasp!) For Milene, there are good reasons for businesses to leverage data, and there are misguided ones. Situations where data can help include informing roadmap decisions, identifying bugs in your product, and understanding your customers and prospects. Situations where using data won’t result in better outcomes include: justifying decisions that are already made, avoiding discussions with customers, and looking for confirmation in vanity metrics. For many people — particularly those used to making decisions from their gut — a commitment to data means major changes to how their business operates, and often a significant psychological shift. Be prepared!

Mindset #2: Start small

Milene sees this all the time: companies can’t wait to “start using AI,” but haven’t built up the infrastructure or processes that make an advanced use-case like AI viable for them. The good news is that the intervening stages still confer many benefits to your business. As a first step, work on supplying clean data to business teams: things like MRR, ARR, and other revenue metrics. No, they’re not AI, but it’s nearly impossible to run a successful business without them. Next you can start collecting product data to inform decisions about your product: roadmaps, new features, and improvements. From here, move up to consolidating business data sets to generate business-wide models for forecasting. Once these data functions are up and running, you’ll be ready to add AI and predictive features.
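To make the first step concrete, here is a minimal sketch of how MRR and ARR might be computed from a list of subscriptions. The Subscription fields, customer names, and prices are illustrative assumptions, not something from the talk, and ARR is taken here as simply 12 times MRR, which is one common convention among several.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    customer: str        # hypothetical customer identifier
    price_cents: int     # amount charged per billing period, in cents
    period_months: int   # 1 for monthly plans, 12 for annual plans
    active: bool = True


def mrr_cents(subscriptions):
    """Monthly recurring revenue: each active plan normalized to one month.

    Integer division drops fractions of a cent; fine for a sketch.
    """
    return sum(s.price_cents // s.period_months for s in subscriptions if s.active)


def arr_cents(subscriptions):
    """Annual recurring revenue, assumed here to be 12 x MRR."""
    return 12 * mrr_cents(subscriptions)


# Made-up sample data for illustration only.
subs = [
    Subscription("acme", 5_000, 1),                    # $50 per month
    Subscription("globex", 120_000, 12),               # $1,200 per year
    Subscription("initech", 3_000, 1, active=False),   # churned, excluded
]

print(mrr_cents(subs) / 100)  # 150.0
print(arr_cents(subs) / 100)  # 1800.0
```

Working in integer cents avoids floating-point drift in revenue figures, which matters once these numbers go to finance teams or investors.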

Mindset #3: Data prerequisites

Before launching any major data initiatives, it’s important to establish baseline standards. First, do you have enough data? Each of the steps above requires you to be collecting enough data; if you can’t reliably report on ARR, or product users, or sales, or customers, then you know your immediate tasks. Second, what are your goals? These don’t have to be fully fleshed-out, but knowing what long-term goals you’re shooting for (AI, predictive forecasts, etc.) can make sure your preliminary steps — implementing reliable systems for collecting, storing, and exploring data — are put together properly. Third, can you afford to hire data talent? This involves more than budget; if you want to retain talent, you’ll need to offer them learning opportunities and career development (i.e. a robust plan around data).

4 key best practices

Best practice #1: User segmentation & key metrics

A good first step for getting started with data is to map out key metrics and set up basic user segmentation. User segmentation involves dividing your users into groups based on relevant features (job title, actions performed in the product, platform) and can take two forms: time-based segmentation, which tells you when a user performed a specific action (for example, that they onboarded this week), and action-based segmentation, which tells you whether a user performed a given action at all.
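The two forms of segmentation can be sketched over a simple event log. This is only an illustration: the event tuples, action names, and dates below are made up, and a product analytics tool would normally do this work for you.

```python
from datetime import date, timedelta

# Hypothetical event log: (user_id, action, day the action happened).
events = [
    ("u1", "onboarded", date(2024, 5, 6)),
    ("u2", "onboarded", date(2024, 4, 1)),
    ("u2", "exported_report", date(2024, 5, 7)),
    ("u3", "exported_report", date(2024, 3, 15)),
]


def action_segment(events, action):
    """Action-based: users who performed `action` at any time."""
    return {user for user, act, _ in events if act == action}


def time_segment(events, action, start, end):
    """Time-based: users who performed `action` between start and end, inclusive."""
    return {
        user
        for user, act, day in events
        if act == action and start <= day <= end
    }


week_start = date(2024, 5, 6)
week_end = week_start + timedelta(days=6)

print(sorted(action_segment(events, "exported_report")))             # ['u2', 'u3']
print(sorted(time_segment(events, "onboarded", week_start, week_end)))  # ['u1']
```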

Another key set of metrics captures activity at each stage of the funnel. Milene recommends the AARRR framework (aka Pirate Metrics!). Besides being fun to say, “AARRR” maps out stages of the user journey: awareness, activation, retention, revenue, referral.
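One simple way to read funnel metrics is as stage-to-stage conversion rates. The sketch below uses the stages in the order listed above with invented user counts; both the numbers and the function name are assumptions for illustration.

```python
# Hypothetical user counts per stage, in the order the article lists them.
funnel = [
    ("awareness", 10_000),
    ("activation", 2_000),
    ("retention", 800),
    ("revenue", 200),
    ("referral", 50),
]


def step_conversion(funnel):
    """Share of users at each stage who also reach the next stage."""
    rates = {}
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        # Guard against an empty stage to avoid dividing by zero.
        rates[f"{prev_name} -> {name}"] = count / prev_count if prev_count else 0.0
    return rates


for step, rate in step_conversion(funnel).items():
    print(f"{step}: {rate:.1%}")
```

The step with the lowest rate is usually the first place to look for product or onboarding problems.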

Best practice #2: Building your data stack

Milene advises startups of 50 to 200 people to adopt a simple data stack, starting with product data. For this she recommends tools like Google Analytics or Heap. Next come sources of business data: payment data from Stripe, customer information from Salesforce, commerce data from Shopify. The last piece is a simple server-side data solution like MySQL. As a second step, Milene suggests investing in a data warehouse, especially a cloud-based one like Snowflake or Amazon Redshift. ETL tools like Heap Connect can help aggregate and transfer data to it. Finally, a top layer of tools helps you explore and analyze this data. Milene recommends tools like Looker and Tableau for a more BI-oriented approach, or Mode and Jupyter if you need a more data science-based approach.
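The idea behind the warehouse layer can be illustrated in miniature: load extracts from separate sources into one place, then query across them. In this sketch an in-memory SQLite database stands in for a cloud warehouse, and the table names, columns, and records are all hypothetical; real extracts would come from each tool's own export or API.

```python
import sqlite3

# Hypothetical extracts from two separate business tools.
payments = [("c1", 4900), ("c2", 9900), ("c1", 4900)]   # (customer_id, amount_cents)
customers = [
    ("c1", "Acme", "smb"),
    ("c2", "Globex", "enterprise"),
]                                                       # (id, name, segment)

# In-memory SQLite database as a stand-in for a real warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE payments (customer_id TEXT, amount_cents INTEGER)")
warehouse.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, segment TEXT)")
warehouse.executemany("INSERT INTO payments VALUES (?, ?)", payments)
warehouse.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)

# Once both sources live side by side, one query can combine them.
revenue_by_segment = warehouse.execute(
    """
    SELECT c.segment, SUM(p.amount_cents) AS revenue_cents
    FROM payments AS p
    JOIN customers AS c ON c.id = p.customer_id
    GROUP BY c.segment
    ORDER BY c.segment
    """
).fetchall()

print(revenue_by_segment)  # [('enterprise', 9900), ('smb', 9800)]
```

A question like "revenue by customer segment" cannot be answered from the payment tool or the CRM alone, which is the practical argument for consolidating them.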

Best practice #3: Hiring for data

Data is a growing field, with lots of new job titles. Milene gives some tips on what titles mean in terms of skills and what they can bring to your team:

Business analyst: proficient in Excel, tracks business metrics (monthly revenue, operating costs, etc.) for business teams and investors.
Data analyst: proficient in SQL, analyzes large data sets and helps with advanced questions. Often partners with product.
Data scientist: more technical and statistics-driven, specializes in running experiments on data and developing predictive models.
Data engineer: builds and maintains data infrastructure, connects data sources, keeps data clean, creates usable schemas.
ML engineer: pushes models built by data scientists into code.

Best practice #4: Centralized vs distributed data org

The proliferation of data-based practices has caused at least two major changes: it’s raised expectations of data literacy across all teams, and it’s started to change the ways data teams work. It used to be common to adopt a centralized approach in which data-oriented tasks were handled by the data team alone. But this centralized structure tends to produce bottlenecks and to deprive teams of the data they need to do their jobs.

Milene recommends a more distributed approach, in which the data team sets up and maintains the data stack, but also makes that stack both available and easy to use for teams across the business. In this model, data teams partner with other teams on complex projects that call for deeper analysis.

About Milene Darnis:

Milene is a data engineer turned product manager who’s worked at some of the biggest names in the Bay Area, including Uber and Heap. Her passion for linking data to concrete business problems and modeling core datasets led her to build Uber’s Databook: a centralized platform for understanding the company’s data ecosystem.