Urban Dictionary uses the question "if a tree falls in a forest and no one is around to hear it, does it make a sound?" as a metaphor for the idea that thoughts have little impact if they are not communicated to others1. The suggestion is that, like a tree falling unheard in the woods, thoughts we keep to ourselves and never voice or share might as well not exist, since they influence nothing and no one. The gist: expressing ideas is necessary for them to really matter.
This metaphor also applies to the limited effectiveness of underutilized data products. Yet investment in their potential continues, even in the face of sometimes glaringly obvious headwinds2, and with good reason. As leaders in organizations, we seek to acquire new insights into markets, mitigate risks, reduce inefficiency, optimize campaigns, increase the effectiveness of service delivery, and minimize customer churn, among a million other objectives. Great data products depend on an array of systems, practices, leadership, and skill sets in the analytics culture3.
Making Data Products that Hit the Mark
Here at Mesh Digital LLC we believe that data and analytics products should be pervasive and delivered at the point of consumption. In other words, analytics and insights are embedded and available at the right times, in the places where consumers (e.g., customers, colleagues, or other stakeholders) already use their everyday digital services to do their work. This minimizes friction and maximizes the reach of information to add value. Doing this well demands a structured process that ensures you’re building the right product for the right groups of individuals4.
We start by going straight to the source: the customer, colleague, or stakeholder (begrudgingly 😢, “users”). We want to understand, directly, their wants, needs, fears, and pain points around how, what, where, and when they want to see and use data, insights, and analytics. It’s terribly important not to rely on proxies for these folks; proxies can be biased and ultimately out of step with the actual consumers. This research can and should be customized and calibrated to the user population and the products or services at hand, making it unique to each research program (and each of Mesh’s clients).
We use qualitative data collection as the primary means of starting the analysis: interviews, surveys, and questionnaires directed at individuals and focus groups, designed to gather their thoughts on the current state of information delivery, the pain points, and the areas for improvement (a.k.a. the moments that matter). It’s important to involve a diverse set of individuals from across your consumer base, segmenting across both psychographics and demographics to find them. Generational cohorts and even geolocations can significantly influence expectations around interactions and information consumption preferences. This up-front research may well reveal that deploying insights directly into consumer-facing applications is far less appropriate or useful than surfacing them to front-line colleagues in your servicing or go-to-market operations, who help individuals explore options and make quality decisions. The goal is a solid foundation of understanding of customers’ information needs and expectations.
Competitive Intelligence & a Functional Perspective
Building on consumer expectations, the next set of learnings focuses on how these or similar requirements for delivering insights have been met across relevant industries. Competitive intelligence gathering can be active or passive. We may want to tap into the tacit knowledge of our go-to-market team members in sales, servicing, or account management, who encounter the pros and cons of competitor capabilities through their routine interactions with prospects and customers. Conferences are another great source of ideas and industry lessons learned. Finally, here at Mesh we can also tap into professional networks of experts and research firms for industry-leader interviews and surveys, adding context on successes and failures in efforts to attain similar results.
Understanding the current market for data technologies suited to consumers’ needs is critical, and learning where technology roadmaps are headed for evolving products may reveal opportunities to adopt new methods of delivering those data products. Equipped with all of this qualitative and quantitative data, we can make more fully informed data product decisions.
Current State Assessment
Our intelligence gathering equips us with the information needed to synthesize and understand what consumers expect, what has worked functionally in the industry, and how well the proposed capabilities fit the strategy.
To do so, we must first build a comprehensive picture of the organization and its operations and identify the elements likely to own or contribute to the project objectives. Taking note of existing processes, tooling, skill sets, and structures, and measuring these against what is needed to achieve the desired outcomes, we can decompose the gaps and form hypotheses (we love the scientific method here) about the changes to propose. We strive for perspective and consensus across stakeholders on the assessment, identifying gaps and aligning the new capabilities to the business strategy, while continuously measuring real-time KPIs that indicate the effectiveness of the product, service, or program.
Data Source and Content Consumption Discovery
The enterprise data that (at least initially) fuels the analytics pipeline is the target of our initial exploration, alongside the current consumer-facing data visualization tools that trace back to it. As an extension of our internal tool research, we take note, from a communications point of view, of the various tools in which information is presented and the information that drives them. Often no single analytical tool satisfies the information needs of all consumers, hence the tendency to have more than one.
An overview of the analytics culture helps us learn the most important metrics and how they are presented, which is useful for UX design as it relates to our particular set of well-defined use cases. It is worth mentioning that a holistic data governance program is not the focus of this article, but discovery tends to generate some breadth of understanding of the data environment that matters for the immediate analytics task, if only to note context and potential opportunities for clarification. The importance of this discovery to more holistic data governance efforts is a topic for another day.
With clearly defined objectives and requirements, we focus on coordinating technology and technical teams. Achieving those objectives depends greatly on the integration of large or even massive amounts of information, transformation into features, verification of models and algorithms against objectives, validation of inferences in real-world scenarios, and finally the deployment of prescriptive analysis outputs for consumption. These are the core internals of the Data Science practice, which interfaces at its edges with Data Engineering and Product Engineering. Some of what we have found successful at these points of intersection is described below.
Enhancing Data Science and Data Engineering Collaboration
Cultivating a robust relationship between data science and data engineering is foundational5. Reliable flow of information into the analytics platform for feature engineering, and publishing inferential data (the insights) back to enterprise systems, both depend on it. The disciplines have evolved into distinct functions over time, but notable overlaps remain. Points of collaboration vary and depend greatly on the composition of the teams. Often, collaboration starts with co-designing the storage or schema configurations that most naturally make the data available for front-end data science activities, a key interface point between the two teams. Data governance activities also fall into this category.
At a more advanced level, we have seen partnership in data acquisitions create numerous efficiencies between the teams. Working with data suppliers to identify analytical utility, integration strategies, and collection tempo from the logical, information-centric perspective is an analytics-heavy discovery activity. Data Science can work with Data Privacy, InfoSec, and Legal to capture relevant information-centric metrics, understand terms of service, provide expert determination, and document the delivery details, then craft a well-informed hand-off to Data Engineering that significantly reduces their overhead in integrating new information into the enterprise architecture6.
In large organizations, where the Data Engineering backlog is dominated by requirements spanning multiple systems and functions and carving out bandwidth for new capabilities can be difficult, a fully secured and compliant laboratory-like environment can bridge the gap, bringing capabilities to the factory ready for a more seamless deployment. Where it makes sense to push some feature engineering upstream, or where the transformations from source to enterprise data architectures are truly bespoke, we have seen data scientists contribute to the data engineering codebase. Doing so requires adherence to data engineering coding, testing, and support practices.
Upskilling in DevSecOps and Agile Processes in Data Science Teams
We recognize that, to the rest of the organization and in particular the product engineering teams who will directly utilize them, models and algorithms, however complex or novel the inferences they produce, will be packaged and versioned as software services and integrated into the business contexts and products where they matter. Data Science leads who can operate at this level of abstraction, recognizing the research-oriented nature of data science tradecraft while mediating it with standard software development practices, communicate and collaborate more effectively throughout product engineering development processes7. We have seen upskilling data science teams in Agile processes, DevSecOps, and product ownership be an effective gap closer here, regardless of whether these skill sets are internal to the Data Science team or acquired through partnerships.
Intelligent Usage of Tooling for Data Science Activities
Most data science platforms include tools that help practitioners execute more seamlessly across this set of critical activities, and new tools continue to emerge. All models are experiments and must be managed throughout their life cycles. This includes curating and tracking training and reference datasets, as well as model artifacts, in source control; Model Management and Data Versioning tools are essential here8, 9. To increase transparency and manage expectations, model cards should be published that describe algorithms in terms of their expected or ideal input ranges and behaviors10.
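To make the model card idea concrete, here is a minimal sketch in plain Python; the field names and example values are illustrative assumptions, not any framework's official schema, but they capture the essentials: what the model is for, what inputs it expects, and how it was evaluated.

```python
import json

def make_model_card(name, version, intended_use, input_ranges, metrics):
    """Bundle the facts a consumer needs to use a model responsibly."""
    return {
        "name": name,
        "version": version,
        "intended_use": intended_use,
        # Expected/ideal input ranges per feature: feature -> [min, max]
        "input_ranges": {k: list(v) for k, v in input_ranges.items()},
        "evaluation_metrics": metrics,
    }

# Illustrative example: every name and number here is made up
card = make_model_card(
    name="churn-classifier",
    version="1.3.0",
    intended_use="Rank at-risk accounts for retention outreach",
    input_ranges={"tenure_months": (0, 240), "monthly_spend": (0.0, 10000.0)},
    metrics={"auc": 0.87, "precision_at_top_decile": 0.62},
)
print(json.dumps(card, indent=2))
```

Published alongside the versioned model artifact, even a simple card like this tells downstream product engineers when inputs fall outside the ranges the model was trained on.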
Machine learning models have a limited lifespan, determined by data volatility and the deployment environment. It is therefore important to know when to retrain them and when to remove them from production in favor of improved versions. Drift monitoring of input data or model performance, particularly for models with easily observed outputs such as conversions or click-through rates, is crucial for determining when to retrain11, 12.
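As one illustration of input-data drift monitoring, the Population Stability Index (PSI) is a commonly used measure of how far a feature's production distribution has moved from its training baseline. This is a minimal sketch in plain Python; the bin count, epsilon, and thresholds are the conventional rules of thumb, not tuned values.

```python
import math

def psi(expected, actual, bins=10, eps=1e-4):
    """Population Stability Index between a baseline sample ("expected")
    and a production sample ("actual") of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(sample), eps) for c in counts]  # eps avoids log(0)

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 investigate/retrain
baseline = [float(i) for i in range(100)]
shifted = [x + 50.0 for x in baseline]
print(round(psi(baseline, baseline), 4), round(psi(baseline, shifted), 4))
```

Run on each feature at a regular cadence, a metric like this turns "the model feels stale" into a measurable retraining trigger.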
Algorithms that work on unstructured data must be evaluated in more qualitative ways, by subject matter experts or reviewers. Data-centric machine learning, an emerging paradigm most effective for these applications, involves improving precision and accuracy through data selection, labeling, and augmentation13. Indexing content assets for search greatly improves the efficiency of these people-centric processes.
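Indexing need not be elaborate to pay off. As a toy sketch (the document ids and text are illustrative), a simple inverted index lets reviewers and subject matter experts pull up candidate content by keyword rather than scanning assets one by one:

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    per_term = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*per_term) if per_term else set()

# Illustrative corpus; in practice these would be transcripts, contracts, etc.
docs = {
    "doc-1": "contract renewal terms for enterprise accounts",
    "doc-2": "renewal pricing discussion notes",
}
index = build_index(docs)
print(search(index, "renewal terms"))
```

Production systems would layer in tokenization, stemming, and ranking, but the core structure (term to documents) is the same.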
In Conclusion
In conclusion, the success of data products depends on many factors, from the organization's analytics culture to the strategy for product delivery. Organizations must ensure that the right product is being built for the right group of individuals, and that it is delivered to the right place at the right time. Embracing this comprehensive approach minimizes friction, maximizes reach, and helps data products return the most value possible.
1. Diggity Monkeez (screen name). (2024 Jan 04). "If a Tree Falls in the Forest...". Urban Dictionary. https://www.urbandictionary.com/define.php?term=If+a+Tree+Falls+in+the+Forest... Accessed 2023 Oct 23.
2. Bean, A. (2023 Jan 01). "Has Progress on Data Analytics and AI Stalled at Your Company?". Harvard Business Review. https://hbr.org/2023/01/has-progress-on-data-analytics-and-ai-stalled-at-your-company. Accessed 2023 Oct 03.
3. Mohan, S. (2022 Sep 21). "What Is A Data Product And What Are The Key Characteristics?". Forbes. https://www.forbes.com/sites/forbesbusinesscouncil/2022/09/21/what-is-a-data-product-and-what-are-the-key-characteristics/?sh=203e8e9462c5. Accessed 2024 Jan 20.
4. Joshi, M. P., Su, N., Austin, R. D., and Sundaram, A. K. "Why So Many Data Science Projects Fail to Deliver". MIT Sloan Management Review. https://sloanreview.mit.edu/article/why-so-many-data-science-projects-fail-to-deliver/. Accessed 2023 Oct 23.
5. (2023 Oct 23). "Data Engineer vs Data Scientist". New Scientist. https://jobs.newscientist.com/article/data-engineer-vs-data-scientist#. Accessed 2023 Dec 05.
6. (2022 Oct 25). "Guidance Regarding Methods for De-identification of PHI...". Office for Civil Rights, US Department of Health and Human Services. https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#guidancedetermination. Accessed 2023 Oct 22.
7. Saltz, J. (2022 Jan 04). "Achieving Lean Data Science Agility Via Data Driven Scrum". Hawaii International Conference on System Sciences. https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/21914b71-005c-4bcd-b8ff-876b797b8fcf/content. Accessed 2023 Oct 22.
8. (2023 Nov 05). "MLflow: A Tool for Managing the Machine Learning Lifecycle". MLflow.org. https://mlflow.org/docs/latest/index.html. Accessed 2024 Jan 25.
9. (2024 Jan 09). "Get Started: Data Versioning". DVC.org. https://dvc.org/doc/start/data-management/data-versioning. Accessed 2023 Nov 30.
10. (2024 Jan 01). "The value of a shared understanding of AI models". withgoogle.com. https://modelcards.withgoogle.com/about. Accessed 2023 Dec 05.
11. Rafaat, Y. (2023 May 08). "Managing Model Drift in Production with MLOps". KDnuggets. https://www.kdnuggets.com/2023/05/managing-model-drift-production-mlops.html. Accessed 2024 Jan 20.
12. (2023 Jan 23). "ML Observability Advanced Metrics". Arize.ai. https://arize.com/advanced-metrics-course/. Accessed 2024 Jan 25.
13. Miller, K. (2022 Jan 25). "Data-Centric AI: AI Models Are Only as Good as Their Data Pipeline". Stanford University, Human-Centered Artificial Intelligence. https://hai.stanford.edu/news/data-centric-ai-ai-models-are-only-good-their-data-pipeline. Accessed 2023 Dec 14.
14. Interaction Design Foundation - IxDF. (2024 Jan 29). "What is Progressive Disclosure?". Interaction Design Foundation - IxDF. https://www.interaction-design.org/literature/topics/progressive-disclosure. Accessed 2024 Jan 15.
15. (2024 Jan 20). "The best data visualizations are built with code". Observable HQ. https://observablehq.com/. Accessed 2024 Jan 23.