Data Reliability at Scale: How Fox Digital Architected its Modern Data Stack
As distributed architectures continue to become a new gold standard for data-driven organizations, this kind of self-serve motion would be a dream come true for many data leaders. So when the Monte Carlo team got the chance to sit down with Alex, we took a deep dive into how he made it happen.
Here’s how his team architected a hybrid data architecture that prioritizes democratization and access, while ensuring reliability and trust at every turn.
Exercise “Controlled Freedom” when dealing with stakeholders
Alex has built decentralized access to data at Fox on a foundation he calls “controlled freedom.” In fact, he believes using your data team as the single source of truth within an organization actually creates the biggest silo.
So instead of becoming a guardian and bottleneck, Alex and his data team focus on setting certain parameters around how data is ingested and supplied to stakeholders. Within the framework, internal data consumers at Fox have the freedom to create and use data products as needed to meet their business goals.
“If you think about a centralized data reporting structure, where you used to come in, open a ticket, and wait for your turn, by the time you get an answer, it’s often too late,” Alex said. “Businesses are evolving and growing at a pace I’ve never seen before, and decisions are being made at a blazing speed. You have to have data at your fingertips to make the correct decision.”
To accomplish this at scale, Alex and his centralized data team control a few key areas: how data is ingested, how data is kept secure, and how data is optimized in the best format to be then published to standard executive reports. When his team can ensure data sources are trustworthy, data is secure, and the company is using consistent metrics and definitions for high-level reporting, it gives data consumers the confidence to freely access and leverage data within that framework.
“Everything else, especially within data discovery and your ad-hoc analytics, should be free,” said Alex. “We give you the source of the data and guarantee it’s trustworthy. We know that we’re watching those pipelines multiple times every day, and we know that the data inside can be used for X, Y, and Z, so just go ahead and use it how you want. I believe this is the way forward: striving towards giving people trust in the data platforms while supplying them with the tools and skill sets they need to be self-sufficient.”
Invest in a decentralized data team
Under Alex’s leadership, five teams oversee data for the Fox digital organization: data tagging and collections, data engineering, data analytics, data science, and data architecture. Each team has its own responsibilities, but everyone works together to solve problems for the entire business.
“I strongly believe in the fact that you have to engage the team in the decision-making process and have a collaborative approach,” said Alex. “We don’t have a single person leading architecture—it’s a team chapter approach. The power of the company is, in essence, the data. But people are the power of that data. People are what makes that data available.”
While members of different data teams collaborate to deliver value to the business, there’s a clear delineation between analysts and engineers within the Fox data organization. Analysts sit close to the business units, understanding pain points and working to find and validate new data sources. This knowledge informs what Alex and his teams call an STM, or Source to Target Mapping—a spec that essentially allows engineers to operate from a well-defined playbook to build the pipelines and architecture necessary to support the data needs of the business.
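To make the idea of an STM concrete, here is a minimal sketch of what a Source to Target Mapping could look like when expressed in code. It is an illustration only, not Fox's actual format: the class names, the field names, and the `web_events_raw` and `page_views` data set names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FieldMapping:
    """One row of the spec: which source field feeds which target field."""
    source: str
    target: str
    transform: Callable[[Any], Any] = lambda value: value  # identity by default


@dataclass(frozen=True)
class SourceToTargetMapping:
    """Hypothetical STM: analysts fill it in, engineers build pipelines from it."""
    source_name: str
    target_name: str
    fields: tuple

    def apply(self, record: dict) -> dict:
        # Fail loudly if the source record does not match the agreed spec.
        missing = [f.source for f in self.fields if f.source not in record]
        if missing:
            raise KeyError(f"{self.source_name} record is missing fields: {missing}")
        return {f.target: f.transform(record[f.source]) for f in self.fields}


# Example spec (all names are made up for illustration).
stm = SourceToTargetMapping(
    source_name="web_events_raw",
    target_name="page_views",
    fields=(
        FieldMapping("evt_ts", "viewed_at"),
        FieldMapping("uid", "user_id", str),
        FieldMapping("url", "page_url", str.lower),
    ),
)

row = stm.apply({"evt_ts": "2021-06-01T12:00:00", "uid": 42, "url": "HTTPS://Example.COM/News"})
print(row)
```

Keeping the mapping as data rather than burying it in pipeline code is one way to let analysts own the "what" while engineers own the "how", which is the division of labor described above.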
This division of labor between analysts and engineers “allows people to focus on their specific areas instead of being spread thin,” said Alex. “Some people may disagree with me, but quite frankly, having developers attend a lot of business meetings can be a waste of their time, because collecting and understanding business requirements often is a strenuous and time-consuming effort. By installing the analytics before engineering gets involved, we can bridge that gap and then allow the developers to do what they do best: building the most reliable, resilient, and optimized jobs.”
(It’s worth noting, however, that this decentralized approach won’t work for every organization, and the right team structure will vary based on the SLAs your company sets for data.)
Avoid shiny new toys in favor of problem-solving tech
I’ve been in data for over a decade, and I can say in no uncertain terms that Fox has one of the most robust and elegant data tech stacks that I’ve ever seen. But Alex is adamant that data leaders shouldn’t pursue shiny new tech for its own sake.
“First and foremost, in order to be successful at delivering the right underlying architecture, you need to understand the business,” said Alex. “Don’t chase the latest and greatest technology, because then you’re never going to stop. And sometimes the stack you have right now is good enough—all you have to do is optimize it.”
The Fox data team built their tech stack to meet a specific need: enabling self-service analytics. “We embarked on the journey of adopting a lakehouse architecture because it would give us both the beauty and control of a data lake, as well as the cleanliness and structure of a data warehouse.”
Several types of data flow into the Fox digital ecosystem, including batched, micro-batched, streaming, structured, and unstructured. After ingestion, data goes through what Alex refers to as a “three-layer cake.”
“First, we have the data exposed at its raw state, exactly how we ingest it,” said Alex. “But that raw data is often not usable for people who want to do discovery and exploration. That’s why we’re building the optimized layer, where data gets sorted, sliced-and-diced, and optimized in different file formats for the speed of reading, writing, and usability. After that, when we know something needs to be defined as a data model or included in a data set, we engage in that within the publishing layer and then build it out for broader consumption within the company. Inside of the published layer, data can be exposed via our tool stack.”
The optimized layer makes up the pool of data that Alex and his team provide to internal stakeholders under the “controlled freedom” model. With self-serve analytics, data users can discover and work with data assets that they already know are trustworthy and secure.
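The three layers can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Fox's pipeline: the event shape (`ts`, `page`), the function names, and the "daily page views" data set are all hypothetical, and a real lakehouse would store each layer as files or tables rather than in-memory lists.

```python
from collections import Counter
from datetime import datetime


def to_optimized(raw_events):
    """Raw -> optimized: skip malformed rows, normalize types, sort by time."""
    cleaned = []
    for event in raw_events:
        try:
            cleaned.append({
                "viewed_at": datetime.fromisoformat(event["ts"]),
                "page": event["page"].strip().lower(),
            })
        except (KeyError, ValueError, TypeError, AttributeError):
            # The raw layer still holds the original row; the optimized layer omits it.
            continue
    return sorted(cleaned, key=lambda row: row["viewed_at"])


def to_published(optimized):
    """Optimized -> published: a defined data set (here, daily views per page)."""
    counts = Counter(
        (row["viewed_at"].date().isoformat(), row["page"]) for row in optimized
    )
    return [
        {"day": day, "page": page, "views": views}
        for (day, page), views in sorted(counts.items())
    ]


# Layer 1: raw data, exactly as ingested (including a bad row).
raw = [
    {"ts": "2021-06-01T12:05:00", "page": " /News "},
    {"ts": "2021-06-01T09:00:00", "page": "/news"},
    {"ts": "not-a-date", "page": "/sports"},
    {"ts": "2021-06-02T08:30:00", "page": "/sports"},
]

optimized = to_optimized(raw)        # Layer 2: cleaned and sorted for exploration
published = to_published(optimized)  # Layer 3: modeled for broad consumption
print(published)
```

In this sketch the raw list is never modified, so each downstream layer can be rebuilt from it if the cleaning rules or the published model change.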
“If you don’t approach your data from the angle that it’s easy to discover, easy to search, and easy to observe, it becomes more like a swamp,” said Alex. “We need to instill and enforce some formats and strict regulations to make sure the data is getting properly indexed and properly stored so that people can find and make sense of the data.”
To make analytics self-serve, invest in data trust
For this self-serve model to work, the organization needs to trust that the data is accurate and reliable. To help achieve this goal, the entire data stack is wrapped in QA, validation, and alerting. Fox uses Monte Carlo to provide end-to-end data observability, along with Datadog, CloudWatch alerts, and custom frameworks to help govern and secure data throughout its lifecycle.
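As a rough illustration of the kind of validation a custom framework might run, the sketch below checks a table for freshness and volume and returns the names of any failed checks. It does not use or represent the Monte Carlo, Datadog, or CloudWatch APIs; the function name, parameters, and thresholds are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def check_table(last_loaded_at, row_count, expected_rows, now,
                max_age=timedelta(hours=6), tolerance=0.5):
    """Return the names of failed checks for one table (empty list means healthy).

    Hypothetical thresholds: data older than `max_age` fails freshness, and a
    row count deviating from `expected_rows` by more than `tolerance` (as a
    fraction) fails volume.
    """
    alerts = []
    if now - last_loaded_at > max_age:
        alerts.append("freshness")
    if expected_rows > 0 and abs(row_count - expected_rows) / expected_rows > tolerance:
        alerts.append("volume")
    return alerts


now = datetime(2021, 6, 1, 12, 0)

# Loaded an hour ago with the expected volume: no alerts.
healthy = check_table(datetime(2021, 6, 1, 11, 0), 1000, 1000, now)

# Loaded ten hours ago with 70% fewer rows than expected: both checks fail.
degraded = check_table(datetime(2021, 6, 1, 2, 0), 300, 1000, now)

print(healthy, degraded)
```

In practice the returned alert names would be routed to whatever alerting channel the team already uses, so that data consumers hear about a problem from the data team rather than discovering it in a report.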