Posts

Showing posts from November, 2020

Gamification in Technology Adoption

Adoption is the use of a new technology. Engagement is the amount of involvement with a technology. This small semantic difference is the key to unlocking the full potential of new applications. Some applications will have inherently higher engagement than others. For example, a frontline healthcare worker will have a high level of engagement with an electronic medical record because it contains essential information for treating patients. There are other technologies we introduce to make processes easier and faster, even if they are not required. For example, a data analyst may or may not choose to use a metadata management application to learn about the data they use every day. While using the application will make their work easier and faster, they can choose to do their work without it. Engagement is about utilization: increasing the likelihood that people will use the application.

Data Observability Ushers In A New Era Enabling Golden Age Of Data

Have we entered the Golden Age of Data? Modern enterprises are collecting, producing, and processing more data than ever before. According to a February 2020 IDG survey of data professionals, average corporate data volumes are increasing by 63% per month. Ten percent of respondents even reported that their data volumes double every month. Large companies are investing heavily to transform themselves into data-driven organizations that can quickly adapt to the fast pace of a modern economy. They gather huge amounts of data from customers and generate reams of data from transactions. They continuously process data in an attempt to personalize customer experiences, optimize business processes, and drive strategic decisions.

The Real Challenge with Data

In theory, breakthrough open-source technologies, such as Spark, Kafka, and Druid, are supposed to help just about any organization benefit from massive amounts of customer and operational data just like they benefit Facebook, Apple, Google, Microso...

Gartner - 2020 Magic Quadrant for Metadata Management Solutions

Metadata management is a core aspect of an organization’s ability to manage its data and information assets. The term “metadata” describes the various facets of an information asset that can improve its usability throughout its life cycle. Metadata and its uses go far beyond technical matters. Metadata is used as a reference for business-oriented and technical projects, and lays the foundations for describing, inventorying and understanding data for multiple use cases. Use-case examples include data governance, security and risk, data analysis and data value. The market for metadata management solutions is complex because these solutions are not all identical in scope or capability. Vendors include companies with one or more of the following functional capabilities in their stand-alone metadata management products (not all vendors offer all these capabilities, and not all vendor solutions offer these capabilities in one product): Metadata repositories — Used to document and manage meta...

Nemo: Data discovery at Facebook

Large-scale companies serve millions or even billions of people who depend on the services these companies provide for their everyday needs. To keep these services running and delivering meaningful experiences, the teams behind them need to find the most relevant and accurate information quickly so that they can make informed decisions and take action. Finding the right information can be hard for several reasons. The problem might be discovery — the relevant table might have an obscure or nondescript name, or different teams might have constructed overlapping data sets. Or, the problem could be one of confidence — the dashboard someone is looking at might have been superseded by another source six months ago.  Many companies, such as Airbnb, Lyft, Netflix, and Uber, have built their own custom solutions for this challenge. For us, it was important to make the data discovery process simple and fast. Funneling everything through data experts to locate the necessary data each time we...

10 Reasons to Choose Apache Pulsar Over Apache Kafka

Apache Pulsar's unique features, such as tiered storage, stateless brokers, geo-aware replication, and multi-tenancy, may be reasons to choose it over Apache Kafka. Today, many data architects, engineers, DevOps teams, and business leaders are struggling to understand the pros and cons of Apache Pulsar and Apache Kafka. As someone who has worked with Kafka in the past, I wanted to compare these two technologies. If you are looking for insights on when to use Pulsar, here are 10 advantages of the technology that might be the deciding factors for you.

The State of Open-Source Data Integration and ETL

Open-source data integration started 16 years ago with Talend. Since then, the whole industry has changed. Let's compare the different actors.

Open-source data integration is not new. It started 16 years ago with Talend. But since then, the whole industry has changed. The likes of Snowflake, BigQuery, and Redshift have changed how data is hosted, managed, and accessed while making it easier and a lot cheaper. But the data integration industry has evolved as well. On one hand, new open-source projects emerged, such as Singer.io in 2017. This made more data integration connectors accessible to more teams, even though it still required a significant amount of manual work. On the other hand, data integration was made accessible to more teams (analysts, scientists, business intelligence teams). Indeed, companies like Fivetran benefited from Snowflake’s rise, empowering non-engineering teams to set up and manage their data integration connectors by themselves, so th...