Posts

Showing posts with the label EDW

Cloud Data Warehouse Comparison: Redshift vs. BigQuery vs. Azure vs. Snowflake for Real-Time Workloads

Data helps companies take the guesswork out of decision-making. Teams can use data-driven evidence to decide which products to build, which features to add, and which growth initiatives to pursue. Such insights-driven businesses grow at an annual rate of over 30%. But there is a difference between being merely data-aware and insights-driven. Discovering insights requires a way to analyze data in near real time, which is where cloud data warehouses play a vital role. As scalable repositories, warehouses let businesses find insights by storing and analyzing huge amounts of structured and semi-structured data. Running a data warehouse is more than a technical initiative; it is vital to the overall business strategy and can inform an array of future product, marketing, and engineering decisions. But choosing a cloud data warehouse provider can be challenging. Users have to evaluate costs, performance, the ability to handle real-time workloads, and other...

Top 9 Data Modeling Tools & Software 2021

Data modeling is the process of creating a visual representation of an entire information system, or portions of it, in order to convey the connections between data points and structures. The objective is to portray the types of data used and stored within the system, the ways the data can be organized and grouped, the relationships among these data types, and their attributes and formats. Data modeling uses abstraction to better understand and represent the flow of data within an enterprise-level information system. The types of data models are conceptual, logical, and physical data models, and database and information system design begins with their creation. What is a Data Modeling Tool? A data modeling tool enables quick and efficient database design while minimizing human error. Data modeling software helps craft a high-performance database, generate reports that can be useful for stakeholders and create data de...
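
To make the conceptual-to-physical progression concrete, here is a minimal sketch, not taken from the article, that expresses a small logical model (two entities, their attributes, and a one-to-many relationship) as a physical schema using SQLAlchemy's declarative mapping; the Customer and Order entities and their columns are hypothetical.

```python
# Minimal sketch: a logical model (entities, attributes, one-to-many
# relationship) expressed as a physical schema with SQLAlchemy.
# The Customer/Order entities are hypothetical illustrations.
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Customer(Base):
    """Entity: Customer, with typed attributes."""
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    orders = relationship("Order", back_populates="customer")

class Order(Base):
    """Entity: Order, linked to Customer by a foreign key (one-to-many)."""
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.id"), nullable=False)
    status = Column(String(20), default="new")
    customer = relationship("Customer", back_populates="orders")

# Generate the physical schema, here against an in-memory SQLite database.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
```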

Snowflake Data Sharing and Data Marketplace

Snowflake data sharing and the data marketplace support modern data sharing techniques and eliminate the need for data movement. In Snowflake, there is no need to extract data from the provider's database and use a secure transfer mechanism to share it with consumers. Data sharing is embedded in Snowflake's SQL language, so databases can be shared from within SQL commands. On top of that, the data provider can update the data in real time, ensuring that all consumers have a consistent, up-to-date view of the data sets. How Data Sharing Works: Snowflake can share regular tables, external tables, secure views, and secure materialized views. Snowflake enables the sharing of databases through the concept of shares. Continue reading >>>
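
As a rough illustration of the share concept, here is a minimal sketch of the provider side using the snowflake-connector-python driver; the account, database, table, and share names are hypothetical placeholders, and the consumer-side command is shown as a comment.

```python
# Minimal sketch of provider-side Snowflake data sharing via
# snowflake-connector-python. All object and account names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="provider_account",  # hypothetical account identifier
    user="PROVIDER_USER",
    password="***",
)
cur = conn.cursor()

# Create a share and grant the consumer read access to a database and table.
cur.execute("CREATE SHARE IF NOT EXISTS sales_share")
cur.execute("GRANT USAGE ON DATABASE sales_db TO SHARE sales_share")
cur.execute("GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share")
cur.execute("GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share")

# Add the consumer account; no data is copied or moved, only access is granted.
cur.execute("ALTER SHARE sales_share ADD ACCOUNTS = consumer_account")

# On the consumer side, the share is mounted as a read-only database:
#   CREATE DATABASE sales_from_provider FROM SHARE provider_account.sales_share;
```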

Data Processing Pipeline Patterns

Data produced by applications, devices, or humans must be processed before it is consumed. By definition, a data pipeline represents the flow of data between two or more systems. It is a set of instructions that determines how and when to move data between these systems. My last blog conveyed how connectivity is foundational to a data platform. In this blog, I will describe the different data processing pipelines that leverage different capabilities of the data platform, such as connectivity and data engines for processing. There are many data processing pipelines. One may: "integrate" data from multiple sources; perform data quality checks or standardize data; apply data security-related transformations, including masking, anonymizing, or encryption; match, merge, master, and perform entity resolution; or share data with partners and customers in a required format, such as HL7. Consumers or "targets" of data pipelines may include: Data warehouses like ...
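
To make the pattern tangible, here is a minimal, framework-free sketch, not from the article, in which each pipeline stage is a function applied to records flowing from a source to a target; the stage names and the email-masking rule are illustrative.

```python
# Minimal pipeline sketch: records pass through a sequence of stages
# (standardization, PII masking). Stage names and fields are illustrative.
import hashlib
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

def standardize(record: Record) -> Record:
    """Data quality stage: trim and lower-case the email field."""
    record["email"] = record.get("email", "").strip().lower()
    return record

def mask_pii(record: Record) -> Record:
    """Security stage: replace the email with a one-way hash."""
    record["email"] = hashlib.sha256(record["email"].encode()).hexdigest()
    return record

def run_pipeline(records: Iterable[Record],
                 stages: List[Callable[[Record], Record]]) -> List[Record]:
    processed = []
    for record in records:
        for stage in stages:
            record = stage(record)
        processed.append(record)
    return processed

source = [{"id": "1", "email": "  Alice@Example.COM "}]
print(run_pipeline(source, [standardize, mask_pii]))
```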

The Future of Data Engineering

Data engineering's job is to help an organization move and process data. This generally requires two different systems, broadly speaking: a data pipeline and a data warehouse. The data pipeline is responsible for moving the data, and the data warehouse is responsible for processing it. I acknowledge that this is a bit overly simplistic. You can do processing in the pipeline itself by applying transformations between extraction and loading with batch and stream processing. The "data warehouse" now includes many storage and processing systems (Flink, Spark, Presto, Hive, BigQuery, Redshift, etc.), as well as auxiliary systems such as data catalogs, job schedulers, and so on. Still, I believe the paradigm holds. The industry is working through changes in how these systems are built and managed. There are four areas, in particular, where I expect to see shifts over the next few years. Timeliness: from batch to real-time. Connectivity: from one:one bespoke integrations to many:many. Cen...
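
As a loose illustration of the timeliness shift, here is a small sketch, not from the article, contrasting a batch job that transforms an accumulated set of records in one pass with a streaming loop that applies the same transformation as each record arrives; the event source stands in for a log or queue and is hypothetical.

```python
# Illustrative contrast between batch and streaming processing of the same
# records; the transformation and event source are hypothetical.
import time
from typing import Dict, Iterable, Iterator

def transform(record: Dict) -> Dict:
    """The same logic can run between extraction and loading in either mode."""
    return {**record, "amount_cents": int(record["amount"] * 100)}

def batch_job(records: Iterable[Dict]) -> list:
    """Batch: process everything accumulated since the last scheduled run."""
    return [transform(r) for r in records]

def streaming_job(records: Iterator[Dict]) -> None:
    """Streaming: process each record within moments of its arrival."""
    for r in records:
        print("loaded", transform(r))

def event_source() -> Iterator[Dict]:
    """Stand-in for a log or queue such as Kafka."""
    for i in range(3):
        time.sleep(0.1)
        yield {"id": i, "amount": 1.5 * i}

print(batch_job([{"id": 0, "amount": 2.0}]))
streaming_job(event_source())
```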

Tuning Snowflake Performance Using the Query Cache

In terms of performance tuning in Snowflake, there are very few options available. However, it is worth understanding how the Snowflake architecture includes various levels of caching to help speed up your queries. This article provides an overview of the techniques used and some best-practice tips on how to maximise system performance using caching. Snowflake Database Architecture: Before starting, it is worth considering the underlying Snowflake architecture and explaining when Snowflake caches data. The diagram below illustrates the overall architecture, which consists of three layers. Service Layer: accepts SQL requests from users, coordinates queries, and manages transactions and results; logically, this can be assumed to hold the result cache, a cached copy of the results of every query executed. Compute Layer: actually does the heavy lifting. ...
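
To see the result cache at work, here is a minimal sketch using snowflake-connector-python that times the same query with the cache disabled and then enabled; the connection parameters and the sales_orders table are hypothetical placeholders.

```python
# Minimal sketch: observing Snowflake's result cache by timing the same query
# with USE_CACHED_RESULT off and on. Connection details and the table are
# hypothetical placeholders.
import time
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="MY_USER", password="***",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
cur = conn.cursor()
query = "SELECT COUNT(*) FROM sales_orders"  # hypothetical table

def timed(sql: str) -> float:
    start = time.perf_counter()
    cur.execute(sql).fetchall()
    return time.perf_counter() - start

# Bypass the result cache so the compute layer (virtual warehouse) does the work.
cur.execute("ALTER SESSION SET USE_CACHED_RESULT = FALSE")
cold = timed(query)

# Re-enable it; a repeat of the same query can be answered from the service
# layer's result cache without using the virtual warehouse.
cur.execute("ALTER SESSION SET USE_CACHED_RESULT = TRUE")
warm = timed(query)

print(f"without result cache: {cold:.2f}s, with result cache: {warm:.2f}s")
```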

Enterprise data integration with an operational data hub

Big data (also called NoSQL) technologies facilitate the ingestion, processing, and search of data with no regard to schema (database structure). Web companies such as Google, LinkedIn, and Facebook use big data technologies to process the tremendous amount of data arriving from every possible source, regardless of structure, and to offer a searchable interface to access it. Modern NoSQL technologies have evolved to offer capabilities to govern, process, secure, and deliver data, and have facilitated the development of an integration pattern called the operational data hub (ODH). The Centers for Medicare and Medicaid Services (CMS) and other organizations, public and private, in the health, finance, banking, entertainment, insurance, and defense sectors (amongst others) use ODH technologies for enterprise data integration. This gives them the ability to access, integrate, master, process, and deliver data across the enterprise. Traditional mode...
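
As a loose sketch of the schema-agnostic ingestion at the heart of the pattern (and not the specific product behind the ODH deployments the article describes), here are a few lines of pymongo loading two differently shaped feeds into a single searchable document store; the MongoDB instance, collection, and sample records are hypothetical.

```python
# Illustrative sketch of schema-agnostic ingestion into a document store.
# The MongoDB instance, collection, and feeds are hypothetical placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
hub = client["operational_data_hub"]["records"]

# Two sources with different structures; no shared schema is required up front.
claims_feed = [{"claim_id": "C-1", "member": {"id": "M-7"}, "amount": 1250.00}]
provider_feed = [{"npi": "1234567890", "name": "Dr. Example", "specialties": ["cardiology"]}]

hub.insert_many(claims_feed)
hub.insert_many(provider_feed)

# The hub remains indexable and queryable across both sources.
hub.create_index("claim_id")
print(hub.count_documents({}))
```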