Posts

Showing posts with the label AWS

2022 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms

  Today’s analytics and BI platforms are augmented throughout and enable users to compose low/no-code workflows and applications. Cloud ecosystems and alignment with digital workplace tools are key selection factors. This research helps data and analytics leaders plan for and select these platforms. Analytics and business intelligence (ABI) platforms enable less technical users, including businesspeople, to model, analyze, explore, share and manage data, and collaborate and share findings, enabled by IT and augmented by artificial intelligence (AI). ABI platforms may optionally include the ability to create, modify or enrich a semantic model including business rules. Today’s ABI platforms have an emphasis on visual self-service for end users, augmented by AI to deliver automated insights. Increasingly, the focus of augmentation is shifting from the analyst persona to the consumer or decision maker. To achieve this, automated insights must not only be statistically relevant, but the...

Data Reliability at Scale: How Fox Digital Architected its Modern Data Stack

As distributed architectures continue to become a new gold standard for data-driven organizations, this kind of self-serve motion would be a dream come true for many data leaders. So when the Monte Carlo team got the chance to sit down with Alex, we took a deep dive into how he made it happen. Here’s how his team architected a hybrid data architecture that prioritizes democratization and access, while ensuring reliability and trust at every turn. Exercise “Controlled Freedom” when dealing with stakeholders: Alex has built decentralized access to data at Fox on a foundation he calls “controlled freedom.” In fact, he believes using your data team as the single source of truth within an organization actually creates the biggest silo. So instead of becoming a guardian and bottleneck, Alex and his data team focus on setting certain parameters around how data is ingested and supplied to stakeholders. Within the framework, internal data consumers at Fox have the freedom to cr...

Dec 2021 Gartner Magic Quadrant for Cloud Database Management Systems

Database management systems continue their move to the cloud, a move that is producing an increasingly complex landscape of vendors and offerings. This Magic Quadrant will help data and analytics leaders make the right choices in a complex and fast-evolving market. Strategic Planning Assumptions: By 2025, cloud preference for data management will substantially reduce the vendor landscape while the growth in multicloud will increase the complexity for data governance and integration. By 2022, cloud database management system (DBMS) revenue will account for 50% of the total DBMS market revenue. These DBMSs reflect optimization strategies designed to support transactions and/or analytical processing for one or more of the following use cases: traditional and augmented transaction processing; traditional and logical data warehouse; data science exploration/deep learning; stream/event processing; ...

AWS vs Azure vs GCP: Cloud Web Services Comparison in Detail

The following post focuses on AWS, MS Azure, and GCP in detail. Learn more about each cloud service and how to choose the best one for your business needs. Digitalization, and cloud computing technology in particular, is being embraced across the globe. Whether because of their scalability, security, or reduced costs, cloud platforms have multiplied over the past few years. Gone are the days when businesses were confused about whether to choose a cloud service provider or not. Now the confusion surrounds the question of which cloud service provider to use. AWS, Azure, and Google Cloud are our top three contenders. Recently, I happened to stumble upon an informative post focusing on AWS Lambda vs Azure Functions. I must say this one was quite detailed and well-structured. It covers all the aspects that matter most when comparing Lambda vs Azure. And I am pretty sure considering both the posts together will act a...

Cloud Data Warehouse Comparison: Redshift vs. BigQuery vs. Azure vs. Snowflake for Real-Time Workloads

Data helps companies take the guesswork out of decision-making. Teams can use data-driven evidence to decide which products to build, which features to add, and which growth initiatives to pursue. Such insights-driven businesses grow at an annual rate of over 30%. But there’s a difference between being merely data-aware and insights-driven. Discovering insights requires finding a way to analyze data in near real-time, which is where cloud data warehouses play a vital role. As scalable repositories of data, warehouses allow businesses to find insights by storing and analyzing huge amounts of structured and semi-structured data. Running a data warehouse is more than a technical initiative: it’s vital to the overall business strategy and can inform an array of future product, marketing, and engineering decisions. But choosing a cloud data warehouse provider can be challenging. Users have to evaluate costs, performance, the ability to handle real-time workloads, and other...

The State of serverless computing 2021

Serverless computing is redefining the way organizations develop, deploy, and integrate cloud-native applications. According to an industry report, the market size of serverless computing is expected to reach 7.72 billion by 2021. A new and compelling paradigm for the deployment of cloud applications, serverless computing is at the precipice of the enterprise shift towards containers and microservices. In 2021, the serverless paradigm shift presents exciting opportunities to organizations by providing a simplified programming model for creating cloud applications that abstracts away most operational concerns. The major cloud vendors, Microsoft, Google, and Amazon, are already in the game with their respective offerings, and there is no reason you shouldn’t board the train. 2021 is the year of FaaS: All major providers of serverless computing offer several types and tiers of database and storage services to their customers. In addition, all major cloud players such as Amazon, Microsoft and Google ...
Image
Over the past few years, companies have been shifting their data and applications to the cloud en masse, which has given rise to a community of data users. They are encouraged to capture, gather, analyze, and save data for business insights and decision-making. More organizations are moving toward multi-cloud, and guarding against data loss while keeping data secure has become challenging. Therefore, managing security policies, rules, metadata details, and content traits is becoming critical for multi-cloud. In this regard, enterprises are in search of expertise and cloud tool vendors that are capable of providing the fundamental cloud security data governance competencies with excellence. Start by building policies and writing them into code, or scripts that can be executed. This requires compliance and cloud security experts working together to build a framework for your complex business. You cannot start from scratch as it will be error-prone and will take too long. Try to in...
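
As a rough illustration of what writing policies into code can look like, here is a minimal Python sketch. It is not taken from the post: the `Policy` class, the `evaluate` function, and the resource fields (`encrypted`, `public`, `tags`) are all hypothetical names chosen for the example, and a real multi-cloud setup would more likely rely on a dedicated policy tool.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Policy:
    """A named rule; `check` returns True when a resource complies."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical rules over a hypothetical resource description (a plain dict).
POLICIES = [
    Policy("storage-encrypted", lambda r: r.get("encrypted") is True),
    Policy("no-public-access", lambda r: r.get("public") is not True),
    Policy("has-owner-tag", lambda r: bool(r.get("tags", {}).get("owner"))),
]


def evaluate(resource: dict, policies: List[Policy] = POLICIES) -> List[str]:
    """Return the names of the policies that the resource violates."""
    return [p.name for p in policies if not p.check(resource)]


if __name__ == "__main__":
    bucket = {"name": "reports", "encrypted": False, "public": True, "tags": {}}
    for violation in evaluate(bucket):
        print("violation:", violation)
```

Keeping each rule as data (a name plus a check function) means compliance and security staff can review the rule list without reading the evaluation logic.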

14 ways AWS beats Microsoft Azure and Google Cloud

Microsoft Azure and Google Cloud have their advantages, but they don’t match the breadth and depth of the Amazon cloud. The reason is simple: AWS has built out so many products and services that it’s impossible to begin to discuss them in a single article or even a book. Many of them were amazing innovations when they first appeared and the hits keep coming. Every year Amazon adds new tools that make it harder and harder to justify keeping those old boxes pumping out heat and overstressing the air conditioner in the server room down the hall. For all of its dominance, though, Amazon has strong competitors. Companies like Microsoft, Google, IBM, Oracle, SAP, Rackspace, Linode, and DigitalOcean know that they must establish a real presence in the cloud and they are finding clever ways to compete and excel in what is less and less a commodity business. These rivals offer great products with different and sometimes better approaches. In many cases, they’re running neck and neck wi...

The Forrester Wave™: Data Management For Analytics, Q1 2020

While traditional data warehouses often took years to build, deploy, and reap benefits from, today's organizations want simple, agile, integrated, cost-effective, and highly automated solutions to support insights. In addition, traditional architectures are failing to meet new business requirements, especially around high-speed data streaming, real-time analytics, large volumes of messy and complex data sets, and self-service. As a result, firms are revisiting their data architectures, looking for ways to modernize to support new requirements. Data management for analytics (DMA) is a modern architecture that minimizes the complexity of messy data and hides heterogeneity by embodying a trusted model and integrated policies and by adapting to changing business requirements. It leverages metadata, in-memory, and distributed data repositories, running on-premises or in the cloud, to deliver scalable and integrated analytics. Adoption of DMA will grow further as enterprise architects look at overcoming data challeng...

Modern applications at AWS

Innovation has always been part of the Amazon DNA, but about 20 years ago, we went through a radical transformation with the goal of making our iterative process—"invent, launch, reinvent, relaunch, start over, rinse, repeat, again and again"—even faster. The changes we made affected both how we built applications and how we organized our company. Back then, we had only a small fraction of the number of customers that Amazon serves today. Still, we knew that if we wanted to expand the products and services we offered, we had to change the way we approached application architecture. The giant, monolithic "bookstore" application and giant database that we used to power Amazon.com limited our speed and agility. Whenever we wanted to add a new feature or product for our customers, like video streaming, we had to edit and rewrite vast amounts of code on an application that we'd designed specifically for our first product—the bookstore. This was a long, unwieldy p...

2019 Datanami Readers’ and Editors’ Choice Awards

Datanami is pleased to announce the results of its fourth annual Readers’ and Editors’ Choice Awards, which recognize the companies, products, and projects that have made a difference in the big data community this year. These awards, which are nominated and voted on by Datanami readers, give us insight into the state of the community. We’d like to thank our dedicated readers for weighing in on their top picks for the best in big data. It’s been a privilege for us to present these awards, and we extend our congratulations to this year’s winners. Best Big Data Product or Technology: Machine Learning. Readers’ Choice: Elastic; Editors’ Choice: SAS Visual Data Mining & Machine Learning. Best Big Data Product or Technology: Internet of Things. Readers’ Choice: SAS Analytics for IoT; Editors’ Choice: The Striim Platform. Best Big Data Product or Technology: Big Data Security. Readers’ Choice: Cloudera Enterprise; Editors’ Choice: Elastic Stack. Best Big ...

The Forrester Wave™: Streaming Analytics, Q3 2019

Key Takeaways. Software AG, IBM, Microsoft, Google, And TIBCO Software Lead The Pack: Forrester's research uncovered a market in which Software AG, IBM, Microsoft, Google, and TIBCO Software are Leaders; Cloudera, SAS, Amazon Web Services, and Impetus are Strong Performers; and EsperTech and Alibaba are Contenders. Analytics Prowess, Scalability, And Deployment Freedom Are Key Differentiators: Depth and breadth of analytics types on streaming data are critical. But that is all for naught if streaming analytics vendors cannot also scale to handle potentially huge volumes of streaming data. Also, it's critical that streaming analytics can be deployed where it is most needed, such as on-premises, in the cloud, and/or at the edge.

Dremio 4.0 Data Lake Engine

Dremio’s Data Lake Engine delivers lightning-fast query speed and a self-service semantic layer operating directly against your data lake storage. No moving data to proprietary data warehouses or creating cubes, aggregation tables and BI extracts. Just flexibility and control for Data Architects, and self-service for Data Consumers. This release, also known as Dremio 4.0, dramatically accelerates query performance on S3 and ADLS, and provides deeper integration with the security services of AWS and Azure. In addition, this release simplifies the ability to query data across a broader range of data sources, including multiple lakes (with different Hive versions) and through community-developed connectors offered in Dremio Hub.

Using AWK and R to parse 25tb

Intro: Recently I was put in charge of setting up a workflow for dealing with a large amount of raw DNA sequencing (well, technically a SNP chip) data for my lab. The goal was to be able to quickly get data for a given genetic location (called a SNP) for use in modeling etc. Using vanilla R and AWK I was able to clean up and organize the data in a natural way, massively speeding up the querying. It certainly wasn’t easy and it took lots of iterations. This post is meant to help others avoid some of the same mistakes and show what did eventually work. The Data: The data was delivered to us by our university’s genetics processing center as 25 TB of TSVs. Before handing it off to me, my advisor split and gzipped these files into five batches, each composed of roughly 240 four-gigabyte files. Each row contained data for a single SNP for a single person. There were ~2.5 million SNPs and ~60 thousand people. Along with the SNP value there were multiple numeric columns on things like intensity o...
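
To make the goal of fast lookup by SNP concrete, here is a toy Python sketch of grouping tab-separated rows by a key column. It is not the author's AWK and R pipeline, which the excerpt does not show. The column layout (SNP id first), the sample values, and the `partition_by_key` helper are assumptions made up for the example. Grouping in memory would not work for 25 TB; at that scale each group would be streamed to its own file or partition instead.

```python
import csv
import io
from collections import defaultdict


def partition_by_key(lines, key_index):
    """Group tab-separated rows by the value found in column `key_index`."""
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if row:  # skip blank lines
            groups[row[key_index]].append(row)
    return dict(groups)


# Hypothetical sample: SNP id, person id, one numeric column.
sample = io.StringIO(
    "rs1\tperson_a\t0.91\n"
    "rs2\tperson_a\t0.15\n"
    "rs1\tperson_b\t0.88\n"
)

by_snp = partition_by_key(sample, key_index=0)
print(sorted(by_snp))      # the SNP ids that were seen
print(len(by_snp["rs1"]))  # how many rows belong to SNP rs1
```

Once rows are organized by SNP, a query for one genetic location only touches that group rather than scanning every file.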
Image
Here’s a curated list of resources for data engineers, with sections for algorithms and data structures, SQL, databases, programming, tools, distributed systems, and more. Useful articles: The AI Hierarchy of Needs; The Rise of Data Engineer; The Downfall of the Data Engineer; A Beginner’s Guide to Data Engineering (Part I, Part II, Part III); Functional Data Engineering - a modern paradigm for batch data processing; How to become a Data Engineer (in Russian). Talks: Data Engineering Principles - Build frameworks not pipelines by Gatis Seja; Functional Data Engineering - A Set of Best Practices by Maxime Beauchemin; Advanced Data Engineering Patterns with Apache Airflow by Maxime Beauchemin; Creating a Data Engineering Culture by Jesse Anderson. Algorithms & Data Structures: Algorithmic Toolbox (in Russian); Data Structures (in Russian); Data Structures & Algorithms Specialization on Coursera; Algorithms Specialization from Stanford on Coursera. SQL: Com...