Notes about Cutting-Edge Technologies and Everything

Posts

Showing posts from September, 2018

Azure HDInsight brings next generation Apache Hadoop 3.0

- September 26, 2018

Preview of Apache Hadoop 3.0 in Azure HDInsight 4.0

Led by Hortonworks, Apache Hadoop 3.0 represents over 5 years of work across the community since the last major update to the Hadoop stack. Enterprises can now realize their data lake vision while efficiently incorporating deep learning frameworks into their applications, all on the same Hadoop stack that they are comfortable with. Some of the key enhancements include:

• With ACID semantics enabled by default, Apache Hive 3.0 becomes more like a traditional database, making it easier for customers to build line-of-business (LOB) applications on top of very large data sets.
• Apache Druid is an open source data store with indexing and caching capabilities on top of a column-oriented storage layout. With Apache Hive and Apache Druid (now available by default), customers can do near-real-time exploratory analytics on incoming data.
• With TensorFlow, available by default, and GPU support, Apache Hadoop 3.0 squarely targets the machine learning...


Collection of data governance resources

- September 13, 2018

Learning about data governance

Use these introductory books, videos, and articles to understand the basics of data governance.

• Data Governance: What You Need to Know. Jon Bruner explains how a data governance program provides the intellectual and institutional grounding to address the data needs across an organization, anticipate new issues, and provide for development according to the company’s strategic plan.
• Data Governance. John Adler leads you through the maze of data governance issues facing companies today: security breaches, regulatory agencies, in-house turf battles over who controls the data, monetizing data, and more.
• The Rise of Big Data Governance: Insight on this Emerging Trend from Active Open Source Initiatives. John Mertic and Maryna Strelchuk detail the benefits of a vendor-neutral approach to data governance.
• Understanding the Chief Data Officer. Through interviews with current and former chief data officers (CDO), Julie Steele lo...


Dremio 2.1 is shipped with many new features!

- September 07, 2018

This is a major release that includes many new features, performance improvements, and hundreds of stability enhancements - see the highlights and more details below.

• Elasticsearch 6. Dremio now supports the latest versions of Elasticsearch. Enjoy full SQL support, including JOINs, Window functions, and accelerated analytics through any BI tool, including Tableau and Power BI. We also added support for compressing Elasticsearch responses to minimize network traffic.
• Approximate count distinct acceleration. Dremio now supports accelerating count distinct queries based on an approximation-based algorithm (HyperLogLog). This provides a faster and more memory efficient way of providing distinct counts and is especially useful in high cardinality scenarios with very large datasets.
• Faster ORC performance. Data encoded in ORC is now significantly faster to access and more memory efficient for ORC managed in Hive sources.
• Support for AWS GovClou...
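
To make the approximate count distinct item above more concrete, here is a minimal Python sketch of the general HyperLogLog idea. It is not Dremio's implementation; the class name `HyperLogLog`, the precision parameter `p`, and the choice of SHA-1 as the hash are all assumptions made for illustration. The sketch keeps a fixed array of small registers instead of the full set of values, which is why memory use stays constant even at high cardinality.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical helper, not Dremio code)."""

    def __init__(self, p=12):
        # p is the number of hash bits used to pick a register.
        # Assumption: p >= 7, so the usual alpha formula for m >= 128 applies.
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Derive a 64-bit hash from the item's string form.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)             # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining bits
        # (all zeros gives the maximum rank, 64 - p + 1).
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Raw estimate from the harmonic mean of 2^register values.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(100000):
        hll.add(f"user-{i}")
        hll.add(f"user-{i}")                 # duplicates do not change the sketch
    print(f"estimated distinct values: {hll.count():.0f}")
```

With `p=12` the sketch holds 4096 registers regardless of input size, and the typical relative error is around 1.04 / sqrt(4096), or roughly 1.6%. A larger `p` trades more memory for a tighter estimate.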
