Notes about Cutting-Edge Technologies and Everything

Showing posts with the label Hortonworks

The Forrester Wave™: Cloud Hadoop/Spark Platforms, Q1 2019

- February 16, 2019

Cloud Hadoop/Spark (HARK) platforms accelerate insights by automating the storage, processing, and accessing of big data. In our 25-criterion evaluation of HARK providers, we identified the 11 most significant ones: Amazon Web Services (AWS), Cloudera, Google, Hortonworks, Huawei, MapR, Microsoft, Oracle, Qubole, Rackspace, and SAP. We researched, analyzed, and scored them. This report shows how each provider measures up and helps enterprise architecture (EA) professionals select the right one for their needs.

Note: Cloudera and Hortonworks completed their planned merger on January 3, 2019, and will continue as Cloudera. This Forrester Wave reflects our evaluation of each company's independent HARK platforms prior to the completion of the merger.

Real-Time Stock Processing With Apache NiFi and Apache Kafka

- January 02, 2019

Implementing Streaming Use Case From REST to Hive With Apache NiFi and Apache Kafka, Part 1

With Apache Kafka 2.0 and Apache NiFi 1.8, there are many new features and abilities coming out. It's time to put them to the test. To plan out what we are going to do, I have a high-level architecture diagram.

We are going to ingest a number of sources including REST feeds, social feeds, messages, images, documents, and relational data. We will ingest with NiFi and then filter, process, and segment the data into Kafka topics. Kafka data will be in Apache Avro format with schemas specified in the Hortonworks Schema Registry. Spark and NiFi will do additional event processing along with machine learning and deep learning. The results will be stored in Druid for real-time analytics and summaries. Hive, HDFS, and S3 will store the data permanently. We will build dashboards with Superset and Spark SQL + Zeppelin. We will also push back cleaned and aggregated data to subscribers via Kafka ...
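
To make the "Avro format with schemas" step more concrete, here is a minimal sketch of what a schema for one stock record might look like and how a record could be checked against it before it is sent to a topic. The schema name, its fields, the `conforms` helper, and the `topic_for` routing rule are all hypothetical illustrations, not taken from the post. The check uses only the Python standard library and is far simpler than real Avro validation or a schema registry lookup.

```python
import json

# Hypothetical Avro record schema for one stock quote. The field names are
# illustrative assumptions and do not come from the original post.
STOCK_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "StockQuote",
  "namespace": "example.stocks",
  "fields": [
    {"name": "symbol", "type": "string"},
    {"name": "price",  "type": "double"},
    {"name": "volume", "type": "long"},
    {"name": "ts",     "type": "long"}
  ]
}
""")

# Python types accepted for the Avro primitive types used in the schema above.
# Accepting int for "double" is a simplification made for this sketch.
_PRIMITIVES = {"string": (str,), "double": (float, int), "long": (int,)}


def conforms(record, schema):
    """Return True if record has exactly the schema's fields with matching primitive types."""
    fields = schema["fields"]
    if set(record) != {f["name"] for f in fields}:
        return False
    for f in fields:
        value = record[f["name"]]
        if isinstance(value, bool):  # bool is a subclass of int, so reject it explicitly
            return False
        if not isinstance(value, _PRIMITIVES[f["type"]]):
            return False
    return True


def topic_for(record):
    """Hypothetical routing rule: one Kafka topic per ticker symbol."""
    return "stocks." + record["symbol"].lower()


quote = {"symbol": "ABC", "price": 14.5, "volume": 100, "ts": 1546387200000}
if conforms(quote, STOCK_SCHEMA):
    print(topic_for(quote), json.dumps(quote))
```

In a real flow the schema would live in the registry and NiFi's record readers and writers would do this work; the sketch only shows the shape of the contract between producer and topic.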

Deep Speech With Apache NiFi 1.8

- December 15, 2018

- Tools: Python 3.6, PyAudio, TensorFlow, Deep Speech, Shell, Apache NiFi
- Why: Speech-to-Text
- Use Case: Voice control and recognition.
- Series: Holiday Use Case: Turn on Holiday Lights and Music on command.
- Cool Factor: Ever want to run a query on Live Ingested Voice Commands?
- Other Options: Voice Controlled with AIY Voice and NiFi

We are using Python 3.6 to write some code around PyAudio, TensorFlow, and Deep Speech to capture audio, store it in a wave file, and then process it with Deep Speech to extract some text. This example is running on OSX without a GPU on TensorFlow v1.11. The Mozilla GitHub repo for their Deep Speech implementation has nice getting-started information that I used to integrate our flow with Apache NiFi.
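
The "store it in a wave file" step can be sketched with Python's standard `wave` module. This is not the post's code: the microphone capture that PyAudio would perform is replaced by a synthetic sine tone so the example runs without audio hardware. The 16 kHz, mono, 16-bit format is an assumption about what a speech-to-text model would expect, and the helper names (`synth_tone`, `save_wav`, `wav_info`) are hypothetical.

```python
import math
import struct
import wave

# Assumed recording format: 16 kHz, mono, 16-bit PCM.
SAMPLE_RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample
CHANNELS = 1


def synth_tone(seconds=1.0, freq=440.0):
    """Stand-in for microphone capture: return 16-bit little-endian PCM bytes of a sine tone."""
    n = int(seconds * SAMPLE_RATE)
    samples = (
        int(12000 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)) for i in range(n)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def save_wav(path, pcm_bytes):
    """Write raw PCM bytes to a WAV file using the assumed format."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm_bytes)


def wav_info(path):
    """Return (channels, sample width in bytes, frame rate, frame count) of a WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes()
```

Calling `save_wav("command.wav", synth_tone())` writes a one-second file. In the real flow the bytes would come from the audio stream instead, and the saved file would then be handed to Deep Speech for transcription.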

What’s new in Hortonworks DataFlow 3.3?

- December 11, 2018

With the upcoming HDP 3.1 release, we also bring about some exciting innovations to enhance our Kafka offering:

- New Hive Kafka Storage Handler (for SQL Analytics): View Kafka topics as tables and execute SQL via Hive with full SQL support for joins, windowing, aggregations, etc.
- New Druid Kafka Indexing Service (for OLAP Analytics): View Kafka topics as cubes and perform OLAP-style analytics on streaming events in Kafka using Druid.

HDF 3.3 includes the following major innovations and enhancements:

Core HDF Enhancements

- Support for Kafka 2.0, the latest Kafka release in the Apache community, with many enhancements to security, reliability, and performance.
- Support for Kafka 2.0 NiFi processors.
- NiFi connection load balancing: This feature allows bottleneck connections in the NiFi workflow to spread queued-up flow files across the NiFi cluster, increasing processing speed and lessening the effect of the bottleneck.
- MQTT performance improvements inc...
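
The idea behind connection load balancing, as described in the list above, can be illustrated with a toy model. The sketch below assumes a simple round-robin strategy and uses hypothetical node and flow file names. It says nothing about how NiFi implements the feature internally. It only shows how a backlog queued on one connection ends up spread across the cluster.

```python
from collections import deque
from itertools import cycle


def round_robin_balance(queue, nodes):
    """Drain the queue, handing each flow file to the next node in round-robin order."""
    assignment = {node: [] for node in nodes}
    node_cycle = cycle(nodes)
    while queue:
        assignment[next(node_cycle)].append(queue.popleft())
    return assignment


# Hypothetical backlog of seven flow files queued on a single bottleneck connection.
backlog = deque(f"flowfile-{i}" for i in range(7))
result = round_robin_balance(backlog, ["nifi-1", "nifi-2", "nifi-3"])
for node, files in result.items():
    print(node, files)
```

After the call, no node holds more than one flow file more than any other, which is the effect the release note describes: downstream processing can proceed on every node instead of waiting on one.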

Forrester Wave Cloud Data Warehouse, Q4 2018

- November 04, 2018

Evaluated Vendors And Inclusion Criteria

Forrester included 14 vendors in the assessment: Alibaba, AWS, Exasol, Google, Hortonworks, Huawei, IBM, MarkLogic, Micro Focus, Microsoft, Oracle, Pivotal, Snowflake, and Teradata. Each of these vendors has (see Figure 1):

- A comprehensive CDW offering. Key components of the CDW include the provisioning, storing, processing, transforming, and accessing of data. The CDW should provide features to secure data, enable elastic scale, provide high availability and disaster recovery options, support loading and unloading of data, and provide various data access tools.
- A standalone data warehouse service running in the public cloud. Vendors included in this evaluation provide a CDW service that organizations can implement or use independent of analytics, data science, and visualization tools. The service should not be technologically tied to or bundled with any particular application or solution.
- Data warehouse use cases. The CDW service shoul...

Azure HDInsight brings next generation Apache Hadoop 3.0

- September 26, 2018

Preview of Apache Hadoop 3.0 in Azure HDInsight 4.0

Led by Hortonworks, Apache Hadoop 3.0 represents over 5 years of work across the community since the last major update to the Hadoop stack. Enterprises can now realize their data lake vision while efficiently incorporating deep learning frameworks into their applications, all on the same Hadoop stack that they are comfortable with. Some of the key enhancements include:

- With ACID semantics enabled by default, Apache Hive 3.0 becomes more like a traditional database, making it easier for customers to build LOB applications on top of very large data sets.
- Apache Druid is an open source data store with indexing/caching capabilities on top of a column-oriented storage layout. With Apache Hive and Apache Druid (now available by default), customers can do near-real-time exploratory analytics on incoming data.
- With TensorFlow, available by default, and GPU support, Apache Hadoop 3.0 squarely targets the machine learning...

The Forrester Wave Big Data Fabric, Q2 2018

- June 14, 2018

Key Takeaways

- Talend, Denodo Technologies, Oracle, IBM, And Paxata Lead The Pack. Forrester's research uncovered a market in which Talend, Denodo Technologies, Oracle, IBM, and Paxata are Leaders; Hortonworks, Cambridge Semantics, SAP, Trifacta, Cloudera, and Syncsort are Strong Performers; and Podium Data, TIBCO Software, Informatica, and Hitachi Vantara are Contenders.
- EA Pros Are Looking To Support Multiple Use Cases With Big Data Fabric. The big data fabric market is growing because more EA pros see big data fabric as critical for their enterprise big data strategy.
- Scale, Performance, AI/Machine Learning, And Use-Case Support Are Key Differentiators. The Leaders we identified support a broader set of use cases, provide enhanced AI and machine learning capabilities, and offer good scalability features. ...
