Posts

Showing posts from March, 2018

KSQL the new streaming SQL engine for Apache Kafka

The recently introduced KSQL, the streaming SQL engine for Apache Kafka, substantially lowers the barrier to entry for stream processing. Instead of writing a lot of code, all you need to get started is a simple SQL statement, such as: SELECT * FROM payments-kafka-stream WHERE fraud_probability > 0.8. That’s it! And while this might not be immediately obvious, the KSQL streaming query above is distributed, scalable, elastic, and real time, which is what the data needs of businesses today demand. Of course, you can do much more with KSQL than this simple example shows. KSQL is open source (Apache 2.0 licensed) and built on top of Kafka’s Streams API. This means it supports a wide range of powerful stream processing operations, including filtering, transformations, aggregations, joins, windowing, and sessionization. This way you can detect anomalies and fraudulent activities in real time, monitor infrastructure and IoT
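To make the filter in the query above concrete, here is a minimal sketch in plain Python of what such a continuous filter does to each incoming record. It is not KSQL and does not touch Kafka: the function name `filter_suspicious`, the record layout, and the sample payments are assumptions made for illustration; only the `fraud_probability` field and the 0.8 threshold come from the excerpt.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical record shape: a payment is a dict with at least
# a "fraud_probability" field, mirroring the column in the query.
Payment = Dict[str, object]


def filter_suspicious(payments: Iterable[Payment],
                      threshold: float = 0.8) -> Iterator[Payment]:
    """Yield each payment whose fraud_probability exceeds the threshold.

    Works lazily, one record at a time, so it can consume an unbounded
    iterator in the same spirit as a streaming WHERE clause.
    """
    for payment in payments:
        if payment["fraud_probability"] > threshold:
            yield payment


# Made-up sample data standing in for the stream of payments.
sample_payments = [
    {"id": "p1", "fraud_probability": 0.12},
    {"id": "p2", "fraud_probability": 0.93},
    {"id": "p3", "fraud_probability": 0.80},  # not strictly greater than 0.8
    {"id": "p4", "fraud_probability": 0.85},
]

flagged = list(filter_suspicious(sample_payments))
flagged_ids = [p["id"] for p in flagged]
print(flagged_ids)
```

The sketch covers only the per-record predicate; the distribution, scaling, and fault tolerance described in the excerpt are handled by KSQL and Kafka Streams, not by anything shown here.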

Gartner Hype Cycle for Data Science and Machine Learning, 2017

The hype around data science and machine learning has increased from already high levels in the past year. Data and analytics leaders should use this Hype Cycle to understand technologies generating excitement and inflated expectations, as well as significant movements in adoption and maturity.
The Hype Cycle: The Peak of Inflated Expectations is crowded and the Trough of Disillusionment remains sparse, though several highly hyped technologies are beginning to hear the first disillusioned rumblings from the market. In general, the faster a technology moves from the innovation trigger to the peak, the faster the technology moves into the trough as organizations quickly see it as just another passing fad. This Hype Cycle is especially relevant to data and analytics leaders, chief data officers, and heads of data science teams who are implementing machine-learning programs and looking to understand the next-generation innovations. Technology provider product marketers and strategists

Comparing production-grade NLP libraries

A comparison of the accuracy and performance of Spark-NLP vs. spaCy, and some use case recommendations: https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-accuracy-performance-and-scalability
A step-by-step guide to building and running a natural language processing pipeline: https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-running-spark-nlp-and-spacy-pipelines
A step-by-step guide to initialize the libraries, load the data, and train a tokenizer model using Spark-NLP and spaCy: https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-training-spark-nlp-and-spacy-pipelines

27 Great Resources About Logistic Regression

27 Great Resources About Logistic Regression:
Customer Churn – Logistic Regression with R
Predicting Flights Delay Using Supervised Learning, Logistic Regression
Logistic Regression vs Decision Trees vs SVM: Part II
Logistic Regression Vs Decision Trees Vs SVM: Part I
Making data science accessible – Logistic Regression
Logistic Regression using python
Logistic Regression and Maximum Entropy explained with examples
Decision tree vs Logistic Regression
Excluding variables from a logistic regression model based on correlation
Regression, Logistic Regression and Maximum Entropy +
Oversampling/Undersampling in Logistic Regression
Fraud Detection using logistic regression
Explaining variability in logistic regression
Handling Imbalanced data when building regression models
Multiple logistic Regression Power Analysis
Model Accuracy - In logistic Regression
Outliers in Logistic Regression
Logistic Regression - Hosmer Lemeshow test
Logistic regression intercept term not s

Roundup of ML forecasts and market estimates 2018

The list of linked sources at the end of the article alone is well worth a look. Details: http://bit.ly/2F9kAYe
Sources of Market Data on Machine Learning:
2018 Outlook: Machine Learning and Artificial Intelligence, A Survey of 1,600+ Data Professionals. MEMSQL. (14 pp., PDF, no opt-in)
Advice for applying Machine Learning, Andrew Ng, Stanford University. (30 pp., PDF, no opt-in)
An Executive’s Guide to Machine Learning, McKinsey Quarterly. June 2015
An Investors' Guide to Artificial Intelligence, J.P. Morgan. November 27, 2017 (110 pp., PDF, no opt-in)
Artificial intelligence and machine learning in financial services: Market developments and financial stability implications, Financial Stability Board. (45 pp., PDF, no opt-in)
Big Data and AI Strategies: Machine Learning and Alternative Data Approach to Investing, J.P. Morgan. (280 pp., PDF, no opt-in)
Google & MIT Technology Review study: Machine Learning: The New Proving Ground for Competitive Advantage

Analytics maturity powers company performance

The fact that we believe analytics drive performance isn’t enough. This report by David Alles (International Institute for Analytics) provides a range of supporting evidence, using IIA’s proprietary analytics maturity data – from 74 leading companies like Amazon, Apple, Netflix and Google – and publicly available financial and company data, to illustrate the positive association between analytics maturity and superior company performance. Details: http://bit.ly/2tba3u2