Posts


Decoding ‘Game of Thrones’ by way of data science

With the final season of the television series ‘Game of Thrones’ upon us, it is a good opportunity to take a closer look at the books that the series is based on. We will discover how numerical processing of the books can help reveal patterns that lie hidden in ‘A Song of Ice and Fire’. How does one begin to objectively measure a book? Isn’t it all about the subjective experience in the mind of the reader? Indeed, literary critics have tried many ways to capture and communicate the essence and value of a book. A book, like other forms of art, is often valued to the extent to which it can give us new and nuanced insights into our own human experience. A fantasy novel series such as ‘A Song of Ice and Fire’ sets the story in a more boundless landscape, allowing even more freedom to explore the hopes and fears that lie within us all. However, this article is not a literary critic’s review, but rather a data science exploration. This numerical explor...

Looking Back at Google’s Research Efforts in 2018

2018 was an exciting year for Google's research teams, with our work advancing technology in many ways, including fundamental computer science research results and publications, the application of our research to emerging areas new to Google (such as healthcare and robotics), open source software contributions and strong collaborations with Google product teams, all aimed at providing useful tools and services. Below, we highlight just some of our efforts from 2018, and we look forward to what will come in the new year: Ethical Principles and AI; AI for Social Good; Assistive Technology; Quantum computing; Natural Language Understanding; Perception; Computational Photography; Algorithms and Theory; Software Systems; AutoML; Tensor Processing Units (TPUs); Open Source Software and Datasets; Robotics; Applications of AI to Other Fields.

The Best Free Datasets for Machine Learning

What are some open datasets for machine learning? We at Gengo decided to create the ultimate cheat sheet for high-quality datasets. These range from the vast (looking at you, Kaggle) to the highly specific (data for self-driving cars). First, a couple of pointers to keep in mind when searching for datasets. According to Dataquest: A dataset shouldn’t be messy, because you don’t want to spend a lot of time cleaning data. A dataset shouldn’t have too many rows or columns, so it’s easy to work with. The cleaner the data, the better; cleaning a large dataset can be very time-consuming. There should be an interesting question that can be answered with the data. Let’s get to it! Dataset Finders. Kaggle: A data science site that contains a variety of interesting, externally contributed datasets. You can find all kinds of niche datasets in its master list, from ramen ratings to basketball data and even Seattle pet licenses. UCI Machine Learning Repository: One of ...

Comparing production-grade NLP libraries

A comparison of the accuracy and performance of Spark-NLP vs. spaCy, and some use case recommendations: https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-accuracy-performance-and-scalability
A step-by-step guide to building and running a natural language processing pipeline: https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-running-spark-nlp-and-spacy-pipelines
A step-by-step guide to initializing the libraries, loading the data, and training a tokenizer model using Spark-NLP and spaCy: https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-training-spark-nlp-and-spacy-pipelines