My main area of focus is natural language processing (NLP). In 2008 I completed a Master's degree in Computer Speech, Text and Internet Technology at Cambridge University, and since then I have worked exclusively in machine learning, mostly in NLP. In recent years I have moved into freelance data science consultancy, where NLP remains my specialism.
I have built NLP pipelines from scratch and worked on natural language dialogue systems, document classifiers and text-based recommender systems. For these tasks I have used both traditional machine learning techniques and state-of-the-art methods such as neural networks.
Natural Language Processing technologies that I use
I have worked with a variety of NLP models and techniques, including
- Bag of words, tf-idf, cosine similarity
- NLP pipelines, lemmatisation, parsers, chunkers
- Deep neural networks
- Convolutional neural networks (for text as well as images)
- Recurrent neural networks (RNN, LSTM)
- Seq2seq, word2vec, doc2vec
- See a live demo of a CNN for author identification
- Clustering and topic modelling: Latent Dirichlet Allocation (LDA)
- This is useful for extracting topics from a collection of unstructured documents, such as legal documents, survey responses and factory error reports.
- Search engines and search term recommenders
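As a rough illustration of the first technique listed above, here is a minimal sketch of bag-of-words tf-idf vectors compared with cosine similarity, in plain Python. It assumes a deliberately simple tokenizer (lowercase, split on whitespace, no lemmatisation) and the unsmoothed idf formula log(N / df); the helper names are hypothetical, and libraries such as scikit-learn use smoothed variants by default.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer (assumption): lowercase and split on whitespace only
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse tf-idf vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts, i.e. the bag of words
        vectors.append({
            # Unsmoothed idf (assumption): a term found in every document gets weight 0
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage on three made-up example sentences
docs = [
    "the machine stopped due to a motor fault",
    "motor fault caused the line to stop",
    "customer praised the fast delivery",
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # positive: both mention "motor" and "fault"
print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: the only shared word, "the", has idf 0
```

The same scoring underlies a basic search engine: tokenize the query, weight it with the corpus idf values, and rank documents by cosine similarity to it.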
I work with the following tools
- Python NLTK
Examples of past Natural Language Processing projects
NLP projects I have worked on for major household names include
- a spoken dialogue system to control a smart home
- an unsupervised text analysis program for written descriptions of manufacturing defects
- a model to classify jobseekers’ CVs into industries and salary bands
- analysis of survey responses