Responsibilities
In-depth data analysis: Extract data and manipulate, calculate, format, and combine it into presentable reports, charts, and graphs.
Analyze and interpret data to find outliers and to understand root causes, business impact, and correlations or discrepancies; propose changes or alternate solutions.
Discover patterns and root causes, and generate insights to drive product enhancements.
Bring together disparate data sources to create a complete analysis.
Analyze and evaluate the quality of data used for model training and testing.
Create and present proposals and results in an intuitive, data-backed manner, along with actionable insights and recommendations to drive business decisions.
Collaborate with other data scientists and engineers on data collection and feature design efforts across teams.
Communicate results to diverse audiences through effective writing and data visualizations (BI reports and dashboards).

Desired Skills
Solid experience with Natural Language Processing (NLP).
Text extraction from various sources (MS Word, plain text files, PDF files, HTML pages, etc.), text cleaning, text pre-processing, tokenization, POS tagging, NER, dependency parsing, coreference resolution, feature vector generation (binary, count, tf-idf, etc.), word2vec, doc2vec, GloVe, RAKE, document similarity (cosine, Jaccard, etc.), fuzzy text matching, and lexical and semantic information extraction.
Understanding of various natural-language constructs such as parts of speech, sentence structures, subject-verb-object relationships, and word dependencies (ROOT, compound, etc.).
Strong expertise in Python.
Expert-level skills with packages such as NLTK, spaCy, gensim, Pattern, TextBlob, Vocabulary, and the Stanford CoreNLP Python wrappers.
Text extraction tools such as PDFMiner, Apache Tika with Python, PyPDF2, etc.
pandas, sklearn, numpy, xgboost, matplotlib, keras, etc.
Expertise in command-line usage (e.g., Bash) and SQL.
Robust knowledge of statistical modelling and machine learning techniques.
Techniques: text clustering (k-means,