Data Science

The massive amounts of data that can be generated and collected nowadays, so-called big data, demand a highly advanced skill set to analyze the data and derive new solutions for a vast number of applications.

To become a data scientist means to be “better at statistics than any software engineer and better at software engineering than any statistician,” according to Josh Wills.

What is Big Data & Data Science?

Common Techniques and Technologies

Below is a short (incomplete) list of common skills required for data science tasks:

  • Statistics, Probability, Statistical Inference, Statistical Modeling, and Data Visualization
  • Mathematics, Discrete Math, Calculus, Linear Algebra, Numerical Analysis, and Algorithms
  • Machine Learning
  • Data Mining
    • The study of sophisticated automated processes that discover patterns in data for segmentation and prediction
    • Free tools like WEKA, RapidMiner, KNIME, NLTK, Orange, and Apache Mahout.
    • Popular commercial tools like Microsoft Azure, IBM SPSS Modeler, Rattle, MATLAB, SAS Enterprise Miner…
  • Cloud platforms like AWS, Azure, and Cloudera
  • Hadoop, HBase, Hive, Pig, Spark… 
  • Programming languages like R, Python, Java (incl. Data Structures)…
  • Structured Query Language (SQL, in general); relational vs. non-relational databases
  • Other statistical software like SAS, SPSS, STATA, Tableau, and QlikView
  • And soft skills like Domain Knowledge, Communication, Cooperation, Management, Creativity, Curiosity, and Ethics

Other Resources for Data Science

Community Support