Python Modules for Data Science & Analytics
A collection of important Python modules for data scientists.
This is part of the Python Knowledge and Resources List.
Pandas is a library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license.
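As a rough illustration of the table and time-series operations mentioned above, here is a minimal sketch. The column name and the values are invented for the example; only the pandas calls themselves are real.

```python
import pandas as pd

# Hypothetical daily readings; the column name and values are made up.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
frame = pd.DataFrame(
    {"temperature": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
    index=dates,
)

# Numerical-table operation: a summary statistic of one column.
mean_temperature = frame["temperature"].mean()

# Time-series operation: downsample the daily values to 3-day averages.
three_day_mean = frame["temperature"].resample("3D").mean()

print(mean_temperature)
print(three_day_mean)
```

The date index is what makes resample available; with a plain integer index the same data would still support the column statistics but not the time-based grouping.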
Installing pandas and the rest of the NumPy and SciPy stack can be a little difficult for inexperienced users.
The easiest way to install pandas is to install it as part of the Anaconda distribution.
pandas can be installed via pip from PyPI.
pip install pandas
This will likely install a number of dependencies, including NumPy. If no prebuilt binary is available for your platform, it also needs a compiler to build the required extension code, and the installation can take a few minutes to complete.
Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.
You can obtain source distributions and Windows binaries from PyPI. Alternatively, you can use the easy_install command from setuptools to install statsmodels or upgrade it:
easy_install -U statsmodels
Statsmodels can also be installed from source in the usual way with the command:
python setup.py install
scikit-learn is an open source machine learning library for Python. It features various classification, regression and clustering algorithms, including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
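The sketch below shows two of the algorithm families listed above, logistic regression for classification and k-means for clustering, on a toy data set invented for this example. Both estimators follow the same fit and predict pattern.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy data made up for illustration: two well-separated groups of 2-D points.
points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
          [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
labels = [0, 0, 0, 1, 1, 1]

# Classification: fit on labelled points, then predict labels for new ones.
classifier = LogisticRegression()
classifier.fit(points, labels)
predicted = classifier.predict([[0.1, 0.1], [5.1, 5.0]])

# Clustering: k-means with two clusters, without using the labels at all.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

print(predicted)
print(cluster_ids)
```

The cluster numbers assigned by k-means are arbitrary, so only the grouping is meaningful, not whether a group is called 0 or 1.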
At this time scikit-learn does not provide official binary packages for Linux so you have to build from source.
Installing from source requires you to have installed the scikit-learn runtime dependencies, Python development headers and a working C/C++ compiler. Under Debian-based operating systems, which include Ubuntu, if you have Python 2 you can install all these requirements by issuing:
sudo apt-get install build-essential python-dev python-setuptools \
python-numpy python-scipy \
mlpy is a Python machine learning library built on top of NumPy/SciPy and the GNU Scientific Library. It provides a wide range of machine learning methods for supervised and unsupervised problems. mlpy is multi-platform and works with Python 2 and 3.
Download the latest version for your OS from http://sourceforge.net/projects/mlpy/files/
You need GCC, Python, NumPy, SciPy and GSL preinstalled.
Then, from the terminal, run:
python setup.py install
NumPy is an open source extension module for Python. It provides fast precompiled functions for numerical routines.
It adds support to Python for large, multi-dimensional arrays and matrices. In addition, it supplies a large library of high-level mathematical functions to operate on these arrays.
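A small sketch of what working with these arrays looks like; the matrix values are arbitrary and chosen only so the results are easy to check by hand.

```python
import numpy as np

# A 2-D array (matrix) built from nested lists; the values are arbitrary.
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])

# Element-wise math runs in precompiled code, with no explicit Python loop.
roots = np.sqrt(matrix)

# Reductions: sums down each column and the mean of all elements.
column_sums = matrix.sum(axis=0)
overall_mean = matrix.mean()

# Matrix product of the array with its own transpose.
product = matrix @ matrix.T

print(roots)
print(column_sums)   # [4. 6.]
print(overall_mean)  # 2.5
print(product)       # [[ 5. 11.] [11. 25.]]
```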
Most of the major Linux distributions provide packages for NumPy, but these can lag behind the most recent NumPy release. Pre-built binary packages for Ubuntu are available on the SciPy PPA. Red Hat binaries are available in Enthought Canopy.
SciPy is widely used in scientific and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.
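Two of the module areas named above, integration and optimization, can be sketched as follows. The integrand and the quadratic being minimized are simple functions picked for this example because their exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize

# Integration: the area under sin(x) on [0, pi] is exactly 2.
area, error_estimate = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: a quadratic chosen for illustration, with its minimum
# value of 1 located at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

print(area, error_estimate)
print(result.x, result.fun)
```

quad returns both the estimate and an error bound, which is a quick way to judge how far the numerical answer can be trusted.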
Users on Linux can quickly install the necessary packages from their distribution's repositories.
For example, Ubuntu users can install the dependencies by running:
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
matplotlib is a plotting library for Python and its numerical extension NumPy.
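A minimal sketch of plotting a NumPy array follows. The output file name sine.png is a placeholder chosen for this example, and the Agg backend is selected only so the script also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display required
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)

fig, ax = plt.subplots()
line, = ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()

# "sine.png" is a hypothetical output file name.
fig.savefig("sine.png")
```

In an interactive session you would typically skip the backend line and call plt.show() instead of saving to a file.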
sudo apt-get install python-matplotlib
The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for statistical natural language processing (NLP) in Python. NLTK includes graphical demonstrations and sample data. NLTK has been used successfully as a platform for prototyping and building research systems.
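As a small taste of the toolkit, the sketch below tokenizes a sentence and counts word frequencies. The sample text is invented for this example, and the calls used here were picked because they should not need any of the separately downloaded NLTK data packages.

```python
from nltk import FreqDist, bigrams
from nltk.tokenize import wordpunct_tokenize

# Sample text made up for this example.
text = "The cat sat on the mat. The dog sat on the log."

# wordpunct_tokenize is based on a regular expression and splits words
# from punctuation without relying on downloaded corpus data.
tokens = [token.lower() for token in wordpunct_tokenize(text)]

# Count how often each token occurs and list adjacent token pairs.
frequencies = FreqDist(tokens)
pairs = list(bigrams(tokens))

print(frequencies.most_common(3))
print(pairs[:3])
```

Many other NLTK features, such as part-of-speech tagging, do require fetching data with nltk.download first.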
sudo pip install -U numpy
sudo pip install -U nltk
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for machine learning tasks and a variety of predefined environments to test and compare your algorithms.
Orange is a component-based data mining and machine learning software suite, featuring a visual programming front-end for explorative data analysis and visualization, and Python bindings and libraries for scripting. It includes a set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. It is implemented in C++ and Python. Its graphical user interface builds upon the cross-platform Qt framework. Unlike its competitors scikit-learn and mlpy, Orange does not tie into NumPy and its ecosystem of tools; it focuses on traditional, symbolic algorithms more than numeric ones.
Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses Theano under the hood for fast tensor manipulation on GPU and CPU. It was developed with a focus on enabling fast experimentation.
Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA. It implements the most important types of neural network models and offers a variety of different activation functions and training methods such as momentum, Nesterov momentum, dropout, and early stopping.