This is a list of Python resources for the top data mining algorithms identified by the IEEE International Conference on Data Mining in 2006. The details of the paper are given here.

This is part of our Python Knowledge & Resources List.

  1. k-means

    k-means clustering aims to partition n observations into k clusters so that each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster.

    Python resources:

    k-means clustering is available in SciPy and scikit-learn. There is also a Python wrapper for a basic C implementation.
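
    To make the assignment and update steps concrete, here is a minimal pure-Python sketch of the standard iteration (often called Lloyd's algorithm). It is not the SciPy or scikit-learn implementation; the function name `kmeans`, the toy points, and the choice of initial centroids are assumptions made for this example.

```python
def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and mean-update steps until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to the nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Toy data (assumed for illustration): two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

    In practice the result depends on the initial centroids, which is why library implementations usually run several restarts.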

  2. support vector machines

    Support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data and recognize patterns; they are used for classification and regression analysis. Given a set of labeled training examples, an SVM training algorithm builds a model that assigns new examples to one of the labeled categories.

    Python resources:

    SVMs are available in scikit-learn and PyML.
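
    The sketch below illustrates only how a trained linear SVM assigns a category: by the sign of w.x + b. It also computes the hinge loss that SVM training typically minimises. The weights are picked by hand for the toy data rather than learned, and all names here are assumptions for this example, not the scikit-learn or PyML API.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def hinge_loss(w, b, X, y):
    """Mean hinge loss; zero when every point lies on or outside the margin."""
    return sum(max(0.0, 1.0 - yi * decision_value(w, b, xi))
               for xi, yi in zip(X, y)) / len(X)


# Toy data (assumed): labels are -1 and +1.
X = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
y = [-1, -1, -1, 1, 1, 1]

w, b = (0.4, 0.4), -2.2  # hand-picked separating hyperplane, not learned
predictions = [1 if decision_value(w, b, xi) >= 0 else -1 for xi in X]
loss = hinge_loss(w, b, X, y)
```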

  3. Apriori

    The Apriori algorithm is used for discovering interesting relations between variables in large databases.

    Python resources:

    A Python implementation is available.
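
    As a rough illustration of the frequent-itemset part of Apriori, the sketch below counts candidates level by level and prunes any candidate that has an infrequent subset. It is not the implementation referred to above; the function name, the shopping-basket data, and the use of an absolute support count are assumptions for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Toy transactions (assumed for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = apriori(transactions, min_support=3)
```

    Association rules would then be derived from these frequent itemsets; that step is left out of the sketch.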

  4. Expectation Maximization

    An expectation–maximization (EM) algorithm is an iterative method for finding maximum likelihood estimates of parameters in statistical models. It is used in cases where the equations cannot be solved directly.

    Python resources:

    PyMix
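
    To show the iteration concretely, here is a minimal pure-Python sketch of EM for a mixture of two one-dimensional Gaussians. It is not PyMix code; the function names, the starting means, the fixed iteration count, and the variance floor are assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, means, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    means = list(means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters from the weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


# Toy data (assumed): two groups centred near 1 and 5.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_two_gaussians(data, means=[0.0, 6.0])
```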

  5. PageRank

    PageRank is perhaps the most popular algorithm in this list. It is the best-known algorithm used by Google to rank websites in its search engine results. It is a link analysis algorithm that assigns a numerical weighting, called the PageRank, to each element of a hyperlinked set of documents, with the purpose of "measuring" its relative importance within the set.

    Python resources:

    An implementation in Python is available.
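
    The weighting can be computed by simple repeated updates, sketched below in pure Python. This is not the implementation referred to above; the function name, the damping factor of 0.85, the fixed iteration count, and the four-page link graph are assumptions for this example. Every link target is assumed to appear as a key in `links`.

```python
def pagerank(links, damping=0.85, n_iter=100):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(n_iter):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Toy link graph (assumed for illustration).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

    Page C collects links from every other page and ends up with the highest rank, while D, which nobody links to, keeps only the baseline share.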

  6. AdaBoost

    AdaBoost is used in conjunction with many other types of learning algorithms to improve their performance. The outputs of the other learning algorithms, called weak learners, are combined into a weighted sum that gives the final output of the boosted classifier.

    Python resources:

    Available in scikit-learn.
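
    As an illustration of the weighted sum, the sketch below boosts one-dimensional threshold classifiers ("decision stumps") as the weak learners. It is a pure-Python sketch, not the scikit-learn implementation; the function names, the choice of stumps, and the toy data are assumptions for this example.

```python
import math


def best_stump(xs, ys, weights):
    """Find the threshold/polarity stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, n_rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest.
        preds = [polarity if x <= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted sum of the weak learners' votes."""
    score = sum(alpha * (pol if x <= thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data (assumed): no single stump classifies it correctly.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, n_rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

    No single stump separates this data, but the weighted vote of three stumps classifies every training point correctly.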

  7. k-Nearest Neighbors

    The k-nearest neighbours algorithm finds the k closest training examples in the feature space. If used for classification, it outputs a class membership decided by a majority vote of those k neighbours. If used for regression, it outputs the average of the values of the k nearest neighbours.

    Python resources:

    Available in scikit-learn.
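
    Both uses can be sketched in a few lines of pure Python. This is not the scikit-learn API; the function names, the Euclidean distance, and the toy data are assumptions for this example (`math.dist` needs Python 3.8 or later).

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """train is a list of (point, label) pairs; returns the majority label."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """train is a list of (point, value) pairs; returns the mean neighbour value."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(value for _, value in neighbours) / k


# Toy data (assumed for illustration).
labelled = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
            ((6, 6), "b"), ((7, 6), "b"), ((6, 7), "b")]
valued = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

label = knn_classify(labelled, (1.5, 1.5))
value = knn_regress(valued, (2.0,))
```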

  8. Naive Bayes

    Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong independence assumptions between the features.

    Python resources:

    Available in scikit-learn.
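
    The independence assumption means the per-feature likelihoods are simply multiplied (added, in log space). Here is a minimal pure-Python sketch of a word-count (multinomial) variant with add-one smoothing. It is not the scikit-learn implementation; the function names and the tiny spam/ham data are assumptions for this example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs is a list of (words, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict_nb(model, words):
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        total = sum(word_counts[label].values())
        # Log prior plus the sum of per-word log likelihoods.
        score = math.log(count / n_docs)
        for w in words:
            # Add-one smoothing so an unseen word does not zero the product.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data (assumed for illustration).
docs = [
    (["cheap", "pills", "buy"], "spam"),
    (["buy", "cheap", "now"], "spam"),
    (["meeting", "tomorrow", "agenda"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]
model = train_nb(docs)
prediction = predict_nb(model, ["cheap", "buy"])
```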

  9. CART

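    CART is an acronym for classification and regression trees. It builds binary decision trees that can be used for either classification or regression.

    Python resources:

    Available in scikit-learn.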
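
    For classification, CART is usually described as choosing each binary split by minimising Gini impurity. The sketch below shows that one step for a single numeric feature. It is not the scikit-learn implementation; the function names and toy data are assumptions for this example, and a full tree would apply the same search recursively to each side of the split.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best split x <= threshold."""
    best = None
    # Skip the largest value: it would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Toy data (assumed for illustration).
xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(xs, ys)
```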

  10. C4.5 algorithm

    This algorithm is actually ranked #1 in the paper.
    C4.5 is an algorithm used to generate a decision tree, developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier.

    Python resources:

    Decision trees are available in scikit-learn, although its implementation is an optimized version of CART rather than C4.5 itself.
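
    C4.5 is generally described as choosing the attribute to split on by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below computes that criterion for categorical attributes. It is a pure-Python illustration, not scikit-learn code; the function names and the four-row toy data are assumptions for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Toy data (assumed for illustration).
labels = ["p", "p", "n", "n"]
useful = ["x", "x", "y", "y"]      # separates the classes perfectly
useless = ["x", "y", "x", "y"]     # tells us nothing about the class
row_id = ["1", "2", "3", "4"]      # unique per row, like an ID column

ratios = {
    "useful": gain_ratio(useful, labels),
    "useless": gain_ratio(useless, labels),
    "row_id": gain_ratio(row_id, labels),
}
```

    The ID-like attribute has the same information gain as the useful one, but its gain ratio is halved because splitting on it fragments the data into four branches.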
