A book on Apache's Mahout machine learning library

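Mahout started in 2008 as a subproject of Apache's Lucene project, an open source search engine. The Lucene project needed a machine learning library, and the work on that library was eventually spun off on its own. After absorbing the open source Taste collaborative filtering project, it became a top-level Apache project in 2010.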

Mahout began life in 2008 as a subproject of Apache's Lucene project, which provides the well-known open source search engine of the same name. Lucene provides advanced implementations of search, text mining, and information-retrieval techniques. In the universe of computer science, these concepts are adjacent to machine learning techniques like clustering and, to an extent, classification. As a result, some of the work of the Lucene committers that fell more into these machine learning areas was spun off into its own subproject. Soon after, Mahout absorbed the Taste open source collaborative filtering project.

As of April 2010, Mahout became a top-level Apache project in its own right....

<Excerpt from the original text of Mahout in Action>

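I heard about Mahout last year but wasn't particularly interested at the time. I recently bought the book and decided to take a closer look. A Google search shows that the original English edition is available as a PDF, so readers comfortable with English can easily find and download it there. Just in case, I am uploading it here as well (zipped, since it is over 10 MB).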

[Mahout.in.Action(2011)].Sean.Owen.zip

Below is the list of algorithms covered in the book (taken from https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms).

Classification

A general introduction to the most common text classification algorithms can be found at Google Answers (http://answers.google.com/answers/main?cmd=threadview&id=225316). For information on the algorithms implemented in Mahout (or scheduled for implementation), please visit the following pages.

Logistic Regression (SGD)

Bayesian

Support Vector Machines (SVM) (open: MAHOUT-14, MAHOUT-232 and MAHOUT-334)

Perceptron and Winnow (open: MAHOUT-85)

Neural Network (open, but MAHOUT-228 might help)

Random Forests (integrated - MAHOUT-122, MAHOUT-140, MAHOUT-145)

Restricted Boltzmann Machines (open, MAHOUT-375, GSOC2010)

Online Passive Aggressive (integrated, MAHOUT-702)

Boosting (awaiting patch commit, MAHOUT-716)

Hidden Markov Models (HMM) (MAHOUT-627, MAHOUT-396, MAHOUT-734) - Training is done in Map-Reduce

Clustering

Reference Reading

Canopy Clustering (MAHOUT-3 - integrated)

K-Means Clustering (MAHOUT-5 - integrated)

Fuzzy K-Means (MAHOUT-74 - integrated)

Expectation Maximization (EM) (MAHOUT-28)

Mean Shift Clustering (MAHOUT-15 - integrated)

Hierarchical Clustering (MAHOUT-19)

Dirichlet Process Clustering (MAHOUT-30 - integrated)

Latent Dirichlet Allocation (MAHOUT-123 - integrated)

Spectral Clustering (MAHOUT-363 - integrated)

Minhash Clustering (MAHOUT-344 - integrated)

Top Down Clustering (MAHOUT-843 - integrated)

Pattern Mining

Parallel FP Growth Algorithm (also known as Frequent Itemset Mining)

Regression

Locally Weighted Linear Regression (open)

Dimension reduction

Singular Value Decomposition and other Dimension Reduction Techniques (available since 0.3)

Stochastic Singular Value Decomposition with PCA workflow (PCA workflow now integrated)

Principal Components Analysis (PCA) (open)

Independent Component Analysis (open)

Gaussian Discriminant Analysis (GDA) (open)

Evolutionary Algorithms

NOTE: Watchmaker support has been removed as of 0.7.
