Text Mining


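Text mining is the technique of extracting information from unstructured text data. It builds on natural language processing (NLP): natural language is the language people use, and NLP is the processing required to make that language understandable to a computer.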
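As a rough illustration of what "extracting information from unstructured text" can mean at its simplest, the sketch below tokenizes a couple of sentences and counts the most frequent terms. It uses only the Python standard library. The stop-word list, the function names `tokenize` and `top_terms`, and the sample sentences are assumptions made up for this example; none of the tools listed below is claimed to work this way.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only;
# real systems typically use far larger, language-specific lists.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "from"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_terms(documents, n=3):
    """Count non-stop-word tokens across all documents; return the n most common."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sample documents.
docs = [
    "Text mining finds information in unstructured text.",
    "Natural language processing helps computers understand text.",
]
print(top_terms(docs))
```

Term counting like this is only a first step. The regular expression here handles plain ASCII English only, so Korean or other languages would need a proper tokenizer or morphological analyzer.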

Below are free (open) and commercial text mining programs.
 

Commercial Text Mining / Text Analytics Software

  • ActivePoint, offering natural language processing and smart online catalogues, based on contextual search and ActivePoint's TX5(TM) Discovery Engine.
  • Aiaioo Labs, offering distributed corpus annotation tools and services for use in machine learning, relation and event extraction, transliteration, part-of-speech tagging and classification. Aiaioo online demo.
  • Alceste, a software for the automatic analysis of textual data (open questions, literature, articles, etc.)
  • Attensity, offers a complete suite of Text Analytic applications, including the ability to extract "who", "what", "where", "when" and "why" facts and then drill down to understand people, places and events and how they are related.
  • Basis Technology, provides natural language processing technology for the analysis of unstructured multilingual text.
  • Clarabridge, text mining software providing end-to-end solution for customer experience professionals wishing to transform customer feedback for marketing, service and product improvements.
  • ClearForest, tools for analysis and visualization of your document collection. 
  • Compare Suite, compares texts by keywords, highlights common and unique keywords.
  • Connexor Machinese, discovers the grammatical and semantic information of natural language.
  • Copernic Summarizer, can read and summarize document and Web page text contents in many languages from various applications
  • Crossminder, natural language processing and text analytics (including cross-lingual text mining).
  • Dhiti, providing an API for text-mining; can work on a document collection and mine out topics and concepts in realtime.
  • DiscoverText, a powerful and easy-to-use set of text analytic solutions for eDiscovery and research.
  • dtSearch, for indexing, searching, and retrieving free-form text files.
  • Eaagle text mining software, enables you to rapidly analyze large volumes of unstructured text, create reports and easily communicate your findings.
  • Enkata, providing a range of enterprise-level solutions for text analysis.
  • Entrieva, patented technology indexes, categorizes and organizes unstructured text from virtually any source.
  • Expert System, using proprietary COGITO platform for the semantic comprehension of the language to do knowledge management of unstructured information.
  • Files Search Assistant, quick and efficient search within text documents.
  • IBM Intelligent Miner Data Mining Suite, now fully integrated into the IBM InfoSphere Warehouse software; includes Data and Text mining tools (based on UIMA).
  • Intellexer, natural language searching technologies for developing knowledge management tools, document comparison software and document summarization software, custom built search engines and other intelligent software.
  • ISYS Search Software, an enterprise search software supplier specializing in embedded search, text extraction, federated access solutions and text analytics.
  • IxReveal, offering uReveal "plug-in" advanced analytic platform and uReka! desktop "search and analyze" consumer product, based on patented text analytics methods.
  • Kwalitan 5 for Windows, uses codes for text fragments to facilitate textual search, display overviews, build hierarchical trees and more.
  • KXEN Text Coder (KTC), text analytics solution for automatically preparing and transforming unstructured text attributes into a structured representation for use in KXEN Analytic Framework.
  • Langsoft question-answering and content recognition/text attribution software, evaluation copy available.
  • Lexalytics, provides enterprise and hosted text analytics software to transform unstructured text into structured data.
  • Leximancer, makes automatic concept maps of text data collections
  • Lextek Onix Toolkit, for adding high performance full-text indexing search and retrieval to applications.
  • Lextek Profiling Engine, for automatically classifying, routing, and filtering electronic text according to user defined profiles.
  • Linguamatics, offering Natural language processing (NLP), search engine approach, intuitive reporting, and domain knowledge plug-in.
  • Megaputer Text Analyst, offers semantic analysis of free-form texts, summarization, clustering, navigation, and natural language retrieval with search dynamic refocusing.
  • Monarch, data access and analysis tool that lets you transform any report into a live database.
  • NewsFeed Researcher, presents live multi-document summarization tool, with automatically-generated RSS news feeds.
  • Nstein, Enterprise Search and Information Access Technologies; On your public website, Nstein will guide your customers to the most relevant information more quickly than other solutions.
  • Odin Text, actionable DIY Text Analytics, with a focus on market research.
  • Power Text Solutions, extensive capabilities for "free text" analysis, offering commercial products and custom applications.
  • Readability Studio, offers tools for determining text readability levels.
  • Recommind MindServer, uses PLSA (Probabilistic Latent Semantic Analysis) for accurate retrieval and categorization of texts.
  • SAS Text Miner, provides a rich suite of text processing and analysis tools.
  • Semantex from Janya Inc., enterprise-class information extraction system, detecting entities, attributes, relationships and events.
  • SPSS LexiQuest, for accessing, managing and retrieving textual information; integrated with SPSS Clementine data mining suite.
  • SPSS Text Mining for Clementine enables you to extract key concepts, sentiments, and relationships from call center notes, blogs, emails and other unstructured data, and convert it to structured format for predictive modeling.
  • SWAPit, Fraunhofer-FIT's text- and data analysis tool (updated version of DocMINER), offers visual text mining and retrieval capabilities, including search, term statistics, and summary; visualises semantic relationships among text documents.
  • TEMIS Luxid®, an Information Discovery solution serving the Information Intelligence needs of business corporations.
  • TeSSI®, software components that perform semantic indexing, semantic searching, coding and information extraction on biomedical literature.
  • Texifter, streamlines the process of sorting large amounts of unstructured text with The Public Comment Analysis Toolkit (PCAT), DiscoverText and Sifter, off-the-shelf, enterprise-class business process applications.
  • Text Analysis Info, offering software and links for Text Analysis and more
  • Textalyser, online text analysis tool, providing detailed text statistics
  • TextPipe Pro, text conversion, extraction and manipulation workbench.
  • TextQuest, text analysis software
  • Readware Information Processor for Intranets and the Internet, classifies documents by content; provides literal and conceptual search; includes a ConceptBase with English, French or German lexicons.
  • Quenza, automatically extracts entities and cross references from free text documents and builds a database for subsequent analysis.
  • VantagePoint provides a variety of interactive graphical views and analysis tools with powerful capabilities to discover knowledge from text databases.
  • VisualText™, by TextAI is a comprehensive GUI development environment for quickly building accurate text analyzers.
  • Xanalys Indexer, an information extraction and data mining library aimed at extracting entities, and particularly the relationships between them, from plain text.
  • Wordstat, analysis module for textual information such as responses to open-ended questions, interviews, etc.
Many packages above offer free or limited trial versions.

Free and Open-Source Text Mining / Text Analytics Software

  • GATE, a leading open-source toolkit for Text Mining, with a free open source framework (or SDK) and graphical development environment.
  • INTEXT, MS-DOS version of TextQuest, in public domain since Jan 2, 2003.
  • LingPipe is a suite of Java libraries for the linguistic analysis of human language.
  • Open Calais, an open-source toolkit for including semantic functionality within your blog, content management system, website or application.
  • RapidMiner Text Mining.
  • ReVerb: Open Information Extraction Software, extracts binary relationships like high-in(winter squash, vitamin c) without requiring any relation-specific training data.
  • S-EM (Spy-EM), a text classification system that learns from positive and unlabeled examples.
  • The Semantic Indexing Project, offering open source tools, including Semantic Engine - a standalone indexer/search application.

On-line Text Mining / Text Analytics Tools

  • Ranks.nl, keyword analysis and webmaster tools.
  • Vivisimo/Clusty web search and text clustering engine.
  • Wordle, a tool for generating "word clouds" from text that you provide