An integrated system for building enterprise taxonomies

Li Zang; Tao Li; ShiXia Liu; Yue Pan
October 2007
Information Retrieval Journal;Oct2007, Vol. 10 Issue 4/5, p365
Academic Journal
Although considerable research has been conducted in the field of hierarchical text categorization, little has been done on automatically collecting labeled corpora for building hierarchical taxonomies. In this paper, we propose an automatic method of collecting training samples to build hierarchical taxonomies. In our method, each category node is initially defined by a few keywords; a web search engine is then used to construct a small set of labeled documents, and a topic tracking algorithm with keyword-based content normalization is applied to enlarge the training corpus on the basis of these seed documents. We also design a method to check the consistency of the collected corpus. These steps produce a flat category structure that contains all the categories needed for building the hierarchical taxonomy. Next, a linear discriminant projection approach is used to construct more meaningful intermediate levels of the hierarchy from the generated flat set of categories. Experimental results show that the collected training corpus is good enough to train statistical classification methods.
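The corpus-enlargement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the topic tracking method, so this example assumes a simple centroid-based tracker with bag-of-words cosine similarity, and it leaves out the keyword-based content normalization and the consistency check. The function names (`vectorize`, `cosine`, `enlarge_corpus`) and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercased bag-of-words term counts (an assumed, simplified representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def enlarge_corpus(seed_docs, candidates, threshold=0.3):
    """Grow one category's corpus from its seed documents.

    A candidate is accepted when it is similar enough to the running
    centroid of the documents accepted so far; the centroid is then
    updated so later candidates are compared against the enlarged corpus.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    corpus = list(seed_docs)
    centroid = Counter()
    for doc in corpus:
        centroid.update(vectorize(doc))
    for doc in candidates:
        vec = vectorize(doc)
        if cosine(centroid, vec) >= threshold:
            corpus.append(doc)
            centroid.update(vec)
    return corpus


# Hypothetical seed documents for a "finance" category node.
seeds = ["stock market shares trading", "market investors buy shares"]
stream = [
    "shares fell as market trading slowed",
    "the football team won the match",
]
finance_corpus = enlarge_corpus(seeds, stream)
```

In this toy run the first candidate shares several terms with the seeds and is added to the corpus, while the second shares none and is skipped. In practice the seed documents would come from search-engine results for the category keywords.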


Related Articles

  • Assessing citations with Google Scholar: A new feature. Ahmed, K. K. Mueen // Journal of Pharmacology & Pharmacotherapeutics;Jan2012, Vol. 3 Issue 1, p75 

    The article discusses the launch of the Google Scholar search service which provides the ability to search for scholarly literature located from across the web. It states that Google Scholar is a freely accessible web search engine that indexes the full text of scholarly literature across an...

  • Characterizing Interdisciplinarity of Researchers and Research Topics Using Web Search Engines. Sayama, Hiroki; Akaishi, Jin // PLoS ONE;Jun2012, Vol. 7 Issue 6, p1 

    Researchers' networks have been subject to active modeling and analysis. Earlier literature mostly focused on citation or coauthorship networks reconstructed from annotated scientific publication databases, which have several limitations. Recently, general-purpose web search engines have also...

  • Improving the Ranking Capability of the Hyperlink Based Search Engines Using Heuristic Approach. Ramadhan, Haider A.; Khalil Shihab; Ali, Jafar H. // Journal of Computer Science;2006, Vol. 2 Issue 8, p638 

    To evaluate the informative content of a Web page, the Web structure has to be carefully analyzed. Hyperlink analysis, which is capable of measuring the potential information contained in a Web page with respect to the Web space, is gaining more attention. The links to and from Web pages are an...

  • Search Engine SHOOT-OUT. Bertolucci, Jeff // PCWorld;Jun2007, Vol. 25 Issue 6, p86 

    The article compares the efficiency of Google's search engine with other search engines such as AlltheWeb and AltaVista. According to the author, Google's index proved to be the most accurate and comprehensive. Recent enhancements to Live Search's mobile component moved that service into the...

  • Looking for Answers in All the Wrong Places, or Possibly Some Correct Places. Ojala, Marydee // Online;May/Jun2008, Vol. 32 Issue 3, p29 

    The article features two articles about the idea of human search engines. The articles, one a decade old and the other written only a few months ago, show the progression of the intersection between people and search engines to deliver answers online. Neither article even touches on the notion...

  • Searching within. Goff, Clare // New Media Age;3/10/2005, p25 

    This article presents an update on the developments in the Web searching industry as of March 2005. The battle for supremacy in the Web search arena seemed to be reaching a conclusion, but now another search scram looks set to begin. Desktop search has become the latest application to find the...

  • BE A GOOGLE EXPERT. Caplan, Jeremy; Rothman, Wilson // Time;2/20/2006, Vol. 167 Issue 8, p43 

    This section offers quirky tips from Google. Google's improved search box provides quick, direct answers to many common questions. Tips and tricks are suggested. In many offices, Google has made traditional search tools such as the telephone book, calculator and dictionary seem obsolete. Expert...

  • FOIOTI: An implementation of the conceptualist approach to Internet Information Retrieval. Weideman, M. // South African Journal of Libraries & Information Science;2005, Vol. 71 Issue 1, p11 

    The objective of this research project was to evaluate searching methodologies used by undergraduate learners in searching for academic information, and to design an aid if required. Literature surveys indicated that the sheer size of the Internet and lack of categorization of the information...

  • Output-sensitive autocompletion search. Bast, Holger; Mortensen, Christian W.; Weber, Ingmar // Information Retrieval Journal;Aug2008, Vol. 11 Issue 4, p269 

    We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query word that would lead to the best hits, and also display the best such hits. The following problem is at the core of...

