TITLE

Joint Web-Feature (JFEAT): A Novel Web Page Classification Framework

AUTHOR(S)
Han, Lim Wern; Alhashmi, Saadat M.
PUB. DATE
December 2010
SOURCE
Communications of the IBIMA;Dec2010, p1
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
With the increasing number of web pages on the internet, obtaining information accurately, at a reasonable cost and with decent performance, has become a major concern. One potential solution is to classify web pages into meaningful categories. Effective web page classification benefits various applications such as web mining and search engines. Unlike plain text documents, the nature of web pages limits the performance of traditional pure-text classification methods that have been successful elsewhere. Noise exists in the form of HTML tags, multimedia content, dynamic content and the network structure of web pages, which calls for a deeper look into effective feature selection for web pages. Often, these features are filtered out and classification relies only on the displayed text of the web page. This paper proposes a framework in which web page features are taken into consideration during classification, because each feature may hold potentially valuable information. To this end, the paper explores the Uniform Resource Locator (URL), the web page title and the metadata as sources of information for classification into user-defined categories. The framework then explores suitable machine learning algorithms for classifying each web feature individually. The individual results are then combined by weighted voting to obtain the classification of the web page. This approach showed improvements over both pure-text and virtual-webpage classification approaches.
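The per-feature classification followed by weighted voting described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword-overlap scorer `classify_tokens` is a hypothetical stand-in for the trained machine learning classifiers the framework selects for each feature, and the categories, keyword sets and voting weights (`KEYWORDS`, `WEIGHTS`) are hypothetical values chosen only for illustration. It uses only the three features named in the abstract: the URL, the title and the `<meta>` keywords/description.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# HYPOTHETICAL categories and keyword sets: a stand-in for the trained
# per-feature classifiers, which are not specified in this abstract.
KEYWORDS = {
    "sports": {"sports", "football", "scores", "match"},
    "finance": {"stock", "market", "finance", "bank"},
}
# HYPOTHETICAL voting weights per web feature (not taken from the paper).
WEIGHTS = {"url": 0.3, "title": 0.4, "meta": 0.3}
# Generic URL tokens assumed to carry no category information.
URL_STOPWORDS = frozenset({"http", "https", "www", "com", "html", "htm"})


class FeatureExtractor(HTMLParser):
    """Collects the <title> text and <meta name="keywords|description"> content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.meta.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def tokenize(text, stopwords=frozenset()):
    """Lower-cases text, splits it on non-alphanumeric characters, drops stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in stopwords]


def classify_tokens(tokens, keywords=KEYWORDS):
    """Placeholder per-feature classifier: returns the category with the most
    keyword overlap, or None when no keyword matches."""
    overlap = {label: len(set(tokens) & words) for label, words in keywords.items()}
    best = max(overlap, key=overlap.get)
    return best if overlap[best] > 0 else None


def weighted_vote(predictions, weights):
    """Sums the weight of every feature that voted for each label and returns
    the label with the highest total (None if no feature produced a label)."""
    scores = defaultdict(float)
    for feature, label in predictions.items():
        if label is not None:
            scores[label] += weights.get(feature, 0.0)
    if not scores:
        return None
    return max(scores, key=scores.get)


def classify_page(url, html, weights=WEIGHTS):
    """Classifies each web feature separately, then combines them by weighted voting."""
    parser = FeatureExtractor()
    parser.feed(html)
    parser.close()
    predictions = {
        "url": classify_tokens(tokenize(url, URL_STOPWORDS)),
        "title": classify_tokens(tokenize(parser.title)),
        "meta": classify_tokens(tokenize(" ".join(parser.meta))),
    }
    return weighted_vote(predictions, weights), predictions


if __name__ == "__main__":
    page = ('<html><head><title>Live football scores</title>'
            '<meta name="keywords" content="stock market, finance"></head></html>')
    print(classify_page("http://www.example.com/sports/football/results.html", page))
```

In this example the URL and the title both vote "sports" (combined weight 0.7 under the hypothetical weights) and outvote the metadata's "finance" (0.3), so the page is labelled "sports". Keeping the voting step separate means `classify_tokens` could be swapped for any trained model that maps a feature's tokens to a label without changing how the votes are combined.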
ACCESSION #
64935595

 

Related Articles

  • Web Crawling. Olston, Christopher; Najork, Marc // Foundations & Trends in Information Retrieval;2010, Vol. 4 Issue 3, p175 

    This is a survey of the science and practice of web crawling. While at first glance web crawling may appear to be merely an application of breadth-first-search, the truth is that there are many challenges ranging from systems concerns such as managing very large data structures to theoretical...

  • A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION. BUFNEA, DARIUS; HALIŢĂ, DIANA // Studia Universitatis Babes-Bolyai, Informatica;Sep2013, Vol. 58 Issue 3, p78 

    The migration process of a website's content within a Content Management System almost always implies changes in the site structure as seen by search engines and web clients. This variation leads to some disadvantages, such as misdirecting search engine visitors to old, unavailable URLs. Even...

  • Where You At? Shafer, Todd // Dealernews;Oct2007, Vol. 43 Issue 10, p32 

    The article offers tips for dealers in the U.S. on how to make their web sites easy to find. Dealers are advised to name their images with search engine optimization (SEO) in mind. They can use the extra Hypertext Markup Language (HTML) attributes. Dealers need to ensure that the first five to...

  • Driving Traffic to Your Web Site. Elges, Mary // Nonprofit World;Nov/Dec2002, Vol. 20 Issue 6, p15 

    The article discusses what measures to take for making a Web site visitor friendly. First, the Web site should be listed on the search engines that direct Internet traffic to the site. The World Wide Web is constantly being updated and indexed by search engines, using programs called spiders and...

  • Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition. Ito, Akinori; Kajiura, Yasutomo; Suzuki, Motoyuki; Makino, Shozo // EURASIP Journal on Audio Speech & Music Processing;2009, Vol. 2009, Special section p1 

    We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to...

  • On Schema.org and Why It Matters for the Web. Mika, Peter // IEEE Internet Computing;Jul2015, Vol. 19 Issue 4, p52 

    How do we keep the Web searchable as it expands and evolves?

  • Pearls. Dowling, Thomas; Marmion, Dan // Information Technology & Libraries;Mar2000, Vol. 19 Issue 1, p53 

    Responds to a query that asked if web site developers should assume the standard monitor resolution of 640x480 pixels, or something else. HTML; Recommendations.

  • Building your Web site: HTML basics. Hoffman, Leslie; Frenza, JP // Nonprofit World;May/Jun98, Vol. 16 Issue 3, p22 

    Discusses the basics of producing an organization's own Web site. Information on Hypertext Markup Language (HTML); Description of HTML; How HTML works; Importance of HTML; Instructions in creating a basic HTML file.

  • Online Evolution.  // Technology Review;Nov/Dec2010, Vol. 113 Issue 6, p12 

    The article presents the author's insights regarding the development of Hypertext Markup Language 5 (HTML5) by World Wide Web Consortium (W3C) programmers.
