Practical Web Spam Lifelong Machine Learning System with Automatic Adjustment to Current Lifecycle Phase

Luckner, Marcin
February 2019
Security & Communication Networks;2/20/2019, p1
Academic Journal
Machine learning techniques are a standard approach in spam detection. Their quality depends on the quality of the learning set, and when the set is out of date, classification quality falls rapidly. The most popular public web spam dataset that can be used to train a spam detector, WEBSPAM-UK2007, is over ten years old. There is therefore room for a lifelong machine learning system that can replace detectors based on a static learning set. In this paper, we propose a novel web spam recognition system. The system automatically rebuilds the learning set, using external data from spam traps and popular web services, to avoid classification based on outdated data. Using a built-in automatic selection of the active classifier, the system very quickly attains productive accuracy despite a limited learning set. A test on real data from Quora, Reddit, and Stack Overflow showed high recognition quality. The average accuracy and the F-measure both reached 0.98 in semiautomatic mode and 0.96 in fully automatic mode.
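The two mechanisms named in the abstract, a learning set that is rebuilt so old samples drop out and an active classifier that is selected automatically, can be illustrated with a minimal sketch. This is not the paper's implementation: the class `LifelongDetector`, the rule functions `link_rule` and `keyword_rule`, the fixed-size window, and the choice of F-measure as the selection criterion are all assumptions made for illustration, and the fixed rules stand in for trained classifiers.

```python
from collections import deque


def f_measure(y_true, y_pred):
    """F-measure (F1) for the positive class, where spam is labeled 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical stand-ins for trained classifiers (assumption, not from the paper).
def link_rule(text):
    return int("http" in text.lower())


def keyword_rule(text):
    return int("buy now" in text.lower())


class LifelongDetector:
    """Keeps a bounded learning set and re-selects the active classifier."""

    def __init__(self, candidates, window_size=1000):
        self.candidates = candidates              # name -> predict function
        self.window = deque(maxlen=window_size)   # oldest samples are dropped
        self.active = next(iter(candidates))      # start with the first candidate

    def add_labeled(self, text, label):
        """Add a freshly labeled sample, e.g. from a spam trap."""
        self.window.append((text, label))

    def reselect(self):
        """Score every candidate on the current window and activate the best."""
        texts = [t for t, _ in self.window]
        labels = [l for _, l in self.window]
        scores = {
            name: f_measure(labels, [clf(t) for t in texts])
            for name, clf in self.candidates.items()
        }
        self.active = max(scores, key=scores.get)
        return scores

    def predict(self, text):
        return self.candidates[self.active](text)


detector = LifelongDetector(
    {"links": link_rule, "keywords": keyword_rule}, window_size=4
)
samples = [
    ("Buy now cheap pills", 1),
    ("See http://example.com/docs for the API", 0),
    ("BUY NOW limited offer http://spam.example", 1),
    ("How do I sort a list?", 0),
]
for text, label in samples:
    detector.add_labeled(text, label)
scores = detector.reselect()
```

In a real system each candidate would be retrained on the rebuilt learning set before scoring; the sketch only shows how the selection step can switch the active classifier as the window changes.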


Related Articles

  • AITP on-line. Oriez, Charlie // Information Executive;Nov/Dec99, Vol. 3 Issue 11/12, p6 

    Presents tips on how to avoid spams or unsolicited e-mail. Includes use of high-level domain; Options for blocking spam at the servers; Places spammers use to harvest e-mail addresses.

  • E-Mail Spam Detection Using SVM and RBF. Sharma, Reena; Kaur, Gurjot // International Journal of Modern Education & Computer Science;Apr2016, Vol. 8 Issue 4, p57 

    The internet is an important part of life today, and we spend much of our time on it. One of its important features is communication. Email is a mode of communication used for personal and business purposes. Spam emails are the emails the recipient does not wish to take...

  • FILTER. Macsai, Dan // Fast Company;Jul/Aug2010, Issue 147, p26 

    The article presents information on the Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference to be held in Redmond, Washington on July 13, 2010.

  • A BAD RAP. Seda, Catherine // Entrepreneur;Jun2003, Vol. 31 Issue 6, p83 

    Provides tips and information in fighting back a spammer. Protection against the ordeal by not buying a software that harvests electronic-mail addresses; Avoidance of future confusion by informing the subscriber what will be found in the electronic-mail message; Suggestions not to be attracted...

  • Academic spam invitations are common and irritating, with 2.1 invitations received daily by each investigator.  // Medical Writing;Jun2017, Vol. 26 Issue 2, p64 

    The article reports that academic spam invitations are common and irritating, with 2.1 invitations received daily by each investigator.

  • In search of a better spam solution. Goldsborough, Reid // Business Journal (Central New York);6/27/2008, Vol. 22 Issue 26, p22 

    The article offers tips on how to manage e-mail spam problems in New York. Spam is described as harmful and pandemic in e-mail. Internet service providers use filters to block spam in subscribers' e-mail. Appropriate ways to avoid receiving e-mail spam include never responding to...

  • The gated Net community. Tinnirello, Paul // eWeek;10/13/2003, Vol. 20 Issue 41, p46 

    The public streets and highways of the Internet have become like neighborhoods where it is no longer safe to venture. Corporate firewalls, spam filters and other Internet security measures are losing ground to the onslaught that has crippled some organizations' abilities to use the Internet as...

  • Antispam tools multiplying like spam. Fontana, John // Network World;2/17/2003, Vol. 20 Issue 7, p18 

    Reports on the increase in number of antispam technology products. Release of gateway product and upgrades to MailFrontier Inc.'s client software at Demo 2003 conference; Server edition of Sunbelt Software Corp.'s spam filtering software; Announcement of name and product of Q-Spam.

  • BARNEY'S RUBBLE. Barney, Doug // Network World;09/25/2000, Vol. 17 Issue 39, p7 

    Presents computer network-related news items, compiled in the week of September 25, 2000. Alliance of 15 online marketers to control unsolicited electronic mail messages; Electronic commerce shakeout predicted by business consultant Tom Peters.

