Chapter 11: Concept Identification—Advanced Tokenization
Overview
In Chapter 6, we discussed basic tokenization theory. Primitive tokenization allows us to identify messages based on the individual characteristics of each word, number, or other small component of a message. Content, however, is more than just individual words like “free”; it is also a collection of concepts, like “free call” or “free software.” Statistical filters that perform the advanced tokenization discussed in this chapter can identify not only raw content but also individual concepts, the true goal of concept learning. Instead of identifying a spam as a message that contains the word “free,” implementing some of the concepts in this chapter allows us to identify spam content based on the concept of a “new exciting offer” or “playing lotto.” This approach to tokenization flows through to the rest of the filter and allows Bayesian content filters to act more like Bayesian concept filters—filters that eliminate spam based not just on the content of a message, but on its concepts. Concepts don’t necessarily have to be humanly understandable; they can be lexical concepts, such as grammatical tense, word construction, and HTML generation.

Let’s go back to the notion of concept learning for a moment. Suppose that we have some children sitting at a table with someone who is showing them pictures of sports cars and sedans. They’re not only learning the concepts of what a sports car and a sedan are; they’re also learning the individual concepts that make up the larger concepts. A Bayesian content filter using primitive tokenization learns only specific characteristics, such as “four round objects” and “rectangle extending out of top subsection,” whereas true concept learning would identify concepts such as “fat tires” and “engine coming out of hood.” Identifying concepts is truly where content filtering approaches a new level of AI, and that is what this chapter is all about.
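To make the difference concrete, here is a minimal sketch (in Python, with a hypothetical `tokenize_concepts` function) of how a tokenizer might emit adjacent word pairs as "concept" tokens in addition to single-word tokens; the exact pairing and normalization rules are an assumption for illustration, not the specific implementation discussed later in this chapter:

```python
import re

def tokenize_concepts(message):
    """Emit single-word tokens plus adjacent word pairs.

    A primitive tokenizer stops at the single words; pairing
    neighboring words lets the filter also learn concepts like
    "free call" rather than just "free" and "call".
    """
    # Lowercasing and this word pattern are simplifying assumptions.
    words = re.findall(r"[a-z0-9$]+", message.lower())
    tokens = list(words)
    for i in range(len(words) - 1):
        tokens.append(words[i] + " " + words[i + 1])
    return tokens

# tokenize_concepts("Claim your FREE call today") yields the five
# single words plus pairs such as "free call" and "call today".
```

Each paired token then receives its own probability in the filter's dataset, so "free call" can accumulate spam evidence independently of the innocent uses of "free" or "call" on their own.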