An Imperfect Solution - Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification [Electronic resources] (text version)


Jonathan A. Zdziarski

An Imperfect Solution


Language classification is imperfect: a small margin of error will always remain. Language classifiers learn from historical decisions, so every decision they make rests only on what they already know (namely, what the classifier has learned so far about the recipient). For that reason, it is impossible to design a perfect algorithm to solve the problem of language classification. And because email evolves, language classifiers must act to some degree as a crystal ball to predict how the user will classify new messages.

From a philosophical perspective, language classification is more of an art form than a science. You will save a considerable amount of time by accepting from the start that the process is imperfect, rather than approaching it with the idea that it can be made perfect; the goal should be to design and implement a system that is “good enough,” with practical resources in mind.

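This is the same philosophy used to find square roots: most square roots cannot be written out exactly, so a practical method refines an estimate until it is accurate enough for the task and then stops. The term “good enough” as we use it here denotes the ideal balance between accuracy and system resources, resulting in software that stays practical across a wide range of environments and scales.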
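The square-root comparison can be made concrete with a minimal sketch. The text does not name a particular method, so this assumes an iterative approximation in the style of Newton's method; the function name `approx_sqrt` and the `tolerance` and `max_iterations` parameters are illustrative choices, not taken from the book. The point is only that the caller decides how much accuracy is worth how much work.

```python
from typing import Tuple


def approx_sqrt(x: float, tolerance: float = 1e-6,
                max_iterations: int = 50) -> Tuple[float, int]:
    """Approximate sqrt(x) with Newton's method.

    Returns the estimate and the number of iterations used. The loop
    stops as soon as two successive estimates agree to within the
    requested relative tolerance, i.e. when the answer is "good enough".
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0, 0

    # Hypothetical starting guess; any positive value would converge.
    guess = x if x >= 1 else 1.0

    for iteration in range(1, max_iterations + 1):
        # Newton update for f(g) = g*g - x.
        next_guess = 0.5 * (guess + x / guess)
        if abs(next_guess - guess) <= tolerance * next_guess:
            return next_guess, iteration
        guess = next_guess

    # Resource budget exhausted: return the best estimate so far.
    return guess, max_iterations


if __name__ == "__main__":
    for tol in (1e-2, 1e-6, 1e-12):
        estimate, steps = approx_sqrt(2.0, tolerance=tol)
        print(f"tolerance={tol:g}: estimate={estimate!r} after {steps} iterations")
```

A looser tolerance finishes in fewer iterations and a tighter one costs more, which is the same trade between accuracy and resources that a practical classifier has to make.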
