Assembly Language Step-by-Step: Programming with DOS and Linux, 2nd Ed. [Electronic resource] (text version)

The Notion of an Assembly Language String

Words fail us sometimes by picking up meanings as readily as a magnet picks up iron filings. The word string is a major offender here. It means roughly the same thing in all computer programming, but there is a multitude of small variations on that single theme. If you learned about strings in Turbo Pascal, you'll find that what you know isn't totally applicable when you program in C, or Basic, or assembly.

So here's the Big View: A string is any contiguous group of bytes, of any arbitrary size up to the size of a segment. The main concept of a string is that its component bytes are right there in a row, with no interruptions.

That's pretty fundamental. Most higher-level languages build on the string concept in several ways. Turbo Pascal treats strings as a separate data type, limited to 255 characters in length, with a single byte at the start of the string to indicate how many bytes are in the string. In C, a string may be longer than 255 bytes, and it has no length byte in front of it. Instead, a C string is said to end when a byte with a binary value of 0 is encountered. In Basic, strings are stored in something called string space, which has a lot of built-in code machinery associated with it.
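The difference between the two layouts is easier to see as bytes. The following is a minimal sketch in C, not taken from the book; the names `PascalString`, `pascal_from_c`, `pascal_length`, and `c_length` are hypothetical helpers invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pascal-style layout: byte 0 holds the length, the text follows it. */
typedef struct {
    unsigned char bytes[256];   /* 1 length byte + up to 255 characters */
} PascalString;

/* Build a length-prefixed string from a C string (hypothetical helper). */
void pascal_from_c(PascalString *dst, const char *src)
{
    size_t len = strlen(src);
    assert(len <= 255);                 /* one length byte cannot count past 255 */
    dst->bytes[0] = (unsigned char)len;
    memcpy(&dst->bytes[1], src, len);   /* no terminating 0 is stored */
}

/* Pascal layout: the length is a single read of the first byte. */
size_t pascal_length(const PascalString *s)
{
    return s->bytes[0];
}

/* C layout: the length is found by scanning for the byte with value 0. */
size_t c_length(const char *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

Converting "HELLO" with `pascal_from_c` stores the count 5 in the first byte and the five letters after it, while the C version of the same text carries no count and relies on the trailing 0 byte.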

When you begin working in assembly, you have to give up all that high-level language stuff. Assembly strings are just contiguous regions of memory. They start at some specified address, go for some number of bytes, and stop. There is no length byte to tell how many bytes are in the string and no standard boundary characters such as binary 0 to indicate where a string starts or ends.

You can certainly write assembly language routines that allocate Turbo Pascal-style strings or C-style strings and manipulate them. To avoid confusion, however, you must think of the data those routines operate on as Pascal strings or C strings rather than as assembly strings.

Turning Your "String Sense" Inside-Out

Assembly strings have no boundary values or length indicators. They can contain any value at all, including binary 0. In fact, you really have to stop thinking of strings in terms of specific regions in memory. You should instead think of strings in much the same way you think of segments: in terms of the register values that define them.
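To make the "no boundary values" point concrete, here is a small sketch in C, offered as an illustration rather than as anything from the book. The `ByteRegion` type and both function names are hypothetical: the region is nothing more than a start address plus a count, and a byte with value 0 inside it is ordinary data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An assembly-style string in this sketch: a start address and a count, nothing else. */
typedef struct {
    const unsigned char *start;  /* stands in for the pointer register(s) */
    size_t count;                /* stands in for the value placed in CX */
} ByteRegion;

/* Add up every byte in the region; a 0 byte is data like any other value. */
unsigned long region_sum(ByteRegion r)
{
    unsigned long total = 0;
    assert(r.start != NULL || r.count == 0);
    for (size_t i = 0; i < r.count; i++)
        total += r.start[i];
    return total;
}

/* What a C-string routine sees in the same memory: it stops at the first 0 byte. */
size_t c_view_length(const unsigned char *p)
{
    return strlen((const char *)p);
}
```

Given the six bytes 'A', 'B', 0, 'C', 'D', 0, the count-based view covers all six, while the C view reports a length of 2 because it treats the first 0 as the end.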

It's slightly inside-out compared to how you think of strings in such languages as Pascal, but it works: You've got a string when you set up a pair of registers to point to one (or a single register, if you're working in real mode flat model or protected mode flat model). And once you point to a string, the length of that string is defined by the value you place in register CX.

This is key: Assembly strings are wholly defined by values you place in registers. There is a set of assumptions about strings and registers baked into the silicon of the CPU. When you execute one of the string instructions (as I describe a little later), the CPU uses those assumptions to determine which area of memory it reads from or writes to.

Source Strings and Destination Strings

There are two kinds of strings in assembly work. Source strings are strings that you read from. Destination strings are strings that you write to. The difference between the two is only a matter of registers; source strings and destination strings can overlap. In fact, the very same region of memory can be both a source string and a destination string, all at the same time.
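As a rough illustration of one region serving as both source and destination, the C sketch below reads each byte through one pointer and writes it back through another. The function name `upcase_bytes` is hypothetical, and the uppercase conversion is just an arbitrary transformation chosen for the example. Passing the same address for both pointers makes the operation happen in place.

```c
#include <assert.h>
#include <stddef.h>

/* Read count bytes from src, convert ASCII lowercase letters to uppercase,
   and write the results to dst. src and dst may be the same address. */
void upcase_bytes(const unsigned char *src, unsigned char *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    while (count > 0) {
        unsigned char b = *src++;                  /* read from the source string */
        if (b >= 'a' && b <= 'z')
            b = (unsigned char)(b - ('a' - 'A'));  /* ASCII lowercase to uppercase */
        *dst++ = b;                                /* write to the destination string */
        count--;
    }
}
```

Calling `upcase_bytes(buf, buf, n)` treats `buf` as source and destination at once; calling it with two different buffers treats them as separate strings. Only the pointer values differ between the two cases.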

Here are the assumptions the CPU makes about strings when it executes a string instruction:

A source string is pointed to by DS:SI.

A destination string is pointed to by ES:DI.

The length of both kinds of strings is the value you place in CX.

Data read from a source string into a register, or written from a register to a destination string, passes through the accumulator: AL for bytes or AX for words.

Note that the use of segment registers mostly applies to real mode segmented model. In real mode flat model, as you should know by now, all the segment registers contain the same value, and are therefore basically factored out of consideration for many things, string work included. (The same is true of protected mode flat model, as you'll learn in later chapters.)

The CPU can recognize both a source string and a destination string simultaneously, because DS:SI and ES:DI can hold values independent of one another. However, because there is only one CX register, the length of source and destination strings must be identical when they are used simultaneously, as in copying a source string to a destination string.
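The single shared count can be sketched in C as follows. This is only a rough model under stated assumptions, not the CPU's actual mechanism: the `StringRegs` structure and `copy_string` function are hypothetical names, segments are ignored as in a flat model, and the copy is assumed to move upward through memory one byte at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the register state that a string copy relies on. */
typedef struct {
    const unsigned char *si;  /* source index: next byte to read */
    unsigned char *di;        /* destination index: next byte to write */
    size_t cx;                /* the one count shared by both strings */
} StringRegs;

/* Copy cx bytes from si to di, one byte per step, counting cx down to 0. */
void copy_string(StringRegs *r)
{
    assert(r != NULL);
    while (r->cx != 0) {
        *r->di++ = *r->si++;  /* move one byte and advance both indexes */
        r->cx--;
    }
}
```

Because the structure has only one `cx` field, there is no way to ask for a 4-byte source and a 6-byte destination in the same operation. After the copy, both pointers have advanced by the original count and `cx` is 0.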

One way to remember the difference between source strings and destination strings is by their offset registers. SI means "source index," and DI means "destination index."
