How To Cog - Chapter 3.8 - Parsing

Parsing is a term used to describe how computers read data from a file. Typically, a program will open the file, and then read it line by line or character by character into memory. Sometimes the program won't need to store the file, so it will drop one line as soon as it reads the next. Most parsers are pretty standard, and this page will show you how they work.

Most parsers ignore extra whitespace (spaces, tabs, and CRLFs, which are a carriage return followed by a line feed). JK, however, is very strict about whitespace until it gets to the code section. You can have only one symbol declaration per line, whereas you can put as many statements as you want on one line.

Parsers work by looking for keywords. They separate keywords with delimiters - these are special characters like operators, colons, and whitespace. So from any location in the file, the parser will be looking at the string of characters from its current position to the next delimiter. If this string matches a keyword that the parser is looking for, then depending on what keyword it is, the parser will look for whatever is supposed to come after it.
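As a rough illustration of "reading up to the next delimiter", here is a minimal Python sketch. The delimiter set and the name read_word are assumptions made for this example; they are not taken from JK itself.

```python
# Hypothetical delimiter set for illustration only; JK's real list is not known here.
WHITESPACE = " \t\r\n"
DELIMITERS = set(WHITESPACE + "=+-*/();:,")


def read_word(text, pos=0):
    """Skip whitespace at pos, then read characters up to the next delimiter.

    Returns (word, new_pos), where new_pos is the index of the delimiter
    that ended the word (or len(text) if the text ran out first).
    """
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    start = pos
    while pos < len(text) and text[pos] not in DELIMITERS:
        pos += 1
    return text[start:pos], pos


word, pos = read_word("  symbols\n")
print(word, pos)
```

A caller would compare the returned word against whatever keywords it currently expects, then continue reading from new_pos.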

Header & Flags

The header is just comments. As soon as the parser sees a pound sign, it ignores the rest of the line (that is, everything up to the next CRLF) - it doesn't matter what comes after it.
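Dropping everything after a pound sign could look something like the sketch below. The function name strip_comment is made up for this example, and the line is assumed to have already been split off at the CRLF.

```python
def strip_comment(line):
    """Return the part of the line before the first '#', or the whole line if there is none."""
    return line.split("#", 1)[0]


print(repr(strip_comment("# Jedi Knight Cog Script")))
print(repr(strip_comment("flags=0x240 # cog flags")))
```

A line that starts with a pound sign comes back empty, so the parser has nothing left to look at and moves on to the next line.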

At this point, JK is looking for two things: the flags keyword or the symbols keyword. So JK is going to look at the beginning of each line for these words. If there's whitespace before the keyword, JK will look for the keyword after the space.

Once the parser finds the flags keyword, it's going to look for the assignment afterward. In most cases, spaces can be put in anywhere, but JK is very picky about spaces in the flags assignment. If it finds any, it will consider it a syntax error because it doesn't expect them.

This is probably because JK doesn't care about making the flags line work like a real assignment. It's searching for "flags=" and taking the text from the seventh character to the last non-space character as the assigned number. If it sees that the number's not valid, it stops reading the cog.
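The guessed behaviour described above can be sketched in Python as follows. This is only a model of the guess, not JK's actual code: parse_flags is a made-up name, and the sketch assumes the number is written in decimal or as a 0x hex value.

```python
def parse_flags(line):
    """Return the flags value from a line like 'flags=0x240'.

    Returns None if the line is not a flags line, and raises ValueError
    if the assignment contains spaces or the number is not valid.
    """
    text = line.strip()  # whitespace before the keyword is allowed
    if not text.startswith("flags"):
        return None
    value = text[6:]  # seventh character through the last non-space character
    if text[5:6] != "=" or not value or any(ch.isspace() for ch in value):
        raise ValueError("bad flags assignment: %r" % text)
    return int(value, 0)  # base 0 accepts both decimal and 0x hex (assumed format)


print(parse_flags("flags=0x240"))
```

With this model, "flags = 0x240" and "flags= 0x240" are both rejected, which matches the pickiness about spaces described above.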

Symbols Parsing

After the flags, JK's parser looks for the symbols keyword. The parser will go line after line until it reaches the end of the file looking for this keyword. If it doesn't find it, there's a syntax error and the parser stops.

When it does find symbols, the parser now has a new list of keywords to look for. These are the symbol type names and the "end" for the end of the section. If it finds something else, it will probably try to ignore it and go on to the next line, but if it can't (perhaps you forgot the end keyword) it will either reach the end of the cog or assume there's been a syntax error.

Once the parser sees a symbol type (e.g., message), it's going to look for the name and any extensions which might come after it. The only delimiters the parser is allowing here are spaces.
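Reading one declaration with spaces as the only delimiter might look like the following sketch. The name parse_symbol and the short SYMBOL_TYPES list are illustrative assumptions; the list is not meant to be complete, and the line is assumed to have had its CRLF removed already.

```python
# Illustrative subset of symbol type names; not a complete list.
SYMBOL_TYPES = {"message", "int", "flex", "float", "thing",
                "sector", "surface", "vector", "sound", "template"}


def parse_symbol(line):
    """Parse one symbols-section line into (type, name, extensions).

    Returns None for a blank line or for the 'end' keyword.
    Raises ValueError if the line does not start with a known symbol type
    or has no name after the type.
    """
    words = [w for w in line.split(" ") if w]  # spaces are the only delimiter
    if not words or words[0] == "end":
        return None
    if words[0] not in SYMBOL_TYPES or len(words) < 2:
        raise ValueError("bad symbol declaration: %r" % line)
    return words[0], words[1], words[2:]


print(parse_symbol("thing player local"))
```

Everything after the name is simply returned as a list of extensions; a fuller parser would go on to check each one.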

Code Parsing

After finding the end of the symbols section, the parser will be looking only for the code keyword. Anything else has to be a syntax error. But once inside the code section, the parser acts much differently. It will first be expecting text followed by a colon (a label), but once it finds that first label, and until it finds the end of the code section, it is no longer looking for just a few specific keywords.

JK most likely takes in the first word and then looks at the next non-whitespace delimiter. If it's an equals sign, then the parser knows this is an assignment and the string that came before it must be a variable name. If it were doing syntax checking, it would have stored all of the variable names read in the symbols section and it would know if this variable has been declared or not.

After the equals sign, the parser is expecting something that returns a value. This could be a function, a variable, a number, or a math operation. The parser will read characters until it finds a delimiter it's looking for. If it finds some text followed by an opening parenthesis, this must be a function, so the parser will look up the function and check for the parameters it's supposed to have. The parser could check the type of the assigned variable to make sure that it matches the return type of the function.
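To make the "look at the next delimiter" idea concrete, here is a small Python sketch that takes the first word of a statement and decides what it is from the delimiter that follows. The name classify and the category strings are invented for this example, and it does no lookup of functions or variable types.

```python
def classify(statement):
    """Return (kind, word) for a statement, based on the delimiter after its first word.

    kind is 'assignment' for '=', 'call' for '(', 'label' for ':',
    and 'other' for anything else.
    """
    i = 0
    while i < len(statement) and (statement[i].isalnum() or statement[i] == "_"):
        i += 1
    word = statement[:i]
    rest = statement[i:].lstrip()  # skip whitespace before the delimiter
    delimiter = rest[:1]
    if delimiter == "=":
        kind = "assignment"
    elif delimiter == "(":
        kind = "call"
    elif delimiter == ":":
        kind = "label"
    else:
        kind = "other"
    return kind, word


print(classify("player = GetLocalPlayerThing()"))
```

A real parser would then look up the word: a variable name for an assignment, or a function and its expected parameters for a call.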

If the parser finds a delimiter that's a math symbol, it knows there's an operation that needs to be performed and more expressions could follow it. Almost any type of code can be put somewhere into an assignment. The parser could check to make sure all of the variables are of a type compatible with that operation. For example, the parser would catch a syntax error if you were multiplying a string and a float.

A parser will usually look to the next delimiter to find out what the previous string is. Statements are always ended with a semicolon so that the parser will know that it's just read a complete statement. After the semicolon, parsers will be expecting either the beginning of another statement, the end of the line, or the end keyword.
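Because each statement ends with a semicolon, several statements on one line can be separated with something as simple as the sketch below. The name split_statements is hypothetical, and the sketch ignores the case of a semicolon inside a quoted string.

```python
def split_statements(line):
    """Split a line of code into its statements, using ';' as the terminator.

    Simplification: a ';' inside a quoted string would be split too.
    """
    return [part.strip() for part in line.split(";") if part.strip()]


print(split_statements("x = 1; y = x + 2;"))
```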

JK's parser goes through all of the code like this until it finds an end for the code section. At this point it stops because it doesn't need to find anything after the code section.
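The overall walk through a cog (header, symbols, code, then stop) can be modelled as a small state machine. The Python sketch below is an assumption-laden model, not JK's parser: split_sections is a made-up name, pound-sign comments are stripped on every line as a simplification, and the flags line is skipped rather than parsed.

```python
def split_sections(source):
    """Return (symbol_lines, code_lines) from the text of a cog.

    Raises ValueError if a section keyword is missing or out of place.
    """
    state = "header"
    symbols, code = [], []
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplification: '#' comments everywhere
        if not line:
            continue
        if state == "header":
            if line == "symbols":
                state = "symbols"
            # anything else before 'symbols' (such as the flags line) is skipped here
        elif state == "symbols":
            if line == "end":
                state = "between"
            else:
                symbols.append(line)
        elif state == "between":
            if line != "code":
                raise ValueError("expected 'code', found %r" % line)
            state = "code"
        elif state == "code":
            if line == "end":
                state = "done"
                break  # nothing after the code section is needed
            code.append(line)
    if state != "done":
        raise ValueError("cog ended while in the %s section" % state)
    return symbols, code


example = "# header\nflags=0x240\nsymbols\nmessage startup\nend\ncode\nstartup:\nreturn;\nend\n"
print(split_sections(example))
```

Each state corresponds to one of the short keyword lists described on this page: the parser only cares about a few words until it is inside the code section.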

This page was last modified 16:27, 14 April 2006.