
Example: Finding Text

adapted by MarkM from a chapter by Amy Mar


In this tutorial, you are going to write an E program that finds occurrences of a specific text string in a file. You will need an example file to work with. Use a text editor to create a text file containing the following stanzas of the poem "Jabberwocky":
'Twas brillig and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird and shun
The frumious Bandersnatch."

Save the text in a file called jabberwocky.txt in a directory on your file system called, for example, "c:/jabbertest". Or, you can fetch it from the web at http://www.erights.org/elang/intro/jabberwocky.txt and save it in this directory. If your directory isn't "c:/jabbertest", be sure to change the directory name in all the example code below.

Alternatively, you can enter the following E code to automatically copy jabberwocky.txt from the web into that directory. (Don't worry if you don't understand this code yet. It's just here to set up the example.)

? pragma.syntax("0.8")

? def poem := <http://www.erights.org/elang/intro/jabberwocky.txt>
# value: <http://www.erights.org/elang/intro/jabberwocky.txt>

? <file:c:/jabbertest>.mkdirs(null); null
? <file:c:/jabbertest/jabberwocky.txt>.setText(poem.getText())

The for-loop

A loop is used to run one or more expressions multiple times. The most common kind of loop is the for-loop. Type the following for-loop into elmer.
? for line in <file:c:/jabbertest/jabberwocky.txt> {
>     print(line)
> }
After the first and second lines, elmer knows it doesn't yet have a complete expression, so it keeps prompting for more with ">". After you type the third line, elmer knows the for-loop expression is complete and runs it. You should see the contents of the file jabberwocky.txt:
'Twas brillig and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird and shun
The frumious Bandersnatch."

For every line found in the file jabberwocky.txt, the for-loop prints the line to the display. It repeats this operation until it has read all the lines in the file.
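The for-loop is not tied to files. Here is a small sketch using a hypothetical literal list of words, which is not part of the example file; the same loop shape walks the list. It uses println, which ends each element with a newline. The file version uses print because each line read from a file already ends with its own newline.

```e
? for word in ["brillig", "slithy", "toves"] {
>     println(word)
> }
brillig
slithy
toves
```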

Functions

The for-loop you have written is good for printing the file jabberwocky.txt, but suppose you want to print a different file? You would like to have the same code perform the same operation for any input you provide. A function does just that.

Add lines before and after the for-loop to change it to look like this:

        ? def show(file) :void {
        >     for line in file {
        >         print(line)
        >     }
        > }
        # value: <show>

        ? show(<file:c:/jabbertest/jabberwocky.txt>)
As in the previous example, the contents of the file jabberwocky.txt are printed.

Function definition

The define-expression, which begins at the word def and ends at the last curly brace, }, defines a function called "show". Executing the define-expression performs no action except creating a function that you can use later, defining a variable named "show", and initializing this variable to hold this function as its initial value.

If you edited the original for-loop in place to create the show function, be sure to press Enter after the last closing curly brace, }, so that the function gets defined. This rule applies to all the examples below as well. Until this last Enter, you are just editing text, not affecting the E interpreter.

Parameters

The show function contains the for-loop operating on the variable file. The variable file is called the function's parameter. A parameter is a placeholder for data provided when the function is called. Wherever file appears in the function definition, the actual data provided will be used.
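To see a parameter on its own, here is a hypothetical greet function that is not part of the finding-text example. Each call supplies a different value for name, and the body uses whatever value was passed in. The function uses println, and it uses + to join two strings (explained under String concatenation below).

```e
? def greet(name) :void {
>     println("Hello, " + name)
> }
# value: <greet>

? greet("Alice")
Hello, Alice
? greet("Jabberwock")
Hello, Jabberwock
```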

Function call

The last line is not part of the function definition. It is a function call: it runs the show function. The name <file:c:/jabbertest/jabberwocky.txt> is given as the argument to the show function. You could call the same function for a different file by providing a different file name.

Conditions: the if-expression

Suppose you want to print only certain lines of a file, not all of them. Suppose you want to print only those lines containing a text string you specify. Use the if-expression to run code only if a condition tests true.

Add lines before and after the print function call, change the name of the function to find, and add a second argument to it, so that the function definition looks like this:

        ? def find(file, substring) :void {
        >     for line in file {
        >         if (line.includes(substring)) {
        >             print(line)
        >         }
        >     }
        > }
        # value: <find>
Add a second argument value to the function call:
        ? find(<file:c:/jabbertest/jabberwocky.txt>, "and")
Now only the lines in the file jabberwocky.txt that contain the string "and" are printed:
'Twas brillig and the slithy toves
Did gyre and gimble in the wabe:
Beware the Jubjub bird and shun
The frumious Bandersnatch."
The line that contains "Bandersnatch" is included because of the "and" in "Bandersnatch". The line that contains "And" is not included because the comparison is case-sensitive.
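If you want "And" to match too, one approach is to compare lower-cased copies of both strings. This is a sketch, not part of the original tutorial. The name findIgnoringCase is hypothetical, and the sketch assumes that Java's String method toLowerCase can be called from E, as described under Calling Java methods below.

```e
? def findIgnoringCase(file, substring) :void {
>     # Lower-case both sides so that "And" matches "and".
>     def lowered := substring.toLowerCase()
>     for line in file {
>         if (line.toLowerCase().includes(lowered)) {
>             print(line)
>         }
>     }
> }
# value: <findIgnoringCase>

? findIgnoringCase(<file:c:/jabbertest/jabberwocky.txt>, "and")
```

With "and", this version should also print the line "And the mome raths outgrabe."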

Try the function call again specifying a different text string.

Function Calls and Message Calls

You have been using the print function to display text. The print function is a function built into E. The variable named "print" is simply a predefined variable initialized to hold this built-in function as its initial value. A function is simply an object that can be called with function-call notation. In other words, it can be called with a (possibly empty) list of arguments enclosed in parentheses. In the show and find examples above, you defined your own new functions.

What about the line containing includes?

        line.includes(substring)
Here, we are also calling an object: the value of the line variable, which will be a string. However, this is a message call, not a function call. We are calling line with the message includes(substring). Between the parentheses is an argument list, just as in a function call. To the left of the argument list is a message name, "includes". A message asks the object to do something for us. An object can do many things, and the message name tells it which one we are asking for. Here, we are asking a string whether the substring can be found within itself, i.e., whether it includes the substring. We expect an answer of true or false.
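You can try message calls on a string directly in elmer. In this sketch, line is a hypothetical variable holding a single string that stands in for one line of the file. Note that includes, like the find function, is case-sensitive.

```e
? def line := "Did gyre and gimble in the wabe:"
# value: "Did gyre and gimble in the wabe:"

? line.includes("gimble")
# value: true

? line.includes("Gimble")
# value: false
```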

A function call is actually just a message call in which no message name is provided. It requests the object to do its thing with the provided arguments. For the print function, its thing consists of printing these arguments to the display.

In the next chapter, we will see how to define new objects that respond to message calls, not just function calls.

Mappings: the => operator

E supports several features that facilitate working with files. You have already used one of these features in the function call:
        find(<file:c:/jabbertest/jabberwocky.txt>, "and")
The prefix file: tells E to create an object that represents the designated file, and enables it to be opened. In E, collections are mappings from keys of some sort to values of some sort. A for-loop can be used on any collection. When used on a file, the file is assumed to be a text file, and considered a collection mapping from line numbers (starting at 1, since this is the text file convention) to corresponding lines of the file. Each time around the for-loop, the key is the next line number and the value is a string with the contents of that line (including a terminating newline). So for your example file, the mapping is:
Keys    Values
1       'Twas brillig and the slithy toves
2       Did gyre and gimble in the wabe:
3       All mimsy were the borogoves,
4       And the mome raths outgrabe.
5       (empty line)
6       "Beware the Jabberwock, my son!
7       The jaws that bite, the claws that catch!
8       Beware the Jubjub bird and shun
9       The frumious Bandersnatch."

You can access the keys of a mapping using the => operator. This operator is read "maps to", so "k => v" reads "k maps to v".
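The => form works on any collection, not only on files. This sketch uses a hypothetical list of words. The keys are the list positions, and for a list they start at 0, not at 1 as file line numbers do.

```e
? for i => word in ["slithy", "toves", "gyre"] {
>     println(`$i: $word`)
> }
0: slithy
1: toves
2: gyre
```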

Add to the for-loop and the print function call so your program looks like this:

        ? def find(file, substring) :void {
        >     for num => line in file {
        >         if (line.includes(substring)) {
        >             print(`$num:$line`)
        >         }
        >     }
        > }
        # value: <find>

        ? find(<file:c:/jabbertest/jabberwocky.txt>, "and")
You get the same lines that printed before, and now the lines are numbered.
1:'Twas brillig and the slithy toves
2:Did gyre and gimble in the wabe:
8:Beware the Jubjub bird and shun
9:The frumious Bandersnatch."

String concatenation

The + operator asks its left operand (using a message call) to add the right operand to itself and answer with the result. If the left operand is a number, it tries to perform numeric addition. If the left operand is a string, it performs concatenation: it converts the right operand to a string, appends it to the end of the left operand, and returns the result. The backquoted string `$num:$line` used in find above could also have been built with +, as in the expression:
        print("" + num + ":" + line)
Here the argument to the print function is a concatenation of four values.

Why the empty string? The variable num actually contains a number, not a string. The concatenation succeeds because the empty string, when asked to add num to itself, converts the number in num to the equivalent string. The result is a string, which is then asked to add the next operand, and so on. Suppose the expression instead began with num:

        print(num + ":" + line)

Then num, being a number, would expect a numeric value to add to itself, and it would throw an exception when it encountered the string ":" instead.
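The backquoted form and the + form should produce the same text. This small elmer sketch uses a made-up value of 8 for num and a fixed string in place of line.

```e
? def num := 8
# value: 8

? "" + num + ":" + "Beware"
# value: "8:Beware"

? `$num:Beware`
# value: "8:Beware"
```

Starting with num + ":" instead would fail as described above, because a number cannot add a string to itself.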

Calling Java methods

The E interpreter not only runs on Java, it also makes all Java objects available in E as if they had been written in E. For example, E's string objects are simply normal Java string objects, that is, instances of the Java class java.lang.String. As a result, all of String's public messages are also available from E, with some exceptions resulting from the differences between E's semantics and Java's.

Change the if-expression condition:

        ? def find(file, substring) :void {
        >     for num => line in file {
        >         if (line.indexOf(substring) != -1) {
        >             print(`$num:$line`)
        >         }
        >     }
        > }
        # value: <find>
You get the same result as before. Since line is a string, you can call the String method indexOf, which returns the index (or position) of a specified substring if the substring exists, or -1 if the substring is not found.
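Here is a quick elmer sketch of indexOf on a sample string. The variable line is hypothetical and holds one line of the poem. The first "and" begins at index 9, counting from 0, and a missing substring gives -1.

```e
? def line := "Did gyre and gimble in the wabe:"
# value: "Did gyre and gimble in the wabe:"

? line.indexOf("and")
# value: 9

? line.indexOf("Jabberwock")
# value: -1
```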

E runs on all platforms compatible with at least Javasoft's JDK 1.1.7, so the public messages of the JDK 1.1.7 class libraries are always available to you. If you are running E on a Java 1.2.x (also known as Java 2) or later platform, the public messages of that platform's libraries are available as well. However, if you use messages that exist only on a more recent platform, only people running at least that platform will be able to use your E program.

Of course, Java libraries written by yourself and others are available from E in the same way as libraries written by Javasoft.

Calling a function from another function

So far you have written a function that reads a specified file and prints out any lines containing a specified string. To make good use of the function you have written, you would call it more than once.

Let's write another function that calls your function for all .txt files in a directory.

        ? def findall(dir, substring) :void {
        >     for file in dir {
        >         if (file.getName().endsWith(".txt")) {
        >             find(file, substring)
        >         }
        >     }
        > }
        # value: <findall>

        ? findall(<file:c:/jabbertest>, "and")
You get at least the same result as before. If you have other .txt files in your jabbertest directory, and if they contain the string "and", you will get additional lines.

The File class

The for-loop
        for file in dir {
runs once for each element of dir. The elements of dir are File objects, instances of the java.io.File class. On each iteration of the loop, the variable file contains a different File object.

The if-expression

        if (file.getName().endsWith(".txt")) {
contains two method calls:
        file.getName()
calls the getName method of the File class, which returns as a String the name of the file that the File object represents.
        endsWith(".txt")
calls the endsWith method of the String class, which returns true if the String object ends with the specified argument string.
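You can try these File methods on a single file in elmer. This sketch assumes jabberwocky.txt is where the setup code above put it, and poemFile is a hypothetical variable name. isDirectory, used in the next section, is another java.io.File method.

```e
? def poemFile := <file:c:/jabbertest/jabberwocky.txt>

? poemFile.getName()
# value: "jabberwocky.txt"

? poemFile.getName().endsWith(".txt")
# value: true

? poemFile.isDirectory()
# value: false
```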

Recursion: calling a function from itself

Now you have a function that checks all the files in a directory. However, it ignores subdirectories. It would be useful if it could check files in subdirectories as well.

Checking subdirectories is tricky, because a directory tree can have any number of branches of any length. You cannot know ahead of time how many levels of subdirectories to search. What you need is a way to say "keep going until you get to the end." Recursion provides this functionality. Recursion happens when a function calls itself.

To create a simple directory tree to test, create a subdirectory in your jabbertest directory called test. Make a copy of jabberwocky.txt and put the copy in the test subdirectory.

Change the findall function by renaming its dir parameter to dirfile and replacing the for-loop with a compound if-expression:

        ? def findall(dirfile, substring) :void {
        >    if (dirfile.isDirectory()) {
        >        for file in dirfile {
        >            findall(file, substring)
        >        }
        >    } else if (dirfile.getName().endsWith(".txt")) {
        >        find(dirfile, substring)
        >    }
        > }
        # value: <findall>

        ? findall(<file:c:/jabbertest>, "and")
You get (at least):
1:'Twas brillig and the slithy toves
2:Did gyre and gimble in the wabe:
8:Beware the Jubjub bird and shun
9:The frumious Bandersnatch."
1:'Twas brillig and the slithy toves
2:Did gyre and gimble in the wabe:
8:Beware the Jubjub bird and shun
9:The frumious Bandersnatch."
You also get the lines containing "and" from any other .txt files anywhere within your jabbertest directory tree. How would you modify this program to show the name of the file in which the match was found?
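Here is one possible answer, offered as a sketch: redefine find so that each printed line starts with the file's path. It assumes java.io.File's getPath method is available from E in the same way getName is. getPath is used rather than getName because both copies of the poem are named jabberwocky.txt.

```e
? def find(file, substring) :void {
>     # The full path tells apart files that share a name.
>     def path := file.getPath()
>     for num => line in file {
>         if (line.includes(substring)) {
>             print(`$path:$num:$line`)
>         }
>     }
> }
# value: <find>
```

To be safe, re-enter the findall definition afterwards so that it calls this new find, then run findall(<file:c:/jabbertest>, "and") again.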

Compound if-expression

The above if-expression has a second clause that begins with "else if".

If the condition of the first clause is true, the first clause is run and the second clause is ignored.

If the condition of the first clause is false, and the condition of the second clause is true, the second clause is run.

If the condition of the first clause is false, and the condition of the second clause is false, neither clause is run.
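Here is the same clause structure on its own, in a hypothetical classify function that only looks at a file name string. The third call matches neither clause, so nothing is printed.

```e
? def classify(name) :void {
>     if (name.endsWith(".txt")) {
>         println(`$name: text file`)
>     } else if (name.endsWith(".e")) {
>         println(`$name: E source file`)
>     }
> }
# value: <classify>

? classify("jabberwocky.txt")
jabberwocky.txt: text file
? classify("find.e")
find.e: E source file
? classify("photo.jpg")
```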

When findall is called, the variable dirfile will contain an instance of the File class that represents either a directory or a file. The first clause of the if-expression is run if the object in dirfile represents a directory. If it does not, you know that the object represents a file. In that case, the condition of the second clause tests whether it is a .txt file. If it is, the second clause performs the string search operation that you have been doing all along. If dirfile is neither a directory nor a .txt file, findall does nothing.

A danger in recursion is that a function might keep calling itself forever, never reaching a case where it stops. In this example, findall stops calling itself whenever it reaches a file rather than a directory. In a finite directory tree, that must eventually happen on every branch.
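The same shape of recursion can compute a value instead of printing. The countTxt function below is hypothetical and is not part of the original tutorial. It has a recursive case for directories and two base cases that stop the recursion, mirroring findall.

```e
? def countTxt(dirfile) :any {
>     if (dirfile.isDirectory()) {
>         # Recursive case: add up the counts for everything inside.
>         var total := 0
>         for file in dirfile {
>             total += countTxt(file)
>         }
>         return total
>     } else if (dirfile.getName().endsWith(".txt")) {
>         # Base case: a single .txt file counts as one.
>         return 1
>     } else {
>         # Base case: any other file contributes nothing.
>         return 0
>     }
> }
# value: <countTxt>

? countTxt(<file:c:/jabbertest>)
```

With the test setup above, the result should be at least 2: the original jabberwocky.txt and its copy in the test subdirectory.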

Note: Under Windows, a directory tree must be finite, and therefore the above routine is safe. Under Unix, on the other hand, symbolic links can be used to create cycles, which are effectively infinite directory trees. In a later chapter, we will see how to finitely walk such "infinite" cyclic structures.

 
Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.