A Text Editor Feature: Syntax Tree Walk [Sep. 5th, 2006|04:08 pm]


Xah Lee, 2006-09-05

I want to write an Emacs Lisp function that, when run, highlights the region between the nearest left and right delimiters. Delimiters are any of parentheses, square brackets, or double quotes. When the function is run again, it extends the selection to the outer enclosing delimiters, and so on.

So, in this way, a user can repeatedly press a keyboard shortcut and extend the selection.
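
A minimal sketch of such a command in Emacs Lisp might look like the following. It leans on the buffer's syntax table through `syntax-ppss` and `scan-sexps`, so which characters count as delimiters depends on the major mode, and it selects the delimiters together with their content rather than the content alone. The names `my-enclosing-bounds` and `my-extend-selection` are made up for illustration.

```elisp
(defun my-enclosing-bounds (pos)
  "Return (START . END) of the innermost string or list around POS, or nil.
Hypothetical helper; delimiters are whatever the current syntax table says."
  (save-excursion                        ; `syntax-ppss' moves point when given POS
    (let* ((state (syntax-ppss pos))
           (start (if (nth 3 state)      ; non-nil when POS is inside a string
                      (nth 8 state)      ; position of the opening quote
                    (nth 1 state))))     ; position of the innermost open bracket
      (when start
        ;; Signals `scan-error' if the delimiter is unbalanced.
        (cons start (scan-sexps start 1))))))

(defun my-extend-selection ()
  "Select the nearest enclosing delimited text; repeat to move outward."
  (interactive)
  (let ((bounds (my-enclosing-bounds
                 (if (use-region-p) (region-beginning) (point)))))
    (if (null bounds)
        (message "No enclosing delimiters")
      (goto-char (cdr bounds))
      (push-mark (car bounds) t t))))    ; set the mark and activate the region
```

Each call starts from the beginning of the current selection, which lies just outside the pair selected last time, so repeated calls move outward one level at a time.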

This is a feature of BBEdit/TextWrangler on the Mac, which extends the selection to the nearest outer parenthesis. It is also a feature of the Mathematica editor, which actually extends the selection to the nearest syntactical unit in the language, not just paired delimiters. In short, the selection extends according to the language's syntax tree.

What I wanted this for is mostly editing HTML/XML, where one press selects the content, another press includes the enclosing tags, another press extends the selection to the next outer content, another press includes those tags too, and so on.

I think this would be a great feature for any language: in each language's mode, a keypress would highlight the next larger syntactical unit. For example, suppose we have this in a C-like language:

function f (arg1, arg2) {

If the cursor is at arg1, the first press highlights the word “arg1”, another press highlights the argument list “arg1, arg2”, another press includes the parens “(arg1, arg2)”, and another press includes the whole function, function f (arg1, arg2) { line1; line2;}. If the cursor is at line1, it selects that word in the line, then the line, then the whole function body, then the body including the {}, then the whole function, and so on. The same idea carries over to many languages.

For a language with nested syntax, suppose we have this XML example:

  <title>Gulliver's Travels</title> 
  <summary>Annotated a chapter of Gulliver's Travels</summary> 
  <link rel="alternate" href="../p/Gullivers_Travels/gt3ch05.html"/> 

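Suppose the cursor is inside a tag's enclosed content, say, on the letter T in the string “Gulliver's Travels” inside the <title> tags. Then one press would select that content, and another press would include the enclosing <title> tags themselves.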

In summary, this highlighting feature is a syntax-tree walker. Each invocation goes one level up the syntax tree. This is built into the powerful integrated editor in Mathematica.

Emacs and the Lisp Expression

For the Lisp language, whose syntax is just nested parentheses, this facility is almost trivial to code. In fact, it is built into the Emacs editor.

In Emacs, when editing lisp code, the following commands are available:

Shortcut   Command name       Meaning
C-M-f      forward-sexp       move to the next sibling
C-M-b      backward-sexp      move to the previous sibling
C-M-SPC    mark-sexp          same as forward-sexp, but selecting the text
C-M-u      backward-up-list   go up a node in the tree

These are nice features, but they are not robust. The problems are: (1) They only work for Lisp's nested parentheses, not for any other regularly nested expression (such as XML). (And, of course, they do not work for languages that do not use regularly nested syntax.) (2) They do not exactly do syntax-tree walking, because the implementation only does a simple textual scan for tell-tale characters. For example, in the following code, place your cursor on the first tilde, then press C-M-SPC twice.

(global-set-key [kp-8] (lambda () (interactive) (find-file "~/web/emacs/unicode.txt")))
(global-set-key [kp-8] (lambda () (interactive) (find-file "~/web/emacs/unicode.txt")))

The first C-M-SPC selects all the text between the double quotes, which is the expected behavior. Another press should highlight all the text inside the parentheses, but instead the selection extends past the parenthesized boundary into the inside of the unrelated code that follows.
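
The overrun can be reproduced outside of interactive use. The sketch below assumes that mark-sexp ultimately moves by `scan-sexps`, which starts scanning at a position with no knowledge of whether that position is inside a string; the function name `my-demo-context-free-scan` is hypothetical. It returns the text covered after the first and second step, plus what a full parse from the start of the buffer (`syntax-ppss`) says about the tilde.

```elisp
(defun my-demo-context-free-scan ()
  "Return the two selections a context-free scan makes from the first tilde.
Hypothetical demo; it mimics pressing C-M-SPC twice by calling `scan-sexps'."
  (with-temp-buffer
    (emacs-lisp-mode)
    (insert "(find-file \"~/web/emacs/unicode.txt\")\n"
            "(find-file \"~/web/emacs/unicode.txt\")")
    (goto-char (point-min))
    (search-forward "~")
    (let* ((tilde (1- (point)))            ; position of the first tilde
           (first (scan-sexps tilde 1))    ; the path is read as one symbol
           (second (scan-sexps first 1)))  ; the closing quote is read as an opening one
      (list (buffer-substring tilde first)
            (buffer-substring tilde second)
            ;; A parse from the buffer start does know the tilde is in a string.
            (and (nth 3 (syntax-ppss tilde)) t)))))
```

In this setup the second element ends just after the opening quote on the second line, since the scan takes the first line's closing quote for the start of a new string.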

XML also has a very regular recursive syntax. In Emacs's XML mode, forward-sexp, backward-sexp, and mark-sexp are available, but not the most important one, backward-up-list. And forward-sexp etc. do NOT behave according to the syntax tree, but merely jump between some prominent characters such as “<>"/”. After a few invocations, the highlighted region is therefore no longer a syntactical unit.
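
For comparison, here is a rough sketch of an element-level walk. It assumes `nxml-mode` is available, which is not necessarily the XML mode discussed above, and uses its `nxml-backward-up-element` and `nxml-forward-element` commands. The names `my-nxml-enclosing-element-bounds` and `my-nxml-extend-selection` are hypothetical. It only steps from element to enclosing element; it does not offer the intermediate "content without the tags" step.

```elisp
(require 'nxml-mode)

(defun my-nxml-enclosing-element-bounds (pos)
  "Return (START . END) of the element enclosing POS, or nil on error.
Hypothetical helper; assumes the current buffer is in `nxml-mode'."
  (save-excursion
    (goto-char pos)
    (condition-case nil
        (progn
          (nxml-backward-up-element)     ; to the `<' of the enclosing start-tag
          (let ((start (point)))
            (nxml-forward-element)       ; past the matching end-tag
            (cons start (point))))
      (error nil))))                     ; e.g. no parent element, malformed XML

(defun my-nxml-extend-selection ()
  "Select the element around point or around the current region."
  (interactive)
  (let ((bounds (my-nxml-enclosing-element-bounds
                 (if (use-region-p) (region-beginning) (point)))))
    (when bounds
      (goto-char (cdr bounds))
      (push-mark (car bounds) t t))))    ; set the mark and activate the region
```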

For some explanation about the syntax tree of a language, see Wikipedia: http://en.wikipedia.org/wiki/Parse_tree

This article is archived at: