Eelco Visser – Scope Graphs: A Fresh Look at Name Binding in Programming Languages

my name is professor I'm a professor at Delft University of Technology and today I'm going to talk about scope graphs a new approach to name by could have here – Robert writes and so suppose you have a design – language you have an idea for language and CF scribbled on the whiteboards and you have figured out the syntax and your semantics so here's not a new language this is Andrew a panel Steiger language but we're using it as an example so you you know what your language is going to look like now all you need is an implementation alright so what do you need to do well you need a couple of things you to build a couple things your parser type checker a compiler or interpreter and so that's the job ahead of you ok so you start with with parsing and well that's that's good because parsing we know how to do well right so you use context-free grammars to describe the syntax of your language that describes sort of the textual notation of your language but it also describes it gives you a way to get an abstract syntax for your for your language and a good tools for turning grammars into too late so if you use a language workbench like to fax then you type your syntax rules and outcomes and editor that it's index aware and issue syntax highlighting and all these all these things and it also gives you an automatic mapping from texts to parse trees to have sex and x-rays and there's more stuff that you can get from such a cynic definition you could get syntactic completions where your new users can discover the language by by control spacing and getting well proposals for possible ways to to write a program and of course you get syntax checking your editor and and so this is all very well understood all right so we have this ocean of separation of concerns index definition where we have a notion of a representation so we have policies or application experience that represents the results of parsing programs and in at representation is language independent right so we can talk about abstract syntax independent off of particular languages and then we have a set of we have we have a formal notation for achieving these things so we write down polish free grammars and maybe disambiguation rules and that gives us all we need to define this index over language but these notions give us a language for talking about syntax it's the concept that we can use to exchange information about this index of a programming language and just by defining well just using context-free grammars what we get for free is a whole bunch of language tooling that's our tooling that is that you don't have to write specifically for your language but tools can provide it for you so from a grammar you can generate generate a parser but you get a sex aware editor you get syntactic syntactic completion for matters and etc all right that's nice so we've written a grammar for a language and we're done we're an editor and if you want to know more about this then I'll talk at more length about exactly cineq definition in my summer school lecture on Thursday aciem all rights check we are a parser for a language now we just need to do things with all the other aspects of our language improvisation require name binding and well that isn't quite so easy as as the syntax part and a set of what I'm going to talk about today so so what is name body well name lining is the fact that we so our puzzle gives us three structures for programs but effects our programs are graphs because we use symbolic names to refer from one place of a program to another place or project but it's convenient in text but in order to semantically operate with these with this programs we need to resolve these his graph links right so this occurs as variables local variables in a functions or function calls that refer to function definitions or and we have to take care of scopes right so we may have deeply nested scopes and we may have to figure out in which scope a variables defined taking into a pouch shadowing but name resolution we also have to do for four types for type names right we can make type as fashions and refer to those for record fields and we may want to do have to do type dependent name resolution where we figure out a name points to by first figuring out what a type is for example here in his expression or general tax we would like to know what X is this is his name referring to what we in order to figure that out we first need to figure out what this expression refers to and and that has a name which refers to some declaration which has a type which we need to figure out what it points to and and then by that that we can figure out what the what record feels the the X points alright so that's the idea of of name lining and now the question is how do we define the name binding rules of our language or code otherwise what's the BNF of name lining but it would be nice if we could write down declaratively what's the name lining rules of a language are and from that generates a whole bunch of of tooling and that's what what this is about so so what we would like to achieve is for name binding similar as well for the air for synthetics namely a standard representation where we can conduct and represent the results of resolution and declarative rules that allows to write down what the name binding rules of our language are and then from that derive all kind kinds of language independence too late so to start with name resolution that says well from the use of a name what is the definition of the name and from that we can get code completion and refactorings and all kinds of other tools all right and so this talk is about scope class that is our proposal for a carbon representation to the ast s of name body so let's go graphs or the idea is simple we take a program or the ast of the program and we map that to a scope graph now whether the scope graph well a scope consists of scopes which are you see these circles which may R can be connected by edges scopes can have declarations so these are pointed to by from from scope and we can have references references our names at points into a scope in which they are references and I'm given a scope graph you want to do name resolution and a resolution is a matter of finding a path from a reference to a declaration so if you fall asleep now this is what you need to take away from this this talk but we'll go look into this into a bit more detail so we have developed a formal theory around this idea of scope grounds so there's a calculus of name resolution which tells us how to resolve names in in scope graph and it was especially a for which ability rules and we'll look at visibility later otherwise so how does this work let's start with the basics so we have a program and in these examples I'll be using subscripts for names that is not to make these names difference but all the X's here are the same X but at different positions in the in the program so that's to distinguish occurrences in ere program so what we want to do here is resolve a reference the reference x2 to the declaration x1 so in this particular language there's no definition before use and and these definitions are enclosed in a single global scope so we represent scopes by these circles and we define references by boxes with names that are pointed to from the scope in which they already clear and we define references by boxes that point to the scope in which they are a reference and then resolution is a matter of finding a path from a reference to a declaration with the same name following these basic steps so we can go from a reference to its scope from a scope to declaration and so that's it simple name resolution now let's look at more interesting cases for example lexical scoping so whole example program we have two declarations again but now we one of the declarations is a function and and so fun here is lambda so the y1 is the argument of the after function is not the name of the function and so if we see is we have a global scope and we have a nested scope which is the scope of the other function so let's make the scope graph again so we have a global scope with the two declarations x1 and z1 and a reference g2 and we have a nested scope as one with references x2n is easier right with the reference is x2 and y2 and a declaration for the lambda variable viola and now to express the effect that the scope as well as nested in scope as zero we add an edge from the child to the parent scope so now in order to result names we can use the same rules we used before so the local reference y2 we can resolve with the rules we saw before but in order to resolve x2 we need an extra step namely we can follow a path to a to a parents to a parent scope and from there to a declaration all right questions typically arises this point yes the time to ask this question but it's something that you talked before you talked about for instance you gave examples few names but perfume names you need to type the type LOL yes yes okay yes definitely yes all right there's another question but we'll come back to that question that you have to ask all right all right so lexical scoping is a method of building up change of scope and you can search from outer to inner scopes but tip but you cannot sort the declarations in inner scopes are not accessible from the outer scope right there they're encapsulated and now let's look and I have another typical phenomenon in name mining which is modules and imports so here we have two modules a and B and here's the scope graph structure for that for those modules right so the module names are declarations at the top level and each of the scope defines a scope for the declarations in the module right so we here we have a declaration x1 and in module a we have declaration z1 and and these declarations are not accessible from outside very encapsulated in the inner module there's no bad from scope a 0 to declaration ex-boxer they're encapsulated and but what we would like to do so modules in capsulize declarations but what we would like to do is make them available to some other modules on request yet what what imports are about so this import construct is well itself mentions the name of a module page reference itself but now in order to make the declarations of scope module a available in module B we need something extra and that is the notion of of imports and to model imports we have a notion of associated scope so with the with a name we can associate its cog so we associate the scope of of a module with its module name and a an import and an import is an arrow from scope to a to a reference and it's saying well we're going to import intentions we're going to import the declarations in the scope associated with the thing look at the rule but what we would like to do is resolve g2 to the declaration in module a and what we can we can go to the module its defining but then we get stuck but then using the rule for imports we can resolve we can resolve this so what we first need to do is schedule says if we have an import of our one a scope s and we can resolve our one two declaration our two and that has an Associated scope as our then there is a virtual edge between scopes s and s are right so send out if we resolve a – over here – its – the module name a1 then we get a virtual edge from the scope of module B to the scope of module a and it allows us to resolve the Z variable in in the module it is defined all right so the final pattern is qualified names so with input we said well we import all the declarations from a scope into another scope sometimes you want to be more precise and say we want to have a particular name from a scope and that is what the what what qualified names last right so here we saying module n defines some name and it can define lots of times we want to you a particular name so we have a static access to the two declaration as well a module m1 so how to model dance so rather than importing all of module n1 into module m1 what we do is make this s2 a reference in a you know anonymous scope that doesn't really correspond to a block in in the program in that scope we import the the module N and that allows us to do that well resolve the module end to its – it's a shared scope and then via the import rule import a particular name from from that module all right so these are the basic ideas of the resolution calculus it allows you to so we build up graphs that represent the name binding facts of programs and using these rules what we get is reach ability in that graph well so we we can go from references to two declarations and and these patterns allow us to model a wide range of binding patterns but how about ambiguity so let's see the question I was I was expecting so bother if there is two declarations that can be can be reached from a reference and two to deal with that we extend this calculus with the notion of visibility where we can disseminate between multiple possible paths for a reference and so it doesn't modify divisibility rules but it's it gives you a predicate on deciding between between paths and we'll look at examples of of dance so first off here's a here's a basic occurrence of a possible ambiguous scope graph so we have a scope with two declarations for an X at different occurrences and that means the reference X 3 now it's two possible paths right we can resolve to explore we can resolve it to X 2 and this resolution is ambiguous and what can we do about this else we could use the last definition indeed but actually that's a different pattern because in this in this table where we design or we've defined that is that our declaration in the same scope so they're the same we cannot really distinguish their order but they need a battery necessary well if you want to have definition before use you could do two different scopes and we'll see an example of that later basically the let pattern is a is an example of that where you say well now we can distinguish we can we we want to prefer the one that's close we'll see you long time but in this case we could say well this is just one not I mean this is the scope and it has two declarations of the same name so that is that is wrong still we want to be able to to have scope graphs that represent that kind of erroneous information because real programs do have these kinds of errors so we do so we do allow these things to occur but then we can decide well but is it wrong there's two data references there's two paths for these lessons and so there's an error that you can display a righty but sometimes actually this may not be an error so this is okay more better matching and there you can have multiple patterns that define the same variable and so a reference in the body of such a of such a case and refer to either one of those depending on the pattern match outcome and here you actually have a big ewis resolution in this kind of fashion so sometimes it may actually be make sense all right so let's look at more interesting cases so shadowing for example right so here we have a similar example as we saw before we have two declarations and and we have a lambda but now we have a method lambda and and let's see what is the issue here right so there's an there's a reference of X in the body of the inner inner London and there's a Declaration of X as the arguments of this function and but there's also a Declaration of X at the at the top level all right and we see that in the in the graph here so we have a reference at at the bottom and we have a declaration scope as one which corresponds to the to the lambda and we have a an X at the next declaration at the top level is called a zero all right well so applying the rules we've seen before there's a path from X 2 to X 3 and there's another path from X 2 to X 1 all right sorry there's two reachability paths and now except we could try to modify our reachability calculus to try to to exclude this case but the sample said well we leave this veg bility calculus alone and what we instead do is distinguish between the the path and and so we can define all kinds of policies for defining shadowing like situations and so what we want to do here is define a specificity ordering that compares paths and the specificity ordering here says that we want to prefer local declarations over declaration that we find via a third edge and and then we transitively close that and that means that the red path in this case is more specific than the blue path is not necessarily the case that the paths are shorteners we'll see in a bit so the direct path is more specific alright and that exactly corresponds to the notion we were expecting then we'd have to be the inner lambda shadows the outer the outer declaration all right another kind of shadowing so let's see what the situation is here so here's the skull graph it's a bit similar to the modules we saw before but so we have a module a module B module a defined Z as we saw before but now we also I have a declaration of G at the table over here and so we're interested in the resolution of G – what well so this is an example this is a question I asked at the exam and so what is z1 know what the z2 refer to and of course not by name you survive there's two possibilities right it can be z1 a z3 so we can see well is there a path to both of these and if so which one would we prefer or which one would you prefer ah that's true yeah okay all right so indeed there's two pads so one as we saw before via the module import of module a and the honor goes to the the parent and reaches there the right so so z3 is in lexical scope of the module so that's where we could reach it and the other is so now you could sell all you put argue about what what is the right what is the right thing to do and so we don't necessarily we don't build in this choice in our calculus but we allow the definition of a policy and again that's using specificity rules and we here really saying well we prefer imports prefer our preferred over parents if you import something that is better than if you get it from your lexical context because you're probably importing it specifically so that's what you want to achieve pony you can also make the counter-argument and said if you would prefer your lexical the thing you find in electrical context but in this case using this policy what we get is we prefer to wrap pad over the that's but we can also reverse the rule and Emily we get parents shadow imports or we could leave out this rule and then this situation would be ambiguous and that would mean we don't it's an error if something occurs both in the lexical context and in the imports so you can play with these rules to to tune the binding rules for your for your language I completely forgot to turn this time around so if you can keep track all right so here's another variance Oh so we said imports shadow parents but we can also say we have instead of imports we have includes let me even so what is this about this is about a situation we want to resolve z2 and we could either resolve it to something that's in our own scope or it's in a module that we import and so if we say that we prefer local declarations of our imports then we would prefer the resolution of G 2 2 Z 3 but if we will say well we haven't includes policy that means we don't specify the specificity ordering and that would mean we don't make a choice between these two situations and it would give us an error in the in our ID all right another interesting situation and the example is a bit contrived but so this is a variant of the de qualified name situation so we have a RiRi using this qualified name s2 and what we're saying is given give me the s from module n right but it happens in this case that module n doesn't have an S but there is an S in the lexical context of of n so there is a path that says well go from s fiery import of n2 its scope and then take the pen attached to the enclosing context and there find the definition of of s and so the examples a bit contrived because the dis cope that we're doing this in is the same scope as the electrical context of the module but but suppose you're importing so module from somewhere else in it it has a lexical context and you're trying to get a member of that module but it doesn't have one and so that means you get one from the next context is that's a good idea maybe not and so you what we would like to be able to model that in our in our policy again you may want several yes that's an excellent idea but let's suppose we want to forbid that kind of situation what we can do is define a wealth fondness predicate on pants that defines that pads can have a certain form and in this case we're saying is pants should have this shape so you go from a reference via a chain of parents edges and then possibly a chain of import edges to a to a module and there you find a decoration but what you cannot do is go to an imprudent import and then to a Apparel edge so that's four bits pairs of ah – ape and so it doesn't allow you to go to the lecture context of of modules ten minutes all right so and there's a couple of more of these of these these patterns and let me let me skip those well there's only one but the best idea now is our calculus for name resolution so we have which ability which allows us to find paths in a scope graph from references to declarations we have well fallen spirit codes which allows us to define a predicate on a regular expression on the labels of off pants and that allows you to forbid certain patterns or allow particular patterns and we had this notion of a specificity ordering on pads that allows us to choose between between paths and that allows us to define shadowing policies so that's the basic idea of Auto resolution calculus and so these were my requirements right so what I would what I would like is a a a similar declarative situation as we have syntax so a representation that allows us to reason about name resolution so that's what I've just shown scope graphs are this representation to represent the name binding facts of programs independent of the particular language and now we can we can define policies for visibility in inserts coverage the next step is we would like a language for describing these things similarly as we have commentary grammars for percentage X and for that we have designed a constraint generation language that maps at 16 X 3 is 2 constraints so what's the architecture we take a program we pause it get an ast from that ast we get we generate constraints and these constraints generation rules are language specific that's what the language designer expenses and then given a set of constraints we resolve these constraints and that gives us a solution and that operation is language independent so we can reuse this this architecture this for a range of of languages so let me explain how is work so so basically these conquer constraints correspond to the elements of scara's so we can create new scopes we can make edges between scopes that are labeled we can declare a declaration right so here we're saying X is a declaration in scope s in the namespace and so I haven't talked about namespaces but you might want to distinguish functions from variables and namespaces or allow you to do that your way around is a reference so here X is a reference in scope as in namespace N and then we have a resolution constraint so these constraints all are basically building up a SKA graph in the resolution process or given this reference X find me it's Dekker and D by finding a path in a scope graph that is visible cetera so these are the basic constraints and then we have constraints generation rules that say well now generate me to give me the constraints for this expression given this scope so let me illustrate you how it works so here we have a program we map it to its ast and what we're going to do is build up what we have a scope from from the context and what we're going to do is build up the scope graph given constrain generation rules over here write this constrain generation rule matches our our let binding structure right we have a lab binding with variable declaration and a body essence called s so if we're going to do is create a new scope as body we make an edge to its outer scope we declare X as a declaration in that body scope and then we recursively generates constraints for the free expression over there and so the initialization expression and note here that we are using the scope s to generate those constructs right because the encounter the variable X over here and for variables we have another constraint generation rule that matches on VARs and what it does is says well we generate a reference in scope as prime so that this scope as that we had There and it requires us to to resolve that reference so this place we see well this this X in this initialization expression what doesn't resolve to anything we know here because it's it's something but it's not it's free in this in this expression right now when we recursively apply that to the the body of the lands we we may go down and generate some more scopes we use the same variable rule which now generates a reference for X in endoscope which we then resolve to the reference in the in the let binding over it alright five minutes alright so how about dots well we can our constraint language also gives us time constraints so we can do is say well the type of a declaration is some type expression we have type equality subtype and their possible extensions and enemy we can define the the type constraints so basically it's the same constraints but now we associate types with expressions and with address with declarations and references and it allows us to resolve the types so this resolution he resolves the declaration of the variable x over here and and this constrains which feeds the type of that declaration which is then the type that we infer for the variable so in this way we can define a constraint generator that gives us the constraints for program that when solved do birds name and type analysis for for our program so now I have a more interesting example which is type dependent name resolution which is exactly Authority interesting case but I going to skip that in the interest of time but if you're really interested then crash my summer school lecture on Thursday and all showed it alright so what do we have we are an approach for declaratively defining the name binding of programming language that causes both over-representation scope graphs and a declarative language in which we can define by means of constrain generation rules how we map STS to scope and type effects and this gives us a language for talking about name binding in in programs so given these definitions now what we can do is generate all kinds of language independence tooling just works with with types and so we have implemented this approach in this context language workbench and so you write your content generation rules and what you get is not only an editor that syntax aware but also an editor that is type and name where you can do navigation it does name checking etcetera we applying it to a bunch of of languages so we slowly scaling up to more realistic things it's still a work in progress we also have an approach to connect this to semantics of programming languages more in a summer school lecture and so what's the status so we have a theory that this resolution calculus that gives us name binding and time constraints and a name resolution algorithm that it sounds with respect to the countless and we have this mapping to runtime memory layers we have a language for defining the name binding rules called enable to and we have tooling to go with this so we have a solar solver just now getting accepted lis fast well accepted Li slow I guess you should say and this is integrated in the perfect language workbench where you can use it and you get editors with name and type checking and navigation and so this has support what are the limitations it's a dimensional model so what we can do so the practice for name binding is turing-complete algorithms and compilers we cannot hope to do cover all of those algorithms so this really is kind of a normative model this is our this is what name binding is now the question is is that adequate but and that requires experimentation I think it does cover to same name by name I think that's a good thing just like using context-free grammars for your syntax if you saying model for your sector of your language so there's still room for future work so these scopes are really interesting it seems as a model for structural times but that requires operations on comparing scopes we're still struggling with generics but there's an idea to use dependant object type inspire that dependent object types there's seem to be a solution there we're working on type sounds of interpreters automatically and that is progressing well and has a bunch of things to do in getting tooling more performant doing incremental analysis etc so to conclude I presented Scott graphs to as a solution for for name binding what I hope it that this will provide us a common cross language understanding of name binding and a foundation for formalization and implementation of programming languages and I would be very interested in your ear feedback offline or otherwise and thank [Applause] I guess we're out of time so let me ask by giving you one example of dirty hacks that may be done in some language and asking whether this could be translated imagine that in a function if it declares a return type you want to resolve the identifier result to a synthetic variable of that type of the question is it easy approach and able to do yes yes that's fine you can just make up names they don't have to come from from the program yes that's it all right thank you [Applause]

Tags: , , , , , , , , , , , , , , , , , , , ,

There are no comments yet

Why not be the first

Leave a Reply

Your email address will not be published. Required fields are marked *