
non-null in static languages.

Very technical programming post about having a ‘no value’ concept in statically typed languages. Specifically: the difficulty of representing this in a static typing system.

Quick intro: Right now most static languages take one of two approaches. Some, such as Java and C#, have a concept called ‘null’, which means no value, and every object reference can point to null. Others have no ‘null’ at all, and instead you use a so-called ‘Maybe’ type to represent an optional value; Haskell does this. From here on out I’m assuming a Java-model static language (Fan and C# are similar enough to count).
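To make the two models concrete, here is a small sketch in present-day Java. It leans on java.util.Optional, which only arrived in Java 8 and so postdates this post; the lookup methods and their data are hypothetical and exist purely for illustration.

```java
import java.util.Optional;

final class MaybeDemo {
    // Null model: the signature does not say whether the result can be absent.
    // (Hypothetical lookup; the data is made up.)
    static String findNickOrNull(String user) {
        return "alice".equals(user) ? "Al" : null;
    }

    // Maybe model: absence is part of the return type, so callers must deal with it.
    static Optional<String> findNick(String user) {
        return Optional.ofNullable(findNickOrNull(user));
    }

    public static void main(String[] args) {
        System.out.println(findNick("alice").orElse("(none)")); // Al
        System.out.println(findNick("bob").orElse("(none)"));   // (none)
    }
}
```

The second signature is the ‘Maybe’ idea: a caller cannot get at the String without saying what should happen when it is missing.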

Recently there’s been some push to add the possibility of null to the typing system itself, for Java. Stephen Colebourne reports that Fan recently added null to its typing system. However, his post is severely lacking in actual technical detail. The way he writes it, Fan’s support for null is fundamentally incomplete. Unfortunately, most people I talk to about this think adding nullity is a matter of tossing in a suffix-! to indicate definitely-not-null, or, for fans of not-null-by-default (like Fan, the language), a suffix-? to indicate that null is allowed. This isn’t sufficient.

The notion that there are only two different states (allows-null and never-null) is wrong. Think of java generics; List<Number> foo = new ArrayList<Integer>(); is not legal, even though Number foo = new Integer(5); is legal. In generics, we have a convoluted but necessary trifecta of ways to say that your type contains Numbers:

List<Number> t; // allows reading Numbers out and writing Numbers in.
List<? extends Number> t; // allows reading Numbers out, but no writing.
List<? super Number> t; // allows writing Numbers in, but reading gives you 'Object'.

In the above, the most flexible option (the first, where we can both read and write) is also the least accepting: Only a List<Number> would do; you cannot give either a List<Integer> or a List<Object> legally, whereas in the second and third option, where we restrict ourselves to reading or writing, we can accept more.
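The trade-off between flexibility and acceptance can be tried out in plain Java. The following is a minimal sketch; the class and method names (VarianceDemo, sum, fill) are made up for this example, and fill uses Integer rather than Number as its lower bound to keep the calls short.

```java
import java.util.ArrayList;
import java.util.List;

final class VarianceDemo {
    // Read-only use: any list of some Number subtype is accepted.
    static double sum(List<? extends Number> source) {
        double total = 0;
        for (Number n : source) {
            total += n.doubleValue();
        }
        // source.add(Integer.valueOf(1)); // would not compile: no writing
        return total;
    }

    // Write-only use: any list of some Integer supertype is accepted.
    static void fill(List<? super Integer> sink, int count) {
        for (int i = 0; i < count; i++) {
            sink.add(i);
        }
        // Integer first = sink.get(0); // would not compile: reads give Object
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        List<Object> objects = new ArrayList<>();
        fill(ints, 3);    // a List<Integer> is accepted
        fill(objects, 3); // and so is a List<Object>
        System.out.println(sum(ints)); // 3.0
        // List<Number> numbers = ints; // would not compile: read+write is invariant
    }
}
```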

We need the same trifecta for nullity:

List<String!> t; // allows reading non-null out.
List<String?> t; // allows writing null in.
List<String> t; // neither, but more accepting.

In the above example, those three are distinct. Specifically, while you can obviously assign a String! to a String? (as in: String! f = "foo"; String? t = f; which is legal), you can NOT assign a List<String!> to a List<String?>. Here’s why: If it were legal, you could sneakily add nulls to a non-null list. In the next snippet, we’ll assume for a moment that you could assign a List<String!> to a List<String?>:

List<String!> t = new ArrayList<String!>();
List<String?> f = t;
f.add(null);
String! s = t.get(0); // this returns null. WHOOPS!

This is entirely analogous to why a List<Number> does not allow you to assign a List<Integer> to it; if it did, you could secretly add a Double to a List<Integer>, which is bad.
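Java itself has one place where this kind of assignment is allowed, namely arrays, and it shows what the hole looks like in practice. The sketch below is not about nullity; it only illustrates the Double-into-Integer problem, using a made-up class name. Because arrays are covariant, the bad write compiles, and the runtime has to catch it.

```java
final class CovariantArrayDemo {
    // Returns true if storing a Double through a Number[] alias
    // of an Integer[] is rejected at run time.
    static boolean storeFailsAtRuntime() {
        Integer[] ints = new Integer[1];
        Number[] nums = ints; // legal: arrays are covariant
        try {
            nums[0] = Double.valueOf(3.14); // compiles, but the array really holds Integers
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFailsAtRuntime()); // true
    }
}
```

Generics refuse the equivalent assignment at compile time, which is exactly the behaviour a List<String!> to List<String?> assignment needs as well.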

So, List<String!> cannot be assigned to a List<String?> and a List<String?> is not assignable to a List<String!>. Okay, but, what do we do if we want to write a method that should accept either form? The need to do that is perfectly legit: If we only read and null check, or we only write non-nulls, or a combo of those two, then we really don’t care about the nullity of the incoming parameter. It would be ridiculous if it was impossible to convey this. And yet I don’t see how you’d do this in Fan.

For example, let’s say we have a method that sets the values for a row in a GUI table. The table has a sparse mode, where a column entry isn’t rendered and its space is provided to the cell to its left. ‘null’ is used to indicate this. However, obviously, lots of tables will be rendered from data out of, say, a database, where this feature just isn’t needed. Worse, if the data is also used elsewhere, it would be perfectly legit for the data to be handed to the code that renders the GUI in List<String!> form. To now pass this to the method that sets the row data, the only options are an unsafe cast with a @SuppressWarnings annotation, or copying the list, just to satisfy the type system. Eeeeugh. You need a way to say that you don’t care about the nullity type of a generics parameter.

The IDEA java editor has support for a @NonNull annotation, but they cop out entirely and don’t support it in generics bounds, which, in my opinion, means it’s useless.

You also need this trifecta (never-null, definitely-allows-null, either way) for generics bounds; for example:

List<?? super Integer> t; // You can add null to this.
List<? super Integer> t; // Can't add null, returns Object? on read.
List<?! super Integer> t; // Can't add null, returns Object! on read.

List<?? extends Number> t; // Can't write, returns Number?
List<? extends Number> t; // same as above
List<?! extends Number> t; // Can't write, returns Number!

Etcetera, etcetera. Incidentally, <? extends Foo> and <?? extends Foo> and ‘Foo’ and ‘Foo?’ are the only two in the entire lineup that are synonyms. You can construct a table of assignment compatibilities (can you assign a List<Integer!> to a List<?? extends Number?> – yes, you can), but that process is already complicated due to generics. Adding nullity to it makes it even more complicated.

Consider this a soft vote for the Maybe principle. However, because there’s so much legacy code out there, going that route just isn’t feasible. I’m still in favour of adding type-checked nullity to java, just like I think generics was a great idea even with the complexities introduced. However, make no mistake about it: It’s a complicated issue. And judging by Stephen’s explanatory page about how Fan does it, it seems like they didn’t get it right.

ADDENDUM: The fourth nullity state.

Java language changes, or for that matter, any programming language changes, do not live in a vacuum. There’s old ‘legacy’ code to consider, written before the feature existed. New code should be capable of interoperating with legacy code relatively painlessly, otherwise no project can move on to any of the new features without completely overhauling every last snippet of code they use. Thus, if nullity-in-the-type-system is to be adopted, it needs to interoperate with pre-nullity code. How do we do that?

Let’s look at generics, which is the closest relative. In generics, there’s a concept called a ‘raw’ type. In source, raw types are always easy to find: They’re the ones without ANY generics bound, not even the < and > symbols. Java handles raw types by letting everything go (you can assign anything to a raw type, and a raw type can be assigned to anything; List<String> list = methodThatReturnsARawList(); is legal). However, anytime you do so, you get a warning that basically says: Okay, if you say so, but the type system doesn’t take any responsibility for the correctness of your code.
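Here is a minimal sketch of what ‘letting everything go’ means in practice. The class name and the contents of the raw list are invented for this example; methodThatReturnsARawList stands in for any pre-generics library method.

```java
import java.util.ArrayList;
import java.util.List;

final class RawTypeDemo {
    // Hypothetical stand-in for pre-generics library code returning a raw List.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List methodThatReturnsARawList() {
        List raw = new ArrayList();
        raw.add(Integer.valueOf(42)); // not a String, but nothing stops it
        return raw;
    }

    // Returns true if reading a String out of the list fails at run time.
    @SuppressWarnings("unchecked") // our side of "okay, if you say so"
    static boolean readFailsAtRuntime() {
        List<String> strings = methodThatReturnsARawList(); // legal, with a warning
        try {
            String s = strings.get(0); // compiler-inserted cast to String fails here
            return s == null;          // never reached in this sketch
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readFailsAtRuntime()); // true
    }
}
```

The assignment compiles with only an unchecked warning, and the mistake surfaces later, at the read, far from where the bad value went in.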

A nullity proposal really should work the same way. If I KNOW a method will return a List of things that never contains null, and the list itself is also never null, but it was written before the addition, then, it would be nice if I could just assign the result to a List!<String!> and get a warning which I can then suppress, instead of a flat out refusal, or, also suboptimal, an unsafe cast, which would later have to be removed when the library I’m using gets updated.

Unfortunately, with our three modifiers (definitely not null, definitely allows null, and don’t know), we’re out of luck; unlike generics, the way it used to be written is also valid in the new way (specifically, in the examples above, it meant ‘don’t know’). We need a ‘raw type’ for nullity, which is distinct from the most null-accepting type. Here’s the difference:

A method that takes a List?<?? super Number>, explicitly written just like that, will simply not accept a List<Number!> as input (after all, the method could add nulls to this list due to the ??). If you try, you get a compiler error. The method itself is allowed to write in nulls, and when reading, gets Objects out that could be null.

On the other hand, a method that is legacy and has as signature: List<? super Number>, has the same behaviour: Writing nulls into the list is allowed, and when reading, you get Objects back which might be null. However, it isn’t the same as List?<?? super Number> for the same reason generics ‘raw’ types aren’t quite the same as any specific generics bound: In the legacy case, you CAN give a List<Number!> to the legacy method that accepts a List<? super Number> – however, because the type system does not know what the legacy method does, it will give you a nullity warning message that simply states: Okay, if you say so, but I cannot guarantee that this legacy method you’re calling will never write null into your list of Number!s.

So, we now have a 4th nullity state: legacy. That makes 4 states:

Definitely-allows-null, Definitely-not-null, Don’t care, and Legacy. How do we separate these states?

Here’s a modest proposal:

All new files only get parsed as java7 syntax if they start with “source 1.7;”. Then, never-null is the default, ‘*’ is ‘Don’t care’, ‘?’ is ‘definitely allows null’, and it’s not possible to actually create a legacy type; you can just interoperate with them. Without the source keyword, all your types are legacy-null. ! is no longer used except to promote a generics parameter to non-null. An example for that last one: Let’s say list has a method that returns the value if it isn’t null, and a default if it is, where the default itself must not be null. You could write:

public T! getIfNotNull(int index, T! defaultValue) {
    T x = get(index);
    return x == null ? defaultValue : x; // 'default' is a reserved word, hence defaultValue
}
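For comparison, here is roughly what that method looks like in Java as it exists today, written as a static helper so it runs on its own. This is only a sketch: the class name is made up, and since there is no T! in real Java, the non-null requirement on the default is enforced at run time with Objects.requireNonNull rather than by the compiler.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class NullDefaults {
    // Returns the element at index, or defaultValue if that element is null.
    // The "T!" promise can only be checked at run time here.
    static <T> T getIfNotNull(List<? extends T> list, int index, T defaultValue) {
        Objects.requireNonNull(defaultValue, "defaultValue");
        T x = list.get(index);
        return x == null ? defaultValue : x;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", null);
        System.out.println(getIfNotNull(row, 0, "z")); // a
        System.out.println(getIfNotNull(row, 1, "z")); // z
    }
}
```

With nullity in the type system, both the run-time check and the guesswork about the return value would move into the signature.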

There’s a bit of weirdness in the notion that a generics name (such as ‘T’) carries its null-nature with it, whereas a plain type, like, say, ‘String’, doesn’t. Therefore, you need both ? and ! to promote/demote whatever it was to the definitely-not-null or definitely-allows-null variety, with no modifier meaning: Whatever nullity state the type name was bound to.

In other words, if you go: new ArrayList<String!>, then the ‘T’ in the ArrayList class source is bound to ‘String!’, using ‘T!’ would also be ‘String!’, and using ‘T?’ would be ‘String?’. ‘T’ by itself works a bit like ‘don’t care’ – where relevant you must null check when reading, but you must not write in nulls when writing, because you don’t know if null is allowed or not.

{ 16 } Comments

  1. James E. Ervin | 2008/10/24 at 02:15

    Your post makes my head hurt and reminds me why I think of Java generics as a necessary evil. Your post in fact is more a testament to why generics should be left alone and not changed.

    I like the way Groovy deals with null. Imagine a var that should be of type A with a method called doA, in Groovy it is possible to specify:

    A a = null;
    a?.doA(); // doA() is invoked only if not null,
    // if null, becomes a no-op

    In fact I am going to make an appeal to leave Java the language largely alone. Java 1.5 should have been like Java3 or something, it should have been a new language. *Sigh* I wonder, even though Java 1.4 is supposed to go away, if it ever will. I hate writing in it, but I have to admit the language is simple compared to Java 5.0+. I mean I used to understand java.lang.Class, not anymore.

    How about this criterion for future enhancements to the core Java language? No more changes to Java if it makes the java.lang.Class API more obtuse!

  2. Stephen Colebourne | 2008/10/24 at 02:23

    See Brian’s comment on the original post for why null-checking works in Fan. (Basically, Fan goes for 80% compiler checking and 20% runtime checking, avoiding complications like those suggested above)

  3. Rick Minerich | 2008/10/24 at 15:49

    I’m not sure I completely understand what makes the legacy case different. Is it just that the language is autoconverting types on the fly for the sake of interoperability?

  4. rzwitserloot | 2008/10/24 at 15:56

    Rick: Just like generics, there’s no actual conversion going on, but, yeah, the legacy type merely gives you a warning if you try to stuff an incompatible type in there (say, you pass a List<Number!> into a method that accepts a List<? super Number>). That legacy null is functionally equivalent to List?<?? super Number>, and therefore passing in a List<Number!> is no good, but, in the case of legacy null, you don’t get an error; you merely get a warning.

    The warning states something along the lines of: I hope this method doesn’t add nulls to your list, because I won’t be checking that for you; I’m going by your word that this will be fine.

    This is pretty much exactly how generics raw types work: You can pass a List<String> into a method that takes a raw List, but the compiler warns you that it won’t check if this method actually only adds strings to the list, and not something else.

  5. Rick Minerich | 2008/10/24 at 16:10

    I’m not sure warnings are a good solution to any problem though. I work on a very large project and we have a large number of warnings generated. Many came from our movement into .NET 2.0 from 1.1. It would be so easy for something like that to go overlooked. I know I’m on the other side of the fence right now (Microsoft and Sun) but I previously worked on a large Java project and things didn’t seem much better.

    I’m just finishing up a response post to yours on my blog. It should be up soon.

  6. Stephen Colebourne | 2008/10/24 at 16:47

    To be honest, you seem to be demonstrating very well why using the type system to *precisely* control generic or null behaviour is a bad idea. It’s simply way too complicated to get your head around. This is a problem with much language research in general – it aims to provide precise, provable solutions rather than simple, easy to use, pragmatic ones.

  7. Rick Minerich | 2008/10/24 at 19:21

    I agree with you 100% Stephen. I feel too much effort is put towards solving little problems at the cost of having lots of diverse and complicated syntax that many programmers do not completely understand.

    Incidentally, I have a post up on my blog in response to this one about that exact topic:

    http://www.atalasoft.com/cs/blogs/rickm/archive/2008/10/24/much-ado-about-null-types.aspx

  8. Tony Morris | 2008/10/24 at 22:24

    “This is a problem with much language research in general”

    It’s not really. Research moved well beyond this years ago. Today we have algebraic data types, which generalise the whole problem. It is known that uncontrolled side-effects and nominal subtyping cause problems and thus, invariance in type parameters as described in this article. In fact, I have used closed sum types in Java for a long time, albeit with verbosity and clumsiness (the alternatives are worse).

    All this was alleviated a long time ago. It is only that the dumbed-down languages do not market this fact. Give our researchers and past heroes the credit they deserve ;) This is only an interesting problem in the context of uninteresting programming languages.

  9. rzwitserloot | 2008/10/24 at 22:32

    On the contrary, guys. This isn’t complicated at all, -from the user’s perspective-. This is just like generics: As long as you don’t write the libraries, it’s simple. And if you DO write the libraries, it has to be complicated, because it just is. If ‘extends’ and ‘super’ never existed, then loads of useful methods, such as List.addAll, simply wouldn’t do what you wanted, or, they wouldn’t generate as many type errors as they currently do, for no good reason other than to make life slightly easier for those not smart enough to grok the full extent of co/contravariance in the first place. Those people should probably not be writing their libraries with generics and should stick to the old everything-is-Object based model.

    This is no different. Yes, the above missive seems very complicated, but once implemented, it’s actually EASIER for the users of, say, ArrayList. Because with all of the above implemented, ArrayList would just plain do the right thing. *writing* ArrayList would be slightly harder, but if you already understand generics (and, face it, that isn’t going away), then getting your head around this is easy.

    With typing systems, you either go all the way, or you ditch it entirely and go Python/Ruby. Being somewhere in the middle tends to give you the weakest of both.

  10. fogus | 2008/10/25 at 01:08

    I wrote a post on this recently; in short, there are three elements to replace null: nil, nothing, and notset. I am trying to implement these ideas in my own hobby languages.

    http://blog.fogus.me/2008/10/15/nil-nothing-and-notset/

    -m

  11. rzwitserloot | 2008/10/25 at 01:35

    Multiple versions of null can have their uses, but java has only one: null. Futzing with this is not backwards compatible, and thus a pipe dream.

  12. Stefan Zeiger | 2008/10/25 at 12:23

    I think you’re making this more complicated than it need be by treating nullability as some sort of additional type instead of integrating it into the existing type hierarchy. Let’s assume a default-nullable approach for backward compatibility. When you define a class T, you get a type T (nullable, the same as now) and a type T! (“T not null”). The !-types form the same hierarchy as their underlying nullable types but each !-type also extends its underlying type. This makes sense because any value that can be assigned to a variable of type T! is also valid for T (but not vice versa, because null is a valid value for all class types T but not for T!).

    Now the “allows reading non-null out” type from your examples becomes List<String!> and the “allow writing null in” type is simply List<String>. More interestingly, if you are prepared to read String or null you’d use List<? extends String> (to which you could pass values of types List<String> and List<String!>). If you want to put Strings but not null into a list, you’d use List<? super String!>.

    Your “neither, but more accepting” case can not be handled by the existing syntax but that problem is Java’s lack of type intersections which is not specific to nullable types. If you really want to support it for the nullable types, a minor extension of the generics syntax would be enough. You just need to be able to give both, an upper and a lower bound. The example then becomes List<? super String! extends String>. A more general approach would allow intersections like “Cloneable & Serializable”. To a variable of this type you could only assign values which are both, Cloneable and Serializable. The notation List<? super String! extends String> would then be a shortcut for List<? super String!> & List<? extends String>.

    I don’t think we should complicate things any further just to support raw types. All “pre-nullability” types stay nullable, just like they are now. If an API has not been updated to support non-nullable types, you have to use casts and @SuppressWarnings annotations.

  13. rzwitserloot | 2008/10/25 at 20:27

    The problem with that reasoning, Stefan, is that it is actually MORE complicated in daily use. Consider an author who wishes to write a map-in-place method. So as not to confuse this discussion with closures, I’ll pick a random task. Let’s say string.toUpperCase(). In vanilla 1.5 java:

    public void allToUpperCase(List<String> list) {
        for ( int i = 0 ; i < list.size() ; i++ ) {
            String x = list.get(i);
            if ( x != null ) list.set(i, x.toUpperCase());
        }
    }

    Note that this works for both lists containing null, as well as non-null lists, and that the state isn’t changed, and that the compiler can assert this is true (provided that the compiler knows that toUpperCase() can’t return null, but obviously its return type would be amended to state this).

    Now let’s try this in my proposed syntax, where ? is definitely allows null, nothing is definitely not null, and * is either way:

    public void allToUpperCase(List<String*> list) {
        for ( int i = 0 ; i < list.size() ; i++ ) {
            String? x = list.get(i);
            if ( x != null ) list.set(i, x.toUpperCase());
        }
    }

    Note the changes: Only two; a question mark and a star. Now let’s try with your example. First, with your extra syntax proposal:

    public void allToUpperCase(List!<? extends String super String!> list) {
        for ( int i = 0 ; i < list.size() ; i++ ) {
            String? x = list.get(i);
            if ( x != null ) list.set(i, x.toUpperCase());
        }
    }

    and without it:

    @SuppressWarnings("generics")
    public void allToUpperCase(List!<? extends String> list) {
        List<String> dummy = (List<String>)list;
        for ( int i = 0 ; i < list.size() ; i++ ) {
            String x = list.get(i);
            if ( x != null ) dummy.set(i, x.toUpperCase());
        }
    }

    I don’t see how either of your versions is simpler in any way than mine is. Try to analyse what kind of compiler errors you’d get when you make various mistakes. My example emits warnings/errors when there are bugs more often than yours (especially the second one with the SuppressWarnings), and the messages make more sense. List<String*> is so much easier to understand than List<? extends String super String!> – that is so ridiculous that soon you’d have people asking for the * notation just to avoid boilerplate!

  14. Stefan Zeiger | 2008/10/25 at 22:54

    allToUpperCase can be written as a generic method without any casts or @SuppressWarnings:

    public <T extends String super String!> void allToUpperCase(List!<T> list) {
        for ( int i = 0 ; i < list.size() ; i++ ) {
            T x = list.get(i);
            if ( x != null ) list.set(i, x.toUpperCase());
        }
    }

    Yes, it’s a few characters longer than your proposal but semantically it is simpler. And frankly, I am not too worried about methods like this one which is pretty much an anti-functional and anti-object-oriented descent into the dark ages of procedural programming. While such methods are sometimes useful, my guess is that even with nullability modeled through the type hierarchy, you will very rarely need generic types with both upper and lower bounds together.

  15. rzwitserloot | 2008/10/26 at 01:14

    I wrote a version with your system which is shorter and also works, and doesn’t need a ‘T’ but makes do with a ?. See how confusing this gets?

    Your version isn’t just a few characters larger, it is fundamentally far more difficult to understand. Don’t tell me you think Joe Schmoe Java coder can understand ‘T extends String super String!’ more readily than ‘String*’. If you do, you need to take a step back or ask a few friends.

    Also, have a browse through the java library. There are loads of methods that would work with either nullity. Being forced to use ‘extends’ or ‘super’ to convey this, especially considering the massive amount of java programmers who don’t -really- get generics, is not a good thing. ‘*’ is just easier to understand. The fact that it’s more flexible is just gravy.

  16. Stefan Zeiger | 2008/10/26 at 12:27

    It would indeed work without a type parameter since String#toUpperCase always returns a String! value. And I absolutely agree that type bounds are not easy to understand. But they are already here! Java programmers are already using them and have to understand them; not so much when writing application code but certainly for libraries and frameworks. Programmers who don’t understand generics well enough will simply continue writing methods which take a List<String> and mutate it. And if they want non-null Strings they’ll change the method to accept a List<String!> instead but not both.

    And please list some of those loads of methods that would work with either nullity where a type-bounds-based approach would be more complicated, because off the top of my head, I can’t think of any. Methods which take a collection parameter are usually already written with type bounds, so the only change would possibly be an additional “!” on the type bound. Parameters of concrete types would automatically work with non-null values. And with an additional “!” you could force them to work only with non-null values. (Actually, I’d prefer to use “?” for nullable types and make non-null the default but it’s easier to explain with “!” when switching back and forth between new and old code.)