One way of describing every application in existence might be as a sequence of steps, some of them optional, and with flexible definitions:
- Get access to a data store.
- Read data.
- Write data.
- Shut down.
[aside]Just in case “shutting down” isn’t self-evident: it’s the state of achieving program termination. I can’t believe I just typed that in.[/aside]
The “data store” can be anything: a database, a service, even a user. Reading data might be as simple as pulling arguments from the command line, or reading sets of data from the data store. Writing data might consist of sending a PDF to an email address, or updating records in a database, or creating a new database altogether. Shutting down should be self-evident.
Some might say a measure of simplicity can be determined by examining how difficult those four operations are.
I’d like to present examples of that lifecycle for external client applications connecting to GigaSpaces XAP. This is a very limited scenario compared to the set of actual product capabilities, but it’s also likely to be the first architecture new users will investigate, since it mirrors the traditional client/server architecture so many people are familiar with and suffer from.
Getting Access
There are two primary ways to get access to a GigaSpaces data grid: programmatic access, and via Spring. Neither is difficult, and it’s more common to use Spring to establish communication than it is to programmatically configure a connection, but we’ll do it programmatically anyway.
There are two interfaces to consider: the IJSpace and GigaSpace.
Nobody actually cares about JavaSpaces any more. This is unfortunate, but it’s reality; Sun didn’t do a very good job of explaining why JINI and JavaSpaces were both very useful. As a result, neither term is beneficial.
The IJSpace is an interface that roughly maps to the JavaSpace interface. It’s mostly meant as an underlying technology for the far more useful GigaSpace interface, which allows users to use POJOs and document models (and flexible queries, and task execution, and … lots more) with the data grid.
You’ll normally interact with the IJSpace once in an application: it’s used to create the GigaSpace reference. (If you’re using the Spring integration, you probably won’t see IJSpace at all.)
[aside]POJO stands for “Plain Old Java Object,” or an object with simple mutators and accessors. Mutators and accessors are also known as “setters” and “getters,” respectively; I don’t care for the latter terms.[/aside]
The GigaSpace interface is designed to give you full write, read, update, query, and delete capabilities for the data grid. It allows two models of data, as mentioned: standard POJOs and documents. (We’re not going into the document API in this article. Plus, NoSQL has opened the world’s eyes to a few more models of data, but … out of scope.)
A POJO can contain a document, and of course documents can contain POJOs; one of the neat things GigaSpaces can do is build POJOs from documents (and vice versa) and query nested collections using either the POJO structure or the document API.
[aside]This is part of the basis for GigaSpaces’ “Same data, Any API” mantra, where you can access any piece of data contained in the data grid through any API that XAP makes available.[/aside]
The goal of step one in our application lifecycle is to give us a GigaSpace reference we can use for the rest of the application. There are some other artifacts of this process that get used later, too, primarily for “clean shutdown” (which is in quotes because it’s not a requirement.)
To get an IJSpace, the simplest mechanism (and the most common) will be through a UrlSpaceConfigurer. However, before any of this will work, we need to start up a data grid.
Starting the Data Grid
You’ll need a copy of GigaSpaces XAP, of course. It requires registration, but it’s a free download. Get the Premium evaluation, which enables the entire product feature set, although everything we use here is present all the way down to the community edition.
Unzip the installation locally; for the sake of example, we’ll use the UNIX notation for commands and variable substitutions, so let’s call the directory in which XAP is unzipped $GSHOME.
In order to get something to which our simple application can connect, we’ll have to start a container and then deploy a data grid into that container, which is a sequence of two commands. One of the commands doesn’t terminate (i.e., it’s not innately a daemon or service) so we’ll want two shells in order to run both commands.
The command to start the container:
cd $GSHOME/bin && ./gs-agent.sh gsa.gsc 1 && cd -
The next step is to deploy a named data grid into that container. The command to do this is also fairly simple:
cd $GSHOME/bin && ./gs.sh deploy-space myGrid && cd -
At this point, you should have a ready-for-use single-node data grid – roughly the equivalent of a rather powerful external object database. This isn’t ideal – but it should be more than satisfactory for exploratory purposes, and that’s all we’re looking at in this article.
[aside]Why “rather powerful?” Because it’s able to incorporate processing as well as data; plus, if we wanted to, we could set up a mirroring facility to add write-through and write-behind capabilities to our single-node data grid, as well as backup scenarios, a web container, and more.[/aside]
Acquiring a Connection
The code to connect to our new data grid is fairly short:
UrlSpaceConfigurer configurer=new UrlSpaceConfigurer("jini:/*/*/myGrid");
IJSpace ijSpace=configurer.create();
GigaSpace gigaspace=new GigaSpaceConfigurer(ijSpace).create();
We’ll want to hang on to that configurer reference, because it actually starts up some background processes to make the connection more durable.
[aside]Is this connection like a JDBC Connection? … No. We’re actually establishing a multi-node access point to a cluster; it just so happens that our “cluster” in this case has only one node. This code will reestablish connectivity to the data grid transparently should any issues arise, for example.[/aside]
We’re now able to use the gigaspace reference to interact with our data grid with POJO- or document-based data models.
Reading data
There are two normal ways one reads data from a GigaSpace with this specific API: a SQLQuery (so named because it’s similar to SQL, not because it uses SQL itself) and a query-by-example facility.
[aside]For true SQL usage, you’d use JDBC to connect to GigaSpaces.[/aside]
Query-by-example (“QBE”) is simpler, so let’s look at that first – but even before then, we need to talk about object construction for the data grid, because our data types are affected by the presence of QBE.
Building our POJOs, A Simplistic Explanation That Will Still Save You Some Grief
In a data grid, every attribute of an object can have two states: initialized or uninitialized. What these mean depends entirely on the object’s intended use.
If the object is actually representative of data, each attribute is literally represented. A string value containing a string is initialized data, and will be given to you as is; if, however, it’s left at null, it means that this data item has not had that attribute populated.
If the object is a query template, however, the meanings change slightly. A populated value is used as a template (i.e., applied as a filter such that any object returned will have a matching attribute value). An uninitialized value – i.e., null – is used as a wildcard.
The implication here is that every attribute in a data-grid-aware object should be nullable – i.e., an Object and not a primitive. You can work around this, but there are issues – namely, that there are no templated wildcards without specific configuration.
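To make the wildcard rule concrete, here is a minimal, stand-alone sketch of template matching in plain Java. It is not GigaSpaces code and does not claim to mirror how XAP matches internally; the class names, the matches() helper, and the age attribute are all made up for illustration. It only shows why a nullable Integer can act as a wildcard while a primitive’s default value cannot.

```java
import java.util.Objects;

/** Toy model of template matching; hypothetical names, not GigaSpaces code. */
class TemplateMatchSketch {

    /** Entry with nullable (wrapper) fields, so "unset" is expressible. */
    static class Person {
        String name;
        String state;
        Integer age; // Integer, not int: null can mean "any age"

        Person(String name, String state, Integer age) {
            this.name = name;
            this.state = state;
            this.age = age;
        }
    }

    /** A null template field is a wildcard; a non-null field must be equal. */
    static boolean fieldMatches(Object templateValue, Object candidateValue) {
        return templateValue == null || Objects.equals(templateValue, candidateValue);
    }

    static boolean matches(Person template, Person candidate) {
        return fieldMatches(template.name, candidate.name)
            && fieldMatches(template.state, candidate.state)
            && fieldMatches(template.age, candidate.age);
    }

    public static void main(String[] args) {
        Person john = new Person("John Q. Public", "AZ", 42);

        // Only state is populated, so name and age are wildcards.
        Person anyoneInAZ = new Person(null, "AZ", null);
        System.out.println(matches(anyoneInAZ, john)); // true

        // What a primitive int would silently do: its default of 0
        // behaves as a filter ("age must be 0"), not as a wildcard.
        Person ageZeroInAZ = new Person(null, "AZ", 0);
        System.out.println(matches(ageZeroInAZ, john)); // false
    }
}
```

The second template is the trap: nobody asked for “people aged zero,” but that is what an unset primitive would mean in this toy model.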
Using Query By Example
In QBE, you construct an object that looks like an object you want to retrieve. For example, if we have a Person object:
public class Person {
/* We assume mutators and accessors exist for name and state */
String name;
String state;
}
Then creating the template as in the following code will match any Person who lives in Arizona.
Person template=new Person();
template.setState("AZ");
That leads us to a set of methods we can use to read data. At first glance, this is a dizzying array of functions, but they’re really pretty easy to suss out with some basic rules in place.
[aside]Blocking reads are a great way to introduce a messaging paradigm… through a data model. This is a little odd, but it’s acceptable mostly because a data grid is not simply a data store – messaging is a feature that comes with the territory. It’s just strange for people who are used to data stores being somewhat static.[/aside]
First, reads have cardinality: you can read one object, one or zero objects, or many objects. Second, reads affect the durability of the object(s) being read: the read can be destructive, in other words. Third, the read can block: it can wait until a matching object is written into the data grid, or until a timeout occurs.
In the context of the GigaSpace interface, destructive queries are called takes. Nondestructive queries are reads. The specific method names indicate the blocking and cardinality of the queries.
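If the read/take distinction feels abstract, the following stand-alone sketch may help. It is a toy in-memory store written in plain Java, not the GigaSpace interface: ToySpace and its use of a Predicate in place of a template are assumptions made purely for illustration. The method names are borrowed from the article so the parallel is easy to see.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Toy in-memory stand-in for a space; hypothetical, not GigaSpaces code. */
class ReadTakeSketch {

    static class ToySpace<T> {
        private final List<T> entries = new ArrayList<>();

        void write(T entry) {
            entries.add(entry);
        }

        /** Nondestructive, at most one result: the entry stays in the store. */
        T readIfExists(Predicate<T> template) {
            for (T entry : entries) {
                if (template.test(entry)) {
                    return entry;
                }
            }
            return null;
        }

        /** Nondestructive, up to max results. */
        List<T> readMultiple(Predicate<T> template, int max) {
            List<T> found = new ArrayList<>();
            for (T entry : entries) {
                if (found.size() >= max) {
                    break;
                }
                if (template.test(entry)) {
                    found.add(entry);
                }
            }
            return found;
        }

        /** Destructive, up to max results: matches are removed from the store. */
        List<T> takeMultiple(Predicate<T> template, int max) {
            List<T> taken = new ArrayList<>();
            Iterator<T> it = entries.iterator();
            while (it.hasNext() && taken.size() < max) {
                T entry = it.next();
                if (template.test(entry)) {
                    taken.add(entry);
                    it.remove();
                }
            }
            return taken;
        }

        int size() {
            return entries.size();
        }
    }

    public static void main(String[] args) {
        ToySpace<String> space = new ToySpace<>();
        space.write("AZ");
        space.write("AZ");
        space.write("CA");
        Predicate<String> inAZ = "AZ"::equals;

        // Reads leave the store untouched.
        System.out.println(space.readMultiple(inAZ, Integer.MAX_VALUE).size()); // 2
        System.out.println(space.size());                                       // 3

        // A take removes what it returns.
        System.out.println(space.takeMultiple(inAZ, 1).size()); // 1
        System.out.println(space.size());                       // 2
    }
}
```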
To nondestructively read one matching object using our template, the simplest method is readIfExists():
Person personInAZ=gigaspace.readIfExists(template);
If we wanted to read a set of matching objects, we’d use readMultiple(), and pass in the maximum number of responses we expected. This example will return all matching objects:
Person[] peopleInAZ=gigaspace.readMultiple(template, Integer.MAX_VALUE);
If we wanted to remove the first five records matching the template from the data grid, we’d use the takeMultiple() method instead:
// this code makes little sense, really...
Person[] peopleInAZ=gigaspace.takeMultiple(template, 5);
Note that we can’t necessarily guarantee which objects these will be! It’s normally indeterminate, so if you relied on order, you’d need to use a different form of query for this kind of data set (i.e., SQLQuery, which can order results, and which we’ll get to soon.)
If you wanted to wait until a matching object was written into the data grid, you’d use a simple read() with a timeout, which will return null if the timeout occurs before a match is written into the space:
Person personInAZ=gigaspace.read(template, 1000);
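For readers who want to see what “block until a match arrives or the timeout passes” means mechanically, here is a small sketch built on plain Java monitors. Again, this is a toy and not how XAP implements anything: ToySpace, the Predicate-as-template shortcut, and the writeLater() helper are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy blocking read; hypothetical names, not GigaSpaces code. */
class BlockingReadSketch {

    static class ToySpace<T> {
        private final List<T> entries = new ArrayList<>();

        synchronized void write(T entry) {
            entries.add(entry);
            notifyAll(); // wake readers that are waiting for a match
        }

        /** Waits up to timeoutMillis for a match; returns null on timeout. */
        synchronized T read(Predicate<T> template, long timeoutMillis) {
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (true) {
                for (T entry : entries) {
                    if (template.test(entry)) {
                        return entry;
                    }
                }
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return null; // timed out without a match
                }
                try {
                    wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null; // treat interruption like a timeout
                }
            }
        }
    }

    /** Starts a thread that writes the entry after a delay. */
    static <T> Thread writeLater(ToySpace<T> space, T entry, long delayMillis) {
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                return;
            }
            space.write(entry);
        });
        writer.start();
        return writer;
    }

    public static void main(String[] args) {
        ToySpace<String> space = new ToySpace<>();
        writeLater(space, "AZ", 100);
        System.out.println(space.read("AZ"::equals, 2000)); // waits for the writer
        System.out.println(space.read("CA"::equals, 50));   // no match arrives: null
    }
}
```

Notice that the reader is effectively receiving a message from the writer, which is the messaging-through-a-data-model idea in miniature.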
We haven’t addressed transactions, nor will we get into them in this article, but one thing should be pointed out: if we were using transactions, the state modifications would be blocked until the objects were released from the transactions. We couldn’t destructively read an object that was being held by a different transaction, for example. This is a very powerful concept that most data grids and NoSQL products don’t handle well. However, like so many other things, it’s out of scope for this article, so let’s move on.
The next query facility is the SQLQuery. With this, you can actually specify parameterized queries using text. The read and take methods accept SQLQuery objects just as they do template objects, so the use of the SQLQuery in your reads and takes is the same as the use of a template object.
The declaration of a SQLQuery matching our template would look like these two declarations:
SQLQuery<Person> firstQuery=new SQLQuery<Person>
(Person.class, "state = 'AZ'");
// more flexible form:
SQLQuery<Person> parameterizedQuery=
new SQLQuery<Person>(Person.class,"state = ?");
parameterizedQuery.setParameter(1, "AZ");
Now we can use readIfExists() if we like:
Person personInAZ=gigaspace.readIfExists(parameterizedQuery);
It’s all well and good that we’ve looked at some simple ways to read data – but we’ve not talked about getting the data into the data grid in the first place yet. Time to fix that.
Writing Data
Writing data into a data grid is far simpler than reading is, because writes have fewer options. (Well, fewer options that we’re covering in this article; there’re actually a lot of things you can do with writes.)
[aside]Fewer options is actually a good thing for some contexts. Lots of options introduces the “Tyranny of Choice,” where it’s hard to decide between possible solutions.[/aside]
A write in a data grid has an associated lease time. After the lease expires, the object is no longer accessible (unless you have an event listener watching for lease expiration, in which case you have direct access to the object.) With leases, a data grid can act like a cache with expiring entries; if the lease time is set to “Lease.FOREVER,” the object does not expire (although it can be removed from the data grid with a destructive read). Lease.FOREVER is the default lease time for writes, although the default lease time can be configured for each GigaSpace object reference.
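The cache-like behavior of leases can be sketched in a few lines of plain Java. This is only a model of the idea, not GigaSpaces code: ToyLeasedStore, its FOREVER constant, and the explicit nowMillis parameter (used here so the example is deterministic) are assumptions for illustration, and a real grid handles expiration on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy lease bookkeeping; hypothetical names, not GigaSpaces code. */
class LeaseSketch {
    /** Stand-in for "never expires" in this sketch. */
    static final long FOREVER = Long.MAX_VALUE;

    static class ToyLeasedStore<T> {
        private static final class Leased<V> {
            final V value;
            final long expiresAtMillis;

            Leased(V value, long expiresAtMillis) {
                this.value = value;
                this.expiresAtMillis = expiresAtMillis;
            }
        }

        private final List<Leased<T>> entries = new ArrayList<>();

        /** nowMillis is passed in so the sketch stays deterministic. */
        void write(T value, long leaseMillis, long nowMillis) {
            long expiresAt = (leaseMillis == FOREVER) ? FOREVER : nowMillis + leaseMillis;
            entries.add(new Leased<>(value, expiresAt));
        }

        /** Drops expired entries, then returns everything still under lease. */
        List<T> readAll(long nowMillis) {
            entries.removeIf(e -> e.expiresAtMillis <= nowMillis);
            List<T> live = new ArrayList<>();
            for (Leased<T> e : entries) {
                live.add(e.value);
            }
            return live;
        }
    }

    public static void main(String[] args) {
        ToyLeasedStore<String> store = new ToyLeasedStore<>();
        store.write("cached for one hour", 60 * 60 * 1000L, 0L);
        store.write("kept forever", FOREVER, 0L);

        System.out.println(store.readAll(1_000L));          // both entries
        System.out.println(store.readAll(60 * 60 * 1000L)); // only "kept forever"
    }
}
```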
The simplest write might look like this:
Person person=new Person();
person.setName("John Q. Public");
person.setState("AZ");
// relies on default lease time, normally Lease.FOREVER
gigaspace.write(person);
If we wanted the person’s reference to expire, we could assign a lease time:
gigaspace.write(person, 60*60*1000); // one hour
The simplest form of an update might look like this:
Person template=new Person();
template.setName("John Q. Public");
Person person=gigaspace.take(template);
if(person!=null) {
person.setState("CA");
gigaspace.write(person);
}
[aside]Is this the right way to update? Well… not always. (That noise you hear? People internal to GigaSpaces are shouting “No!“) But this is good enough for the general case, and needs little explanation.[/aside]
This will move John Q. Public to California if (and only if) he’s already in the data grid. If we want him to be in California whether he’s already in the grid or not, our code might look a little different:
Person template=new Person();
template.setName("John Q. Public");
Person person=gigaspace.take(template);
if(person==null) {
person=template; // we can re-use the original template reference
}
person.setState("CA");
gigaspace.write(person);
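To check the logic of that take-or-create update without a running grid, here is the same flow against a toy in-memory store. Everything here is hypothetical scaffolding in plain Java (ToySpace, moveTo(), and a take() that matches on name only); it is a sketch of the control flow, not of the GigaSpace API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy take-then-write update; hypothetical names, not GigaSpaces code. */
class TakeWriteUpdateSketch {

    static class Person {
        String name;
        String state;
    }

    static class ToySpace {
        final List<Person> entries = new ArrayList<>();

        void write(Person person) {
            entries.add(person);
        }

        /** Removes and returns the first entry matching the template's name, or null. */
        Person take(Person template) {
            Iterator<Person> it = entries.iterator();
            while (it.hasNext()) {
                Person candidate = it.next();
                if (template.name == null || template.name.equals(candidate.name)) {
                    it.remove();
                    return candidate;
                }
            }
            return null;
        }
    }

    /** Moves the named person to the given state, creating the entry if absent. */
    static void moveTo(ToySpace space, String name, String state) {
        Person template = new Person();
        template.name = name;
        Person person = space.take(template);
        if (person == null) {
            person = template; // nothing was taken, so the template becomes the entry
        }
        person.state = state;
        space.write(person);
    }

    public static void main(String[] args) {
        ToySpace space = new ToySpace();
        moveTo(space, "John Q. Public", "CA"); // not present yet: created
        moveTo(space, "John Q. Public", "NV"); // present: taken, changed, rewritten
        System.out.println(space.entries.size());       // 1
        System.out.println(space.entries.get(0).state); // NV
    }
}
```

In this toy, the entry is simply absent from the store between the take and the write, which hints at why take-then-write is not always the right way to update.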
Again: we’re not addressing unique identification of objects (although in a real system we would), nor are we addressing an update-in-place. It’s possible (and not difficult), but out of scope for this article.
We’re Through Playing Now!
[aside]Do I miss TV’s “Hee Haw” variety show? You’d think so, wouldn’t you? … no, I don’t. But this phrase always cracks me up. “We’re through playing now!,” Minnie Pearl would say after, well, finishing a song. “Really?,” the youthful me would ask. “You’re really done now? I couldn’t tell.”[/aside]
So our prospective application has connected to a data grid, and implemented reads and writes. The last thing left to do is exit, which means terminating all the little maintenance threads, the ones we would appreciate having if our environment were not reliable.
To do this, we return all the way back to our configuration objects, used to establish the connection in the first place:
configurer.destroy();
We can, of course, just exit the application and shutdown hooks will take care of this for us.
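If you prefer to be explicit about cleanup, one common Java shape is try/finally, with a shutdown hook as a fallback. The sketch below uses a made-up ToyConfigurer in place of the real configurer, so treat it as an illustration of the pattern under that assumption rather than as XAP code.

```java
/** Toy shutdown handling; ToyConfigurer is hypothetical, not GigaSpaces code. */
class ShutdownSketch {

    static class ToyConfigurer {
        private boolean destroyed = false;

        /** Safe to call more than once. */
        synchronized void destroy() {
            if (!destroyed) {
                destroyed = true;
                System.out.println("maintenance threads stopped");
            }
        }

        synchronized boolean isDestroyed() {
            return destroyed;
        }
    }

    /** Runs the work and always releases the configurer afterwards. */
    static void runWith(ToyConfigurer configurer, Runnable work) {
        try {
            work.run();
        } finally {
            configurer.destroy();
        }
    }

    public static void main(String[] args) {
        ToyConfigurer configurer = new ToyConfigurer();
        // Fallback for exits that bypass the finally block (System.exit, for example).
        Runtime.getRuntime().addShutdownHook(new Thread(configurer::destroy));
        runWith(configurer, () -> System.out.println("reading and writing..."));
    }
}
```

Because destroy() is idempotent in this sketch, it does no harm for both the finally block and the hook to call it.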
Conclusion
Hopefully, you’ve seen how simple accessing data in a GigaSpaces XAP data grid can be. While we’ve not covered any of the nuances of interactions with the space, we have actually covered enough to address most operations; the nuances are important but don’t necessarily change anything we’ve seen here.
After this information, you’d potentially want to look into object metadata (annotations that address indexing options as well as transient data and unique identifiers), configuration of the GigaSpace through Spring (i.e., OpenSpaces), transactions, and application architecture (including the concept of the GigaSpace Processing Units). Other topics would include mirroring and initial load of data, messaging architectures, routing of data (part of the data model), and process distribution.
