Wednesday, May 4, 2016

Guide: How to store data in Java

Imagine creating your own application or game. When you first start out, you may generate and keep your data in code. As you progress, you may find out how inefficient and cumbersome this becomes and consider moving all of it to external files.

How do you store the growing amount of application data? Which formats do you use to store it?

There are countless ways to store and retrieve data for your project. Before reinventing the wheel by creating your own parser and file format (which is an enormous effort), think about which requirements your format needs to fulfill.

For example: Does my data need to be...
  • ...flexible and extendable?
  • ...compatible with different versions?
  • ...easily editable?
  • ...easily maintainable?
  • ...as compact as possible?

In Java, there are multiple ways of storing data for your projects. I want to explore some of the technologies that exist for this. The table below gives a summary; each format is covered in detail in its own section.



| Format | Syntax | Legible | Complexity of saved data | Extendability | Libraries | Potential uses |
|---|---|---|---|---|---|---|
| CSV | Simple | Yes | Plain | Changes must be adopted manually | none | Export or import data across programs |
| .ini | Simple | Yes | Tree with depth 1 | Changes must be adopted manually | ini4j | Configuration files |
| XML | Bloated | Yes | Tree | Changes in classes can be ignored (data won't become incompatible) | JDOM, JAXB | Configuration files, game data |
| JSON | Clear | Yes | Tree | Changes in classes can be ignored (data won't become incompatible) | json-simple | Configuration files, game data |
| Binary | None | No | Tree | JDK: changes in classes may cause saved data to become unusable. Kryo: changes in classes can be logged, preserving compatibility | JDK, Kryo | Networking, game data, temporary data, compressed binary files |

CSV

 

Comma-separated values. This is probably the most basic format you can use (it's also one of the oldest, having been around before personal computers even existed). CSV files can easily be written programmatically, and simple values can be read back by splitting each line at the separator, without a dedicated parser. If you need to keep things as simple as possible (e.g. you only need to store a lot of Strings), you're good with CSV. It has the advantage of being editable with spreadsheet programs like Excel or OpenOffice, which can be very useful tools. However, the second your data takes on a tree-like structure (objects in objects), you get dangerously close to creating your own parser and end up reinventing the wheel. Don't ever go there!
CSV is extremely handy for tasks like storing configs or as an export format (I've seen it used for translation files), but other than that, try to stay away from it.
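
To make the "simple values" case concrete, here is a minimal sketch using only the JDK. The class `SimpleCsv` and its method names are made up for illustration, and the code assumes that no value contains a comma, a quote or a line break; as soon as that assumption fails, you are in parser territory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
// Assumes values never contain commas, quotes or line breaks.
class SimpleCsv {

    // Joins the values of one record into a single CSV line.
    static String toLine(List<String> values) {
        return String.join(",", values);
    }

    // Splits a CSV line back into its values.
    // The limit -1 keeps trailing empty fields, e.g. in "a,b,".
    static List<String> fromLine(String line) {
        return new ArrayList<>(Arrays.asList(line.split(",", -1)));
    }

    public static void main(String[] args) {
        // One row of a translation file: key, English text, German text.
        String line = toLine(Arrays.asList("greeting", "Hello", "Hallo"));
        System.out.println(line);
        System.out.println(fromLine(line));
    }
}
```

A whole file is then just one such line per record, which is also what a spreadsheet program expects when it opens the file.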


Windows .ini file

 

Before starting, I want to point out that this is the only file format I haven't explored in much detail, but I want to mention it regardless.
The .ini file format is a rather ancient remnant of the past, in use since early versions of Windows, long before Windows XP. It stands out for its simplicity. Because .ini files allow you to pack your name-value pairs into sections or groups, they can be very handy even to this day. Another plus: you can read and edit them with any text editor. These kinds of files still won't allow you to store an object in an object, but they are easier to use than CSV. They make great configuration files, as long as you keep the data you want to store simple. Keep in mind that the format is likely unsuited to storing larger amounts of data. You're welcome to try it out and get the ini4j library implementation for Java.
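
To show what "sections with name-value pairs" means in practice, here is a small JDK-only sketch that reads .ini text into a map of sections. The class `SimpleIni` and the sample keys are invented for this example, and it only handles the basic syntax (sections, `key = value`, comment lines). In a real project a library such as ini4j is the more sensible choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal reader for illustration; not a replacement for ini4j.
class SimpleIni {

    // Parses .ini text into: section name -> (key -> value).
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        String current = ""; // pairs before the first [section] go into section ""
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(";") || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                current = line.substring(1, line.length() - 1).trim();
                sections.putIfAbsent(current, new LinkedHashMap<>());
                continue;
            }
            int eq = line.indexOf('=');
            if (eq > 0) { // lines without a key or without '=' are ignored
                String key = line.substring(0, eq).trim();
                String value = line.substring(eq + 1).trim();
                sections.computeIfAbsent(current, s -> new LinkedHashMap<>()).put(key, value);
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        String text = "[window]\nwidth = 800\nheight = 600\n\n[audio]\nvolume = 0.5\n";
        Map<String, Map<String, String>> ini = parse(text);
        System.out.println(ini.get("window").get("width"));
        System.out.println(ini.get("audio").get("volume"));
    }
}
```

The result is exactly the "tree with depth 1" from the table: one level of sections, and plain strings below it.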


XML

 

By its nature, XML supports storing objects inside other objects. Its syntax is rather bloated, which results in rather large files. XML is readable and easily editable, which is why it is commonly used in a variety of applications, from storing configs to more complex data for games and other software. It is supported by many frameworks. For this reason, it is also used to exchange files between different programs.
When I dove into XML, I was using a custom JDOM serializer. That pretty much ended up being a nightmare because I was reinventing the wheel! Since you should probably not make the same mistake, you can use a serializer like JAXB to turn your data into XML files with greater ease.
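
As a small illustration of objects nested in objects, the following sketch reads a weapon that sits inside a player element. It uses the DOM parser that ships with the JDK rather than JDOM or JAXB, and the element and attribute names (`player`, `weapon`, `damage`) are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlNestingDemo {

    // Hypothetical sample data: a player object that contains a weapon object.
    static final String SAMPLE =
        "<player name=\"Alice\">"
      + "<weapon name=\"Sword\" damage=\"12\"/>"
      + "</player>";

    // Returns the damage value of the weapon nested inside the player element.
    static int readWeaponDamage(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element player = doc.getDocumentElement();
            Element weapon = (Element) player.getElementsByTagName("weapon").item(0);
            return Integer.parseInt(weapon.getAttribute("damage"));
        } catch (Exception e) {
            // Parser configuration, I/O and syntax errors all end up here.
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWeaponDamage(SAMPLE));
    }
}
```

Walking the tree by hand like this gets tedious quickly for larger classes, which is the argument for letting a serializer do the mapping.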

 

JSON

 

JSON is short for "JavaScript Object Notation". It is a readable format that supports complex object trees. Unlike XML, its syntax is compact, so the file size is small in comparison. It is also widely supported across different platforms and is a very flexible format to store your data in, for configs and complex data alike. This flexibility comes from the fact that some JSON serializers can be configured to ignore data in the file that has no matching field in the class being deserialized. Conversely, if the serializer can't find one of the object's fields in the file, it simply leaves that field in the state you declared or initialized it in the class. This makes JSON a very flexible and pleasant format to use, since your classes can change without making your data incompatible. Some game frameworks or engines (like LibGdx) also ship with a JSON serializer. If you're not working with such a framework, you can use json-simple (version 1.1.1).
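
The tolerant behavior described above can be sketched without any JSON library. In the example below, a plain `Map` stands in for what a parser would produce from a file like `{"width": 1024, "fullscreen": true}`. The classes `Settings` and `TolerantMapper` are hypothetical and written only for this illustration; they are not taken from json-simple, LibGdx or any other serializer, and a real serializer would also handle nested objects and type conversion.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical settings class; the field initializers act as defaults.
class Settings {
    int width = 800;
    int height = 600;
    String title = "Untitled";
}

// Hypothetical mapper showing the "ignore unknown, keep defaults" idea.
class TolerantMapper {

    // Copies matching entries from the parsed data into the target object.
    // Keys without a matching field are skipped; fields without an entry
    // keep the value they were initialized with.
    static void apply(Map<String, Object> data, Object target) {
        for (Map.Entry<String, Object> entry : data.entrySet()) {
            try {
                Field field = target.getClass().getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(target, entry.getValue());
            } catch (NoSuchFieldException e) {
                // The file contains data the class no longer knows: ignore it.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put("width", 1024);      // overrides the default
        data.put("fullscreen", true); // no such field in Settings
        Settings settings = new Settings();
        apply(data, settings);
        System.out.println(settings.width + "x" + settings.height + " " + settings.title);
    }
}
```

The old file keeps loading even though `fullscreen` was removed from the class and `title` was added after the file had been written.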


Binary

 

Another method to store your data is to simply store it in binary, as 1's and 0's. The upside is that your data can't easily be read by anyone (well, partially). The downside is that you always have to go through your binary serializer in code to store your data; you can't simply edit the file by hand.
When working with Java's standard serialization, you will run into problems deserializing your objects from a file once you've changed the class. The code will fail (typically with an InvalidClassException), telling you the object cannot be deserialized because the class no longer matches the stored data, for example after you changed, added or removed a constructor, method or field. This can be disastrous if you recklessly make changes to a class, only to find out you just made hundreds of bytes of binary data incompatible. Good luck redoing all of this!
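
For reference, this is roughly what a round trip through the JDK's standard serialization looks like. The `SaveGame` class is a made-up example. One detail worth knowing: if a class does not declare a `serialVersionUID`, the runtime derives one from the class structure, so many edits to the class change it and old data is rejected. Declaring the ID explicitly, as in this sketch, lets some changes (such as adding a field) go through, but it does not make every change safe.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class to be saved.
class SaveGame implements Serializable {
    // Explicit ID: stays the same until you decide to change it.
    private static final long serialVersionUID = 1L;

    final String playerName;
    final int level;

    SaveGame(String playerName, int level) {
        this.playerName = playerName;
        this.level = level;
    }
}

class JdkSerializationDemo {

    // Serializes the object into a byte array (a file stream works the same way).
    static byte[] toBytes(SaveGame save) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(save);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restores the object; an incompatible class surfaces here as an
    // InvalidClassException, wrapped in the UncheckedIOException.
    static SaveGame fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SaveGame) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SaveGame restored = fromBytes(toBytes(new SaveGame("Alice", 3)));
        System.out.println(restored.playerName + " " + restored.level);
    }
}
```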

To avoid this, there are libraries like Kryo that allow you to 1) create customized serializers for each class and 2) add version control for each field you are storing. Point 2) may give you some control over compatibility, but you will be left with unused code fragments you can't remove. If you do remove them, Kryo will kry that an old, deprecated field in your code is missing (excuse the pun). This gives you some control, but it is still not as flexible as JSON, I've found.

However, there are some neat things you can do with binary files, for example controlled binary serialization (storing only the bits of data you need, using your own definition), creating your own compressed data formats, and much more. Another important point about binary is that it's used to transmit data over a network.
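
A minimal sketch of controlled binary serialization with the JDK's `DataOutputStream` and `DataInputStream` could look like this. The `Position` class, the field layout and the leading version byte are assumptions made up for this example; the point is only that you decide byte by byte what ends up in the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical data class.
class Position {
    final int x;
    final int y;
    final String name;

    Position(int x, int y, String name) {
        this.x = x;
        this.y = y;
        this.name = name;
    }
}

class PositionCodec {
    // Hypothetical format version, written first so that later
    // versions of the reader can tell old data apart.
    static final int FORMAT_VERSION = 1;

    // Layout: 1 byte version, 4 bytes x, 4 bytes y, length-prefixed name.
    static byte[] write(Position p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(FORMAT_VERSION);
            out.writeInt(p.x);
            out.writeInt(p.y);
            out.writeUTF(p.name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in exactly the order they were written.
    static Position read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readUnsignedByte();
            if (version != FORMAT_VERSION) {
                throw new IllegalStateException("Unsupported format version " + version);
            }
            int x = in.readInt();
            int y = in.readInt();
            String name = in.readUTF();
            return new Position(x, y, name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(new Position(3, 4, "hero"));
        Position p = read(data);
        System.out.println(p.name + " at " + p.x + "," + p.y + " in " + data.length + " bytes");
    }
}
```

Because the layout is yours, renaming a field or a class does not affect stored data at all; only changes to the layout itself do, and the version byte gives you a place to handle those.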



Summary


The most flexible format I have used for saving large amounts of data, and one that is easy to use and maintain, is without doubt JSON, followed by controlled binary serialization (Kryo). With a decent serializer, XML can also be very powerful despite its large file size and bloated syntax. CSV and .ini are rather simple formats and very handy for simple config files that don't change frequently during development. Which of these formats you end up using, and for what purpose, is up to you!
