Data APIs
Available since version 3.17.0
The Data API provides useful abstractions for working with data from various sources and converting it into some kind of format that can be more easily stored in AEM.
Spreadsheet
The Spreadsheet API com.adobe.acs.commons.data.Spreadsheet
is built on top of Apache POI and is designed to simplify the work of reading structured tabular data from an XLSX file.
Structure
In order to be processed correctly, an XLSX file must have the required data located in the first sheet of the workbook. Other sheets are ignored, and can be used for other functions such as lookup tables and so on.
In order to identify values for each row by a usable name, the first row (herein called the header row) identifies each column. If a column should have at most one value, then there should only be one column with that name. If a column could have many values, you don’t have to cram all of the values into one cell. You can create multiple columns which all have the same name in duplicate, and it will be processed as if it were one large column with multiple values.
It is recommended to use the first sheet to cleanly identify all required data without any scratch areas that might be misinterpreted as data. If additional scratch columns are needed, then make sure to leave the header row empty for those columns. Any column that does not have a name in the header row will be effectively skipped.
It is possible to instruct the Spreadsheet API that certain columns are required for processing purposes. If a list of required column names is provided, then a row will be discarded if it does not have a value for that column. IMPORTANT NOTE: If a required column is indicated but is not present in the header row of the file, effectively the whole file will be ignored. When in doubt, check your file for a correct header.
Column naming rules
Column names should ideally follow the same naming rules as JCR properties. Names should include only the following types of characters if you plan to map data to AEM properties using this API:
- Upper and Lower case letters (A-Z, a-z)
- Numbers (0-9)
- Plus, Minus, Underscore characters (+, -, _)
To simplify things, the API also has an option to convert property names automatically such that everything is made lower-case and any unsupported characters are converted into underscores. Some tools that work with data importing will likely turn this option on by default.
Column type hinting
By default a column is assumed to be a single-value String. It is possible add some hints in the header for that column to instruct the API to interpret the contents differently. Here are some examples:
Header column | Real name | Translation | Example values |
---|---|---|---|
myCol@int | myCol | Integer numbers between +/- 2.1 billion | -999 |
myCol2@integer | myCol2 | Same as above | 12345678 |
otherCol@long | otherCol | Long numbers betwen +/- 9.2 quintillion | 14309128340123 |
price@double | price | 64-bit floating point number | 10.1434 |
other@number | other | Same (also means “double”) | -0.3412 |
deadline@date | deadline | A representation of date and time † | Jan 1 1980 |
… | … | Also can use calendar, cal, or time | |
alive@boolean | alive | Boolean value (yes/no, true/false) ‡ | Y, True, T, 1 |
finished@bool | finished | Same as boolean | |
val@string | val | String value | Hello World! |
val@str | val | Also means string | Ooga Booga! |
val | val | Assumed to be string by default | Okay! |
† Date conversion works most reliably if Excel already has cells set up as dates to start with. Some attempt is made to convert non-date values into proper dates but only a few basic patterns are detected.
‡ Boolean conversion is pretty basic, using the following rules:
- Any value beginning with the letters X, T, Y (any case) or a non-zero number.
- Anything else is treated as false.
Note in 3.17.0+: Regardless of type hints provided, if you request the string value of any column, it will return the display representation of the cell as it appears in Excel. Meaning if it is formatted as a percentage you will get something like 100.00%, not 1.0, as the value of toString().
Multi-value columns
It is also possible to provide additional hinting that a value should be split into a list as well. To do this put square braces at the end of the type hint [] – if you want to specificy the separator place it inside the braces like this [:]
Header column | Real name | Translation | Example values |
---|---|---|---|
myCol@int[] | myCol | List of numbers separated by , | 1,100,-999 |
myCol2@int[:] | myCol2 | List of numbers separated by : | 1:2:3:4 |
cq:tags@[] | cq:tags | List of strings separated by , | tag:A/B,tag:B |
As stated above, columns having the same name are treated as one column with multiple values. In this case there is no need to worry about using separator characters because using separate excel columns is itself separating the different values already. As an edge case, it is a good idea to add a hint to the columns as a list using a separator character that is not in the text just to ensure it doesn’t try to split the values up. For example [~~] would tell it to use a double-tilde sequence as a separator which is unlikely to ever appear in natural text. You don’t have to do this for all columns, just the first one is sufficient.
Variant
Variant com.adobe.acs.commons.data.Variant
is an abstraction of a simple value that can be converted into an other simple type in most cases. This is the heart of the Spreadsheet API, but it is also incredibly useful on its own. The primary goal of a variant is to accept data from any source format and have the interpretation of that value provided at a later point.
A common example is that data from a file might be loaded as a set of strings. Later on, some of that data might require interpretation as numbers. By having a type-neutral representation it is much easier to pass that data as a Variant rather than pre-emptively casting that data to an intermediate type which could lose precision or meaning.
Variant also has a static convenience method called “convert” which lets you leverage the type conversion logic if you just need to use data conversion without creating Variants.
Variants can also represent multiple source variables. For example, if a date and a preferred representation of the date are both known then both values can be set, such that getting a Date will return the already-processed date, but requesting a String value will return the provided representation. This has a lot of other useful implications.
Supported types for variant values include:
- All java primitives except char (int, byte, short, long, double, float, boolean)
- Strings
- Date / Calendar / Instant
There are type conversion methods for basic types, but to simplify cases such as reflection there is also a method called asType
where the target type can be provided as well. Note that it is possible to get a null for values that do not convert successfully, so it is imperative that the code check for nulls when getting values out of a Variant.
Date conversion from string is also provided and uses all possible combination of long and short variations from the SimpleDateFormat API in Java. If these patterns are insufficient then you will have to parse source dates and provide the parsed Date to Variant instead of the String.
CompositeVariant
Composite Variant com.adobe.acs.commons.data.CompositeVariant
builds upon Variant to add a couple new abstractions. The first is that it can represent one or many variants (hence composite). This is incredibly useful if data is processed in a way where it will not be known right away if a value is supposed to be treated as a single value or later combined with other values in a list. By representing the data in a composite variant structure, additional values can later be appended and the resulting type conversion to array is automatic.
The other useful function of the composite variant is that, combined with Variant’s type conversions, this class provides conversion to the property types understood by the Resource value map data type rules. This makes it much easier to build data import and other similar tools which don’t require a lot of messy type conversion.