NeXus and XML
=============
Although NeXus files can be rather complex, reading data from them can be
quite easy, as shown in the previous sections of this manual.
Creating NeXus files, however, is a different matter: here their complexity can
become rather painful. This issue typically affects developers of data
acquisition software.

The ``python-pni`` framework can generate skeleton NeXus files from an XML
description. Existing files can also be extended with structures described in
XML.

A NeXus XML primer
------------------
The XML dialect used to create or extend files with ``python-pni`` is very
close to NXDL, the XML language used to define classes in the NeXus standard.
NXDL itself, however, is not sufficient for creating HDF5 files: it lacks some
tags (for instance, tags describing the chunk shape of a field), and its types
are not specific enough (`NX_FLOAT`, for example, would be valid for both 32-bit
and 64-bit floating point numbers).

Creating groups
~~~~~~~~~~~~~~~
The `group` tag creates a group, which can in turn contain other groups,
fields, or attributes. Its usage is straightforward:
.. code-block:: xml
The `name` attribute of the tag sets the name of the group to create, and the
`type` attribute sets its NeXus base class (this value is stored in the
`NX_class` attribute of the created HDF5 group).
A typical use of groups is the construction of a basic NeXus skeleton:
.. code-block:: xml
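
How nested `group` tags translate into HDF5 paths and `NX_class` values can be
sketched without ``python-pni`` at all. The following example uses only
:py:mod:`xml.etree.ElementTree` from the Python standard library; the skeleton
it parses is hypothetical (the group names are made up), and the helper
``group_classes`` is not part of ``python-pni``. It maps the path of every group
to the value its `type` attribute would contribute to the `NX_class` attribute.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Hypothetical skeleton written in the group syntax described above.
    SKELETON = """
    <group name="entry" type="NXentry">
      <group name="instrument" type="NXinstrument">
        <group name="detector" type="NXdetector"/>
      </group>
      <group name="data" type="NXdata"/>
    </group>
    """


    def group_classes(xml_text):
        """Map each group path to its NeXus class (illustrative helper only)."""
        result = {}

        def walk(element, parent_path):
            path = parent_path + "/" + element.get("name")
            # the 'type' attribute is what ends up in the NX_class attribute
            result[path] = element.get("type")
            for child in element.findall("group"):
                walk(child, path)

        walk(ET.fromstring(xml_text.strip()), "")
        return result


    for path, nx_class in group_classes(SKELETON).items():
        print(path, nx_class)

The nesting of the tags alone determines the paths, for example
``/entry/instrument/detector`` with class ``NXdetector``.
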
Creating fields
~~~~~~~~~~~~~~~
Fields are described by the `field` tag. For a simple scalar field, use:
.. code-block:: xml
As with the `group` tag, the `name` attribute sets the name of the field to
create, and the `type` attribute sets its data type. Data types are given as
the standard :py:mod:`numpy` type strings.
For a multidimensional field, a `dimensions` tag must be embedded in the field:
.. code-block:: xml
The `rank` attribute of the `dimensions` tag sets the number of dimensions of
the resulting field, and each dimension is described by a `dim` tag whose
attributes should be self-explanatory. Two things are worth noting, however:
dimension indices start at 1 (not at 0 as in C), and dimensions with 0 elements
are allowed (such a field can be grown later, as shown in the previous
sections). By default, the chunk shape equals the field shape with its first
element set to 1; in the above example, the chunk shape would thus be
(1,1024,512).
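
The rules for the field shape and the default chunk shape can be expressed in
a few lines of Python. This is only a sketch built on
:py:mod:`xml.etree.ElementTree`: the field is hypothetical, the `index` and
`value` attribute names of the `dim` tag are assumed from the NXDL convention,
and ``field_shape`` and ``default_chunk`` are illustrative helpers rather than
``python-pni`` functions.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Hypothetical field; the 'index'/'value' names of <dim> are assumed (NXDL style).
    FIELD = """
    <field name="data" type="uint32">
      <dimensions rank="3">
        <dim index="1" value="0"/>
        <dim index="2" value="1024"/>
        <dim index="3" value="512"/>
      </dimensions>
    </field>
    """


    def field_shape(field):
        """Build the shape tuple from the 1-based dim indices."""
        dims = field.find("dimensions")
        if dims is None:
            return ()  # no dimensions tag: treated as scalar in this sketch
        rank = int(dims.get("rank"))
        shape = [None] * rank
        for dim in dims.findall("dim"):
            # indices start at 1 in the XML but at 0 in Python
            shape[int(dim.get("index")) - 1] = int(dim.get("value"))
        if None in shape:
            raise ValueError("not every dimension up to 'rank' is defined")
        return tuple(shape)


    def default_chunk(shape):
        """Default chunk shape: the field shape with its first element set to 1."""
        return (1,) + shape[1:] if shape else ()


    field = ET.fromstring(FIELD.strip())
    shape = field_shape(field)
    print(shape)                 # (0, 1024, 512)
    print(default_chunk(shape))  # (1, 1024, 512)
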
To set the chunk shape explicitly, use the `chunk` tag:
.. code-block:: xml
Note, however, that the `chunk` tag is currently not implemented due to
limitations of the underlying `libpniio` library.
Finally, the `field` tag accepts an additional `units` attribute, which stores
the physical unit of the data held by the field:
.. code-block:: xml
Its value is stored in a string attribute named `units` attached to the created
HDF5 field.

Dealing with attributes
~~~~~~~~~~~~~~~~~~~~~~~
Attributes are quite similar to fields but are created with the `attribute`
tag, which can appear within a `field` or a `group` tag. Unlike the `field`
tag, the `attribute` tag has no `units` attribute. Furthermore, the chunk shape
of an attribute cannot be set, and attribute data cannot be compressed.
Multidimensional attributes can, however, be created just like
multidimensional fields by embedding a `dimensions` tag within the `attribute`
tag.
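
The restrictions listed above could be checked before handing the XML to
``python-pni``. The following hypothetical checker, again built on
:py:mod:`xml.etree.ElementTree`, reports `attribute` tags that are not placed
inside a `field` or `group` tag, that carry a `units` attribute, or that
contain a `chunk` tag. It is not part of ``python-pni`` and makes no assumption
about how ``python-pni`` itself reacts to such input; the XML, including the
`dim` attribute names, is made up for illustration.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Hypothetical XML: the 'vector' attribute wrongly carries a units attribute.
    XML = """
    <field name="rotation_angle" type="float64" units="degree">
      <attribute name="transformation_type" type="string"/>
      <attribute name="vector" type="float64" units="m">
        <dimensions rank="1">
          <dim index="1" value="3"/>
        </dimensions>
      </attribute>
    </field>
    """


    def attribute_problems(root):
        """List violations of the attribute rules (illustrative checker only)."""
        problems = []
        # ElementTree has no parent pointers, so inspect each element's children
        for parent in root.iter():
            for attr in parent.findall("attribute"):
                name = attr.get("name")
                if parent.tag not in ("field", "group"):
                    problems.append(f"{name}: must be inside a field or group tag")
                if "units" in attr.attrib:
                    problems.append(f"{name}: the attribute tag has no 'units' attribute")
                if attr.find("chunk") is not None:
                    problems.append(f"{name}: the chunk shape cannot be set")
        return problems


    for problem in attribute_problems(ET.fromstring(XML.strip())):
        print(problem)
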
Here is a short example with attributes attached to a field:
.. code-block:: xml
Writing data from XML
~~~~~~~~~~~~~~~~~~~~~
The last example already hints at a problem: in many cases it would be
convenient not only to create a field or attribute but also to fill it with
data. This is particularly true when the field or attribute stores static data
that does not change during an experiment.
``python-pni`` supports this feature, with full support for numeric types. In
the above example, the `vector` attribute could be filled with
.. code-block:: xml

    0.0 0.0 1.0

to denote a rotation around the z-axis. For string types, only scalar data is
currently supported. The data of the previously defined `transformation_type`
attribute could be set with
.. code-block:: xml

    rotation

As the above examples show, ``python-pni`` does not follow the standard NXDL
convention for denoting data within a field or attribute tag; this keeps the
resulting XML more readable.
The price, however, is that strings are handled differently from numeric
values. Numeric values are easy to parse and can thus be written as a block of
whitespace-delimited values. This does not work for strings, since whitespace
characters may be part of a string itself. For most applications, however,
this limitation is not a serious problem.
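
The difference between numeric and string data can be illustrated with two
small parsing functions. They are a sketch of the idea, not the parser used by
``python-pni``; in particular, stripping leading and trailing whitespace from
string data is an assumption of this example.

.. code-block:: python

    def parse_numeric(text, convert=float):
        """Split a whitespace-delimited block into a list of numbers."""
        return [convert(token) for token in text.split()]


    def parse_string(text):
        """Take a string as one scalar value (stripping is an assumption here)."""
        return text.strip()


    print(parse_numeric(" 0.0 0.0 1.0 "))          # [0.0, 0.0, 1.0]
    print(parse_numeric("1 2\n3 4", convert=int))  # [1, 2, 3, 4]

    # Splitting a string on whitespace would tear it apart:
    text = "rotation about z"
    print(text.split())        # ['rotation', 'about', 'z']
    print(parse_string(text))  # rotation about z

Line breaks are just another kind of whitespace for numeric data, so a block of
numbers may span several lines of XML.
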
Creating links
~~~~~~~~~~~~~~
Links are created with the `link` tag. Both internal and external links are
supported. An internal link can be made like this:
.. code-block:: xml
The link below the `NXdata` group refers to the `data` field in the detector
group.
To create an external link, only the target path has to be changed. If the
detector data is stored in a different file, the above example can be modified
like this:
.. code-block:: xml
From XML to NeXus
-----------------
Creating a NeXus file from XML is simple: just use the
:py:func:`xml_to_nexus` function provided by the package:
.. code-block:: python

    import pni.io.nx.h5 as nexus

    # XML blueprint of the structure to create (content omitted here)
    xml_struct = """
    ......
    """

    f = nexus.create_file("test.nxs", overwrite=True)
    r = f.root()
    nexus.xml_to_nexus(xml_struct, r)

The first argument to :py:func:`xml_to_nexus` is the XML string that acts as a
blueprint for the structure to create. The second argument is the parent
object below which the structure is created.
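
Instead of writing the XML string by hand, it can also be generated with
:py:mod:`xml.etree.ElementTree`. The sketch below builds a hypothetical
skeleton (the helper ``skeleton_xml`` and the group names are made up) and
serializes it to a string. Whether ``python-pni`` imposes further requirements
on the root element is not assumed here, so the final call is only shown as a
comment.

.. code-block:: python

    import xml.etree.ElementTree as ET


    def skeleton_xml():
        """Build a hypothetical group hierarchy and return it as a str."""
        entry = ET.Element("group", name="entry", type="NXentry")
        instrument = ET.SubElement(entry, "group", name="instrument", type="NXinstrument")
        ET.SubElement(instrument, "group", name="detector", type="NXdetector")
        ET.SubElement(entry, "group", name="data", type="NXdata")
        # encoding="unicode" makes tostring() return str instead of bytes
        return ET.tostring(entry, encoding="unicode")


    xml_struct = skeleton_xml()
    print(xml_struct)
    # With python-pni installed, the string could then be passed on as above:
    # nexus.xml_to_nexus(xml_struct, r)

Generating the XML this way guarantees well-formed markup and makes it easy to
build repetitive structures in a loop.
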
In the above example no data would be written to the file, because the
optional third argument of :py:func:`xml_to_nexus`, the write predicate, is
missing.
Since one may not want to write all the data from the XML to the file, a
predicate function can be passed which decides, for every object created,
whether or not its data should be written to disk.
To write all data, something like the following can be used:
.. code-block:: python

    nexus.xml_to_nexus(xml_struct, r, lambda obj: True)

The predicate takes a single argument, the NeXus object currently being
created, and returns :py:const:`True` if the object's data should be written
or :py:const:`False` otherwise.
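
Since the predicate is an ordinary Python callable, it can be assembled from
smaller predicates and tested without creating a file. In the following
sketch, ``all_of``, ``any_of``, ``is_empty``, ``is_small`` and the
``FakeObject`` stand-in are hypothetical helpers, not part of ``python-pni``;
the stand-in models only the `size` attribute used elsewhere in this section.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class FakeObject:
        """Hypothetical stand-in for a NeXus object; only 'size' is modelled."""
        size: int


    def all_of(*predicates):
        """Return a predicate that is True only if every predicate is True."""
        return lambda obj: all(pred(obj) for pred in predicates)


    def any_of(*predicates):
        """Return a predicate that is True if at least one predicate is True."""
        return lambda obj: any(pred(obj) for pred in predicates)


    def is_empty(obj):
        return obj.size == 0


    def is_small(obj):
        return obj.size <= 16


    # write data only for small, non-empty objects
    write_pred = all_of(is_small, lambda obj: not is_empty(obj))

    print([write_pred(FakeObject(n)) for n in (0, 1, 16, 1024)])
    # [False, True, True, False]

With ``python-pni``, a combined predicate such as
``all_of(lambda obj: isinstance(obj, nexus.nxfield), is_small)`` could be
passed as the third argument of :py:func:`xml_to_nexus`. Since :py:func:`all`
stops at the first false result, placing the type check first avoids querying
`size` on objects that might not provide it.
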
To write data only for fields of size one, the following approach can be used:
.. code-block:: python

    def write_pred(obj):
        # write data only for fields holding exactly one element
        return isinstance(obj, nexus.nxfield) and obj.size == 1

    nexus.xml_to_nexus(xml_struct, r, write_pred)