Release Notes

0.7

Data access
- A new `get_dask_array()` method to access data as a Dask array (PR #212). Dask is a powerful tool for working with large amounts of data and doing computation in parallel.
- `open_run()` and `RunDirectory()` now take an optional `include=` glob pattern to select files to open (PR #221). This can make opening a run faster if you only need to read certain files.
- Trying to open a run directory to which you don’t have read access now correctly raises PermissionError (PR #210).
- `stack_detector_data()` has a new parameter `real_array`. Passing `real_array=False` avoids copying the data into a temporary array on the way to assembling images with detector geometry (PR #196).
- When you open a run directory with `open_run()` or `RunDirectory()`, karabo_data tries to cache the metadata describing what data is in each file (PR #206). Once the cache is created, opening the run again should be much faster, as it only needs to open the files containing the data you want. See Cached run data maps for the details of how this works.
- Importing `karabo_data` is faster, as packages like xarray and pandas are now only loaded if you use the relevant methods (PR #207).
- `lsxfel` and `info()` are faster in some cases, as they only look in one file for the detector data shape (PR #219).
- `get_array()` is slightly faster, as it avoids copying data in memory unnecessarily (PR #209).
- When you select sources with `select()` or `deselect()`, the resulting DataCollection no longer keeps references to files with no selected data. This should make it easier to then combine data with `union()` in some situations (PR #202).
- Data validation now checks that indexes have one entry per train ID.
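The `include=` pattern uses shell-style globbing. As an illustrative sketch (the file names and the pattern below are made up, not taken from a real run), Python's standard-library `fnmatch` shows how such a pattern selects a subset of files:

```python
from fnmatch import fnmatch

# Hypothetical file names in a run directory (invented for illustration).
files = [
    "RAW-R0034-AGIPD00-S00000.h5",
    "RAW-R0034-AGIPD01-S00000.h5",
    "RAW-R0034-DA01-S00000.h5",
]

# A glob pattern like one you might pass as include= to open_run()/RunDirectory().
pattern = "*AGIPD*"

# Only files matching the pattern would be opened.
selected = [f for f in files if fnmatch(f, pattern)]
print(selected)
```

Matching fewer files up front is what makes opening the run faster: the non-matching files are never touched.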
Detector geometry

- `plot_data_fast()` is much more flexible, e.g. if you want to add a colorbar or draw the image as part of a larger figure (PR #205). See its documentation for the new parameters.
0.6

Data access

- The `karabo-bridge-serve-files` command now takes `--source` and `--key` options to select data to stream. They can be used with exact source names or with glob-style patterns, e.g. `--source '*/DET/*'` (PR #183).
- Skip checking that `.h5` files in a run directory are HDF5 before trying to open them (PR #187). The error is still handled if they are not.
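Checks like the one that was skipped typically rely on the fixed 8-byte signature at the start of every HDF5 file. A minimal standard-library sketch of such a check (the file names and helper are hypothetical, not karabo_data's code):

```python
import os
import tempfile

# Every valid HDF5 file begins with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def looks_like_hdf5(path):
    """Cheap check: does the file start with the HDF5 signature?"""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE

# Demonstrate with one signature-bearing file and one bogus .h5 file.
with tempfile.TemporaryDirectory() as d:
    good = os.path.join(d, "good.h5")
    bad = os.path.join(d, "bad.h5")
    with open(good, "wb") as f:
        f.write(HDF5_SIGNATURE + b"rest of file...")
    with open(bad, "wb") as f:
        f.write(b"not hdf5 at all")
    results = (looks_like_hdf5(good), looks_like_hdf5(bad))
print(results)
```

Skipping this pre-check trades a little up-front validation for speed; a non-HDF5 file simply fails later when it is actually opened, and that error is handled.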
Detector geometry

- Assembling detector data into images can now reuse an output array - see `position_modules_fast()` and `output_array_for_position_fast()` (PR #186).
- CrystFEL format geometry files can now be written for 2D input arrays with the modules arranged along the slow-scan axis, as used by OnDA (PR #191). To do this, pass `dims=('frame', 'ss', 'fs')` to `write_crystfel_geom()`.
- The geometry code has been reworked to use metres internally (PR #193), along with other refactorings in PR #184 and PR #192. These changes should not affect the public API.
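Reusing an output array avoids allocating a fresh image buffer for every assembled frame, which matters in tight processing loops. A toy numpy sketch of the pattern (the `assemble` function and shapes are invented for illustration, not karabo_data's API):

```python
import numpy as np

def assemble(modules, out=None):
    """Place two toy 'modules' side by side in one image, reusing out if given."""
    if out is None:
        out = np.empty((4, 8), dtype=modules.dtype)
    out[:, :4] = modules[0]
    out[:, 4:] = modules[1]
    return out

modules = np.arange(2 * 4 * 4).reshape(2, 4, 4)

buf = assemble(modules)            # first call allocates a buffer
img = assemble(modules, out=buf)   # later calls write into the same memory
print(img is buf)
```

Passing the same buffer back in on each iteration keeps memory use flat no matter how many frames are assembled.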
0.5

Data access

- New method `get_data_counts()` to find how many data points were recorded in each train for a given source and key.
- Create a virtual dataset for any single dataset with `get_virtual_dataset()` (PR #162). See Parallel processing with a virtual dataset for how this can be useful.
- Write a file with virtual datasets for all selected data with `write_virtual()` (PR #132).
- Data from the supported multi-module detectors (AGIPD, LPD & DSSC) can be exposed in CXI format using a virtual dataset - see `write_virtual_cxi()` (PR #150, PR #166, PR #173).
- New class `DSSC` for accessing DSSC data (PR #171).
- New function `open_run()` to access a run by proposal and run number rather than path (PR #147).
- `stack_detector_data()` now allows input data where some sources don’t have the specified key (PR #141).
- Files in the new `1.0` data format can now be opened (PR #182).
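The behaviour of tolerating sources that lack the requested key can be sketched with a toy stand-in (this is not karabo_data's implementation; the module layout, key name, and NaN fill are assumptions for illustration):

```python
import numpy as np

def stack_modules(train_data, key, n_modules=4, mod_shape=(2, 2)):
    """Stack per-module arrays for one key; absent modules stay NaN."""
    out = np.full((n_modules,) + mod_shape, np.nan)
    for modno, source_data in train_data.items():
        if key in source_data:        # sources missing the key are simply skipped
            out[modno] = source_data[key]
    return out

train_data = {
    0: {"image.data": np.ones((2, 2))},
    1: {},                            # this source lacks the key
    2: {"image.data": np.full((2, 2), 3.0)},
}
stacked = stack_modules(train_data, "image.data")
print(stacked.shape)
```

The point of the change is in the skip: previously a source without the key would have been an error, whereas here it just leaves a gap in the stacked array.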
Detector geometry

- New class `DSSC_Geometry` for handling DSSC detector geometry (PR #155).
- `LPD_1MGeometry` can now read and write CrystFEL format geometry files, and produce PyFAI distortion arrays (PR #168, PR #129).
- `write_crystfel_geom()` (for AGIPD and LPD geometry) now accepts various optional parameters for other details to be written into the geometry file, such as the detector distance (`clen`) and the photon energy (PR #168).
- New method `get_pixel_positions()` to get the physical position of every pixel in a detector, for all of AGIPD, LPD and DSSC (PR #142).
- New method `data_coords_to_positions()` to convert data array coordinates to physical positions, for AGIPD and LPD (PR #142).
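The idea behind converting data coordinates to positions can be illustrated with plain numpy indexing: given a per-pixel position array of the kind `get_pixel_positions()` describes, a (module, slow-scan, fast-scan) triple selects a physical (x, y, z) position. All shapes and the 100 µm pixel pitch below are toy values, not real detector dimensions:

```python
import numpy as np

# Toy per-pixel positions: (modules, slow, fast, xyz), modules stacked in y.
n_mod, ss, fs = 2, 4, 4
positions = np.zeros((n_mod, ss, fs, 3))
yy, xx = np.meshgrid(np.arange(ss), np.arange(fs), indexing="ij")
for m in range(n_mod):
    positions[m, ..., 0] = xx * 100e-6             # x in metres
    positions[m, ..., 1] = (yy + m * ss) * 100e-6  # y in metres

# 'Data coordinates' of three pixels: (module, slow-scan, fast-scan).
coords = np.array([[0, 0, 0], [0, 3, 2], [1, 1, 1]])
xyz = positions[coords[:, 0], coords[:, 1], coords[:, 2]]
print(xyz.shape)
```

Fancy indexing does the whole conversion in one step: one output row of physical coordinates per input data coordinate.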
0.4

- Python 3.5 is now the minimum required version.
- Fix compatibility with numpy 1.14 (the version installed in Anaconda on the Maxwell cluster).
- Better error message from `stack_detector_data()` when passed non-detector data.
0.3

New features:

- New interfaces for working with AGIPD, LPD & DSSC geometry.
- New interfaces for accessing AGIPD, LPD & DSSC data.
- `select_trains()` can now select arbitrary specified trains, not just a slice.
- `get_array()` can take a region of interest (`roi`) parameter to select a slice of data from each train.
- A newly public `keys_for_source()` method to list keys for a given source.
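A region of interest of the kind the `roi` parameter describes amounts to applying the same slice to every train's frame. A toy numpy sketch (the array shapes are invented):

```python
import numpy as np

# Toy data: 5 trains of 16x16 'images'.
data = np.arange(5 * 16 * 16).reshape(5, 16, 16)

# A region of interest: rows 4-8, columns 10-14 of every train's image.
roi = np.s_[4:8, 10:14]
cropped = data[(slice(None),) + roi]
print(cropped.shape)
```

Selecting the slice at read time means only the cropped region needs to be loaded and kept in memory, rather than full frames.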
Fixes:

- `stack_detector_data()` can handle missing detector modules.
- Source sets have been changed to frozen sets. Use `select()` to choose a subset of sources.
- `get_array()` now only loads the data for selected trains.
- `get_array()` works with data recorded more than once per train.
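The switch to frozen sets means the source collection can no longer be mutated in place; a subset has to be derived instead, which is what `select()` is for. A small sketch of the difference (the source names are realistic-looking but hypothetical):

```python
# A frozenset cannot be modified in place.
sources = frozenset({"SPB_DET_AGIPD1M-1/DET/0CH0:xtdf", "SA1_XTD2_XGM/DOOCS/MAIN"})

try:
    sources.add("another/source")     # frozensets have no add() method
    mutated = True
except AttributeError:
    mutated = False

# Deriving a subset produces a new set, leaving the original untouched.
subset = frozenset(s for s in sources if "XGM" in s)
print(mutated, len(subset))
```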
0.2

- New command `karabo-data-validate` to check the integrity of data files.
- New methods to select a subset of data: `select()`, `deselect()`, `select_trains()`, `union()`.
- Selected data can be written back to a new HDF5 file with `write()`.
- `RunDirectory()` and `H5File()` are now functions which return a `DataCollection` object, rather than separate classes. Most code using these should still work, but checking the type with e.g. `isinstance()` may break.
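The `isinstance()` caveat arises because a factory function is not a type. A minimal sketch of the failure mode, using stand-in names rather than karabo_data's real classes:

```python
class DataCollection:
    """Stand-in for the object the factory now returns."""

def RunDirectory(path):
    # Formerly a class; now a plain function returning a DataCollection.
    return DataCollection()

run = RunDirectory("/some/run/dir")

try:
    isinstance(run, RunDirectory)     # arg 2 must be a type, not a function
    ok = True
except TypeError:
    ok = False
print(ok, isinstance(run, DataCollection))
```

Code that previously checked `isinstance(run, RunDirectory)` now raises TypeError; checking against the returned object's class still works.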