DataCube#
DataCube class contains is made to operate in multiple single files .
- DataCube represent a stack of rasters which have the same dimensions, contains data that have same dimensions (rows
& columns).
The datacube object has some attributes and methods to help working with multiple rasters files, or to repeat thesame operation on multiple rasters.
To import the raster module
from pyramids.dataset import Datacube
The detailed module attributes and methods are summarized in the following figure.
Attributes#
The DataCube object will have the following attributes
base: Dataset object
columns: number of columns in the dataset.
rows: number of rows in the dataset.
time_length: number of files/considering the each file represent a timestamp.
shape: (time_length, rows, columns).
files: file that have been read.
Methods#
read_multiple_files#
- read_multiple_files parse files in a directory and construct the array with the dimension of the first reads
rasters from a folder and creates a 3d array with the same 2d dimensions of the first raster in the folder and length as the number of files.
- inside the folder.
All rasters should have the same dimensions
- If you want to read the rasters with a certain order, then all raster file names should have a date that follows
the same format (YYYY.MM .DD / YYYY-MM-DD or YYYY_MM_DD) (i.e. “MSWEP_1979.01.01.tif”).
Note
read_multiple_files only parse the files names’ in the given directory, to open each raster and read a specific, band from each raster and add it to the DataCube you have to do one step further using the open_datacube method.
Parameters#
- path:[str/list]
path of the folder that contains all the rasters, ora list contains the paths of the rasters to read.
- with_order: [bool]
True if the rasters names’ follows a certain order, then the rasters names should have a date that follows the same format (YYYY.MM.DD / YYYY-MM-DD or YYYY_MM_DD). >>> “MSWEP_1979.01.01.tif” >>> “MSWEP_1979.01.02.tif” >>> … >>> “MSWEP_1979.01.20.tif”
- regex_string: [str]
a regex string that we can use to locate the date in the file names.Default is r”d{4}.d{2}.d{2}”. >>> fname = “MSWEP_YYYY.MM.DD.tif” >>> regex_string = r”d{4}.d{2}.d{2}” - or >>> fname = “MSWEP_YYYY_M_D.tif” >>> regex_string = r”d{4}_d{1}_d{1}” - if there is a number at the beginning of the name >>> fname = “1_MSWEP_YYYY_M_D.tif” >>> regex_string = r”d+”
- date: [bool]
True if the number in the file name is a date. Default is True.
- file_name_data_fmt[str]
if the files names’ have a date and you want to read them ordered .Default is None >>> “MSWEP_YYYY.MM.DD.tif” >>> file_name_data_fmt = “%Y.%m.%d”
- start: [str]
start date if you want to read the input raster for a specific period only and not all rasters, if not given all rasters in the given path will be read.
- end: [str]
end date if you want to read the input temperature for a specific period only, if not given all rasters in the given path will be read.
- fmt: [str]
format of the given date in the start/end parameter.
- extension: [str]
the extension of the files you want to read from the given path. Default is “.tif”.
Cases#
with_order = False#
if you want to make some mathematical operation on all the raster, then the order of the rasters does not matter.
rasters_folder_path = "examples/data/geotiff/raster-folder"
datacube = Datacube.read_multiple_files(rasters_folder_path)
print(datacube)
>>> Files: 6
>>> Cell size: 5000.0
>>> EPSG: 4647
>>> Dimension: 125 * 93
>>> Mask: 2147483648.0
with_order = True#
If the order in which each raster represent is important (each raster is represents a time stamp)
- To read the rasters with a certain order, each raster has to have a date in its file name, and using the format of
this name the method is going to read the file in right order.
the raster directory contents are files with a date in each file name
>>> MSWEP_1979.01.01.tif
>>> MSWEP_1979.01.02.tif
>>> MSWEP_1979.01.03.tif
>>> MSWEP_1979.01.04.tif
>>> MSWEP_1979.01.05.tif
>>> MSWEP_1979.01.06.tif
rasters_folder_path = "examples/data/geotiff/raster-folder"
datacube = Datacube.read_multiple_files(
rasters_folder_path, regex_string=r"\d{4}.\d{2}.\d{2}", date=True, file_name_data_fmt="%Y.%m.%d",
)
print(datacube)
>>> Files: 6
>>> Cell size: 5000.0
>>> EPSG: 4647
>>> Dimension: 125 * 93
>>> Mask: 2147483648.0
the raster directory contents are files with a number in each file name
>>> 0_MSWEP.tif
>>> 1_MSWEP.tif
>>> 2_MSWEP.tif
>>> 3_MSWEP.tif
>>> 4_MSWEP.tif
rasters_folder_path = "tests/data/geotiff/rhine"
datacube = Datacube.read_multiple_files(
rasters_folder_path, with_order=True, regex_string=r"\d+", date=False,
)
print(datacube)
>>> Files: 3
>>> Cell size: 5000.0
>>> EPSG: 4647
>>> Dimension: 125 * 93
>>> Mask: 2147483648.0
open_datacube#
- After using the read_multiple_files method to parse the files in the directory, you can read the values of a
specific band from each raster using the open_datacube method.
rasters_folder_path = "examples/data/geotiff/raster-folder"
datacube = Datacube.read_multiple_files(rasters_folder_path, file_name_data_fmt="%Y.%m.%d", separator=".")
dataset.open_datacube()
print(dataset.values.shape)
>>> (6, 125, 93)
create_cube#
Create a DataCube object.
Propteties#
update the data in the DataCube object