Import libraries¶
import sigma
from sigma.utils import normalisation as norm
from sigma.utils import visualisation as visual
from sigma.utils.load import SEMDataset
from sigma.src.utils import same_seeds
from sigma.src.dim_reduction import Experiment
from sigma.models.autoencoder import AutoEncoder
from sigma.src.segmentation import PixelSegmenter
from sigma.gui import gui
Load files¶
Load the .bcf file and create a SEMDataset object (which uses hyperspy as the backend).
file_path = 'test.bcf'
sem = SEMDataset(file_path)
1. Dataset preprocessing¶
1.1 View the dataset¶
Use gui.view_dataset(sem) to check the BSE image, the sum spectrum, and the elemental maps. The default feature list (i.e. the elemental peaks) is extracted from the metadata of the .bcf file. The peaks can be modified manually in Feature list; clicking set applies the new feature list and generates the corresponding elemental intensity maps. In this way, the raw hyperspectral imaging (HSI) dataset becomes a set of elemental maps with dimensions of 279 x 514 x 9 (for the test file).
How can users determine the elements for analysis?
Users can use the interactive widget in the "EDX sum spectrum" tab to check the energy peaks and determine the Feature list for further analyses.
gui.view_dataset(dataset=sem)
In addition to the GUI, we can view the dataset directly with the sem object (see the example after this list):
sem.nav_img: access the back-scattered electron (BSE) image (as a hyperspy Signal2D object).
sem.spectra: access the EDX dataset (as a hyperspy EDSSEMSpectrum object).
visual.plot_sum_spectrum(sem.spectra): view the sum spectrum (or use the hyperspy built-in sem.spectra.sum().plot(xray_lines=True)).
sem.feature_list: view the default elemental peaks chosen for the EDX dataset.
sem.set_feature_list: set new elemental peaks.
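For example, the following cell uses these accessors to inspect the dataset without the GUI (a minimal sketch built only from the calls listed above):
# Quick look at the dataset without the GUI
print(sem.feature_list)                  # default elemental peaks from the .bcf metadata
sem.nav_img.plot()                       # BSE image (hyperspy Signal2D)
visual.plot_sum_spectrum(sem.spectra)    # sum spectrum of the EDX dataset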
1.2 Process the dataset¶
Several (optional) functions to process the dataset:¶
sem.rebin_signal(size=(2,2)): rebin the EDX signal with a binning size of 2x2. After rebinning, the binned EDX or BSE data can be accessed with sem.edx_bin or sem.bse_bin.
Note: The binning size may be adjusted depending on the number of pixels and the signal-to-noise ratio of the EDS spectra. If the input HSI-EDS data contains too many pixels, it may exhaust the RAM. As for the signal-to-noise ratio, the counts per pixel should ideally be higher than 100. For the test dataset, the average counts per pixel is 29.94 before binning and 119.76 after binning (a quick way to check this is sketched after this list).
sem.peak_intensity_normalisation(): normalise the x-ray intensity along the energy axis.
sem.remove_fist_peak(end=0.1): remove the first x-ray peak (most likely noise) up to the energy given by end. For example, to remove the intensity values from 0-0.1 keV, set end=0.1. This is an optional step.
visual.plot_intensity_maps: plot the elemental intensity maps.
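As a quick sanity check before choosing a binning size, the average counts per pixel can be estimated directly from the raw spectra. A minimal sketch, assuming the spectra are exposed as a (height, width, channels) numpy array via hyperspy's .data attribute:
import numpy as np

# Estimate the average x-ray counts per pixel
counts = np.asarray(sem.spectra.data, dtype=float)
avg_counts = counts.sum() / (counts.shape[0] * counts.shape[1])
print(f'Average counts per pixel: {avg_counts:.2f}')   # ~29.94 for the test file before binning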
# Rebin both edx and bse dataset
sem.rebin_signal(size=(2,2))
# Remove the first peak until the energy of 0.1 keV
sem.remove_fist_peak(end=0.1)
# Normalise so that the spectrum of each pixel sums to 1
sem.peak_intensity_normalisation()
# View the dataset (bse, edx etc.) again to check differences.
# Note that a new tab (including the binned elemental maps) will show up only if the user has run sem.rebin_signal.
gui.view_dataset(dataset=sem)
The pre-processing steps yield an HSI datacube with dimensions of 139 x 257 x 9 (due to the 2x2 binning).
1.3 Normalisation¶
Before dimensionality reduction, we normalise the elemental maps using sem.normalisation(), to which we can pass a list of (optional) sequential normalisation steps.
Note that the pre-processing steps are all optional, but if the user opts for the softmax option, the z-score step should be applied beforehand, i.e. the softmax normalisation procedure requires input data that is z-score normalised. The purpose of the combined normalisation step (z-score + softmax) is to provide the autoencoder with an input that includes global information from all pixels and an 'intentional bias' within a single pixel.
neighbour_averaging: equivalent to applying a 3x3 mean filter to the HSI-EDS data. This is an optional step.
zscore: rescales the intensity values within each elemental map such that, for each map, the mean of all values is 0 and the standard deviation is 1. For example, after z-score normalisation, the Fe Ka map should contain pixels with intensity values that yield mean=0 and std=1.
softmax: applied within each individual pixel, across the z-scores of the different elemental peaks. For example, if 5 elemental maps are specified, the 5 corresponding z-scores of each individual pixel are used to calculate the softmax outputs (see the sketch after this list).
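To make these steps concrete, here is a minimal numpy/scipy sketch of what the three normalisation steps compute, using a random array as a stand-in for the binned elemental maps (the shapes are illustrative):
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)
maps = rng.random((139, 257, 9))               # stand-in for the binned elemental maps

# neighbour_averaging: a 3x3 mean filter applied to each elemental map
avg = uniform_filter(maps, size=(3, 3, 1))

# zscore: rescale each elemental map to mean 0 and standard deviation 1
z = (avg - avg.mean(axis=(0, 1))) / avg.std(axis=(0, 1))

# softmax: applied across the 9 z-scores within each individual pixel
e = np.exp(z - z.max(axis=2, keepdims=True))   # subtract the max for numerical stability
s = e / e.sum(axis=2, keepdims=True)

print(np.allclose(s.sum(axis=2), 1.0))         # True: each pixel now sums to 1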
# Normalise the dataset using the three (optional) sequential methods
sem.normalisation([norm.neighbour_averaging,
norm.zscore,
norm.softmax])
Use gui.view_pixel_distributions to view the intensity distributions after each sequential normalisation step.
gui.view_pixel_distributions(dataset=sem,
norm_list=[norm.neighbour_averaging,
norm.zscore,
norm.softmax],
cmap='inferno')
1.4 (Optional) Assign RGB to elemental peaks¶
gui.view_rgb(dataset=sem)
1.5 Check elemental distribution after normalisation¶
print('After normalisation:')
gui.view_intensity_maps(spectra=sem.normalised_elemental_data, element_list=sem.feature_list)
2. Dimensionality reduction¶
2.1 Method 1: Autoencoder¶
2.1.1 Initialise experiment / model¶
Parameters for Experiment
descriptor: str. The name of the model; it will be used as the model name when saving the model.
general_results_dir: path. The folder path for saving the model (the model will be saved automatically in the specified folder).
model: AutoEncoder. The model to be used for training. At the moment, only the vanilla autoencoder is available; more models (e.g. a variational autoencoder) will be implemented in future versions.
model_args: dict. Keyword arguments for the AutoEncoder architecture. The most essential argument is 'hidden_layer_sizes', which specifies the number of hidden layers and their corresponding neurons. For example, with (512,256,128) the encoder will consist of three layers (the first layer with 512 neurons, the second with 256, and the third with 128), and the decoder will mirror it with three layers (128, 256, and 512 neurons). The default setting is recommended for general cases. Increasing the number of layers and neurons increases the complexity of the model, which raises the risk of overfitting. (An illustrative sketch of this mirrored structure follows this list.)
chosen_dataset: np.ndarray. Normalised HSI-EDS data. The size should be (width, height, number of elemental maps).
save_model_every_epoch: bool. If True, the autoencoder model will be saved every epoch. If False, the model will be saved only when the loss value is lower than the previously recorded minimum.
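For intuition, here is an illustrative PyTorch sketch of the mirrored encoder/decoder structure that hidden_layer_sizes=(512,256,128) describes. It is not SIGMA's exact AutoEncoder implementation; the layer choices and the 2D latent size (matching the (n, 2) latent used later) are assumptions for illustration.
import torch.nn as nn

# Illustrative mirror-image MLP; n_features is the number of elemental maps (9 for the test file)
n_features = 9
encoder = nn.Sequential(
    nn.Linear(n_features, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 2),                 # 2D latent space used for segmentation later
)
decoder = nn.Sequential(
    nn.Linear(2, 128), nn.ReLU(),
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, n_features),
)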
# The integer passed to this function determines the initialised parameters of the model (it fixes the pseudo-randomness).
# This can influence the result of the dimensionality reduction and change the latent space.
same_seeds(1)
# Set the folder path for saving the model (the model will be saved automatically in the specified folder)
result_folder_path='./'
# Set up the experiment, e.g. determining the model structure, dataset for training etc.
ex = Experiment(descriptor='softmax',
general_results_dir=result_folder_path,
model=AutoEncoder,
model_args={'hidden_layer_sizes':(512,256,128)},
chosen_dataset=sem.normalised_elemental_data,
save_model_every_epoch=False)
2.1.2 Training¶
Parameters for ex.run_model
num_epochs: int. The number of complete passes through the training dataset. 50-100 is recommended. A rule of thumb: if the loss value stops decreasing, that epoch may be a good point to stop.
batch_size: int. The number of data points per gradient update. Values between 32-128 are recommended. A smaller batch size means more updates within an epoch, but makes the optimisation process more stochastic.
learning_rate: float in the range 0-1. The learning rate controls how quickly the model adapts to the problem. 1e-4 is recommended. A higher learning rate may yield faster convergence but risks getting stuck in an undesirable local minimum.
task: str. If 'train_all', all data points are used for training the autoencoder. If 'train_eval', training is conducted on a training set (85% of the dataset) and evaluation on a test set (the remaining 15%). The recommended procedure is to run 'train_eval' for hyperparameter selection and 'train_all' for the final analysis.
criterion: str. If 'MSE', the criterion measures the mean squared error (squared L2 norm) between each element of the input x and the target y. 'MSE' is currently the only option; other criteria will be implemented in future versions.
# Train the model
ex.run_model(num_epochs=20,
batch_size=32,
learning_rate=1e-4,
weight_decay=0.0,
task='train_all',
criterion='MSE'
)
latent = ex.get_latent()
(Optional) Load pre-trained Autoencoder¶
# model_path = './' # path to the saved model (the model file is stored in the folder 'result_folder_path')
# ex.load_trained_model(model_path)
3. Pixel segmentation¶
3.1 Method 1: Gaussian mixture modelling (GMM) clustering¶
3.1.1 Measure Bayesian information criterion (BIC)¶
gui.view_bic iteratively calculates the BIC for Gaussian mixture models with an increasing number of Gaussian components.
Parameters for gui.view_bic
latent: np.ndarray. The 2D representations learned by the autoencoder. The size of the input array must be (n, 2), where n is the total number of data points.
model: str. The model used to calculate the BIC. Only 'GaussianMixture' is available for now.
n_components: int. If n_components=20, the BIC values are shown for GMMs with n_components from 1 to 20.
model_args: dict. Keyword arguments for the GMM model in sklearn. For example, random_state specifies the random seed for optimisation (this makes the results reproducible), and init_params specifies the parameter initialisation of the GMM model ('kmeans' is recommended). See more details here.
gui.view_bic(latent=latent,
model='GaussianMixture',
n_components=14,
model_args={'random_state':6, 'init_params':'kmeans'})
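For reference, the BIC scan is conceptually equivalent to fitting a scikit-learn GaussianMixture for each number of components and recording the BIC. This is a sketch of the idea, not the exact implementation of gui.view_bic:
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a GMM for n = 1..14 components and record the BIC for each
bics = [GaussianMixture(n_components=n, random_state=6,
                        init_params='kmeans').fit(latent).bic(latent)
        for n in range(1, 15)]
print(f'BIC is minimised at {np.argmin(bics) + 1} components')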
3.1.2 Run GMM¶
Parameters for PixelSegmenter
latent: np.ndarray. The size of the input array must be (n, 2), where n is the total number of data points.
dataset_norm: np.ndarray. Normalised HSI-EDS data. The size should be (width, height, number of elemental maps).
sem: SEMDataset. The SEM object created at the beginning.
method: str. Model for clustering.
method_args: dict. Keyword arguments for the GMM model in sklearn. See more details here.
ps = PixelSegmenter(latent=latent,
dataset=sem,
method="GaussianMixture",
method_args={'n_components':14, 'random_state':6, 'init_params':'kmeans'} )
# Change random_state to a different integer (e.g. 10 or 0) to adjust the clustering result.
3.3 Visualisation¶
3.3.1 Checking latent space¶
Parameters for gui.view_latent_space
ps: PixelSegmenter. The PixelSegmenter object that was just created.
color: bool. If True, the latent space will be colour-coded according to the cluster labels.
# Plot the latent space (2-dimensional) with the corresponding Gaussian models
gui.view_latent_space(ps=ps, color=True)
Parameters for gui.check_latent_space
ps: PixelSegmenter. The PixelSegmenter object that was just created.
ratio_to_be_shown: float. The value must be between 0-1. For example, if 0.5, only 50% of the data points will be shown in the latent space.
show_map: bool. If True, the corresponding locations of the data points will be overlaid on the BSE image.
# visualise the latent space
gui.check_latent_space(ps=ps, ratio_to_be_shown=1.0, show_map=True)
Parameters for gui.plot_latent_density
ps: PixelSegmenter. The PixelSegmenter object that was just created.
bins: int. The number of bins for the given interval of the latent space.
# check the density of latent space
gui.plot_latent_density(ps=ps, bins=50)
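The same density view can be reproduced directly with matplotlib, which makes the role of bins explicit (an illustrative sketch, not the gui implementation):
import matplotlib.pyplot as plt

# 2D histogram of the latent space; `bins` sets how finely the space is partitioned
plt.hist2d(latent[:, 0], latent[:, 1], bins=50)
plt.colorbar(label='counts per bin')
plt.xlabel('latent dimension 1')
plt.ylabel('latent dimension 2')
plt.show()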
3.3.2 Checking each cluster¶
# ps.set_feature_list(['Al_Ka', 'C_Ka', 'Ca_Ka', 'Fe_Ka', 'K_Ka', 'O_Ka', 'Si_Ka', 'Ti_Ka', 'Zn_La'])
gui.show_cluster_distribution(ps=ps)
3.3.3 Checking cluster map¶
# Plot phase map using the corresponding GM model
gui.view_phase_map(ps=ps, alpha_cluster_map=0.5)
Parameters for gui.view_clusters_sum_spectra
ps: PixelSegmenter. The PixelSegmenter object that was just created.
normalisation: bool. If True, the sum spectra will be normalised.
spectra_range: tuple. Sets the limits of the energy axis (in keV).
gui.view_clusters_sum_spectra(ps=ps, normalisation=True, spectra_range=(0,8))
4. Unmixing cluster spectra using Non-negative Matrix Factorization (NMF)¶
Parameters for ps.get_unmixed_spectra_profile
clusters_to_be_calculated: str or list. If 'All', all cluster-spectra will be included in the data matrix for NMF. If it is a list of integers (e.g. [0,1,2,3]), only the #0, #1, #2, and #3 cluster-spectra will be used as the data matrix for NMF.
normalised: bool. If True, the sum spectra will be normalised before NMF unmixing.
method: str. Model to be used.
method_args: dict. Keyword arguments for the NMF method (or others). See more details here.
weights, components = ps.get_unmixed_spectra_profile(clusters_to_be_calculated='All',
n_components='All',
normalised=False,
method='NMF',
method_args={'init':'nndsvd'})
gui.show_unmixed_weights_and_compoments(ps=ps, weights=weights, components=components)
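Conceptually, the unmixing step factorises the non-negative matrix of cluster sum spectra into weights and spectral components, much like scikit-learn's NMF. The sketch below uses a random stand-in matrix and an illustrative n_components; it is not the library's exact implementation:
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((14, 2048))    # stand-in for 14 cluster sum spectra x 2048 energy channels

nmf = NMF(n_components=4, init='nndsvd')   # 'nndsvd' matches the method_args above
weights = nmf.fit_transform(X)             # contribution of each component to each cluster
components = nmf.components_               # unmixed spectral profiles
print(weights.shape, components.shape)     # (14, 4) and (4, 2048)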
Check abundance map for components (using RGB maps)¶
gui.show_abundance_map(ps=ps, weights=weights, components=components)
Statistics info from clusters¶
gui.show_cluster_stats(ps=ps)