Documentation:Tutorial:LargeSignals - MdsWiki

Revision as of 14:05, 13 September 2017; Manduchi (Talk | contribs)

Handling large signals in MDSplus

In the previous section we saw how to create pulse files and fill them with data. Data may come in a variety of formats, from scalars to multidimensional arrays and complex types. In particular, we saw that the "signal" data type is very useful for representing the time evolution of a given quantity. There are, however, some limitations on the use of signals:

  • The number of samples in a signal cannot exceed a few billion (~4 GSamples), because the length of arrays handled in MDSplus is stored in a 32-bit variable.
  • In practice, the maximum number of manageable samples in a signal is even smaller because of memory requirements and long access times: a program accessing a very large signal is very likely to crash, or at best data access takes an unacceptably long time.
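
The two limits above can be made concrete with a little arithmetic. The sketch below assumes that the 32-bit length field is unsigned (consistent with the ~4 GSamples figure) and that samples are 32-bit floats with a fully expanded 64-bit timebase; these sizes are illustrative assumptions, not a statement about MDSplus internals.

```python
# Back-of-the-envelope limits for a signal stored as a single array.
# Assumption: the array length is held in an unsigned 32-bit field.
MAX_ARRAY_LENGTH = 2**32            # about 4.3 GSamples

n_samples = 1_000_000_000           # e.g. 1 MHz acquisition for 1000 s
data_bytes = n_samples * 4          # assumed float32 samples
time_bytes = n_samples * 8          # assumed float64 times, fully expanded

print(f"max array length : {MAX_ARRAY_LENGTH:,}")
print(f"data in memory   : {data_bytes / 1e9:.1f} GB")
print(f"expanded timebase: {time_bytes / 1e9:.1f} GB")
```

Even a signal well below the array length limit would therefore need several gigabytes of memory if read in one piece.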

MDSplus provides the concept of segmented data for handling large signals. When a signal is stored in segments, there is no limit on its size and data readout is managed efficiently. Basically, a signal is stored in a segmented node in chunks (aka segments). At any time it is possible to add a new chunk, that is, to extend the signal with new samples. This feature is useful for long-lasting experiments because the signal samples acquired so far are accessible even while the signal is still growing.

When reading a segmented node, the inner layers of MDSplus join the segments together in order to return a signal data type composed of all the signal samples and the associated timebase. However, if the number of samples actually stored in the segmented node exceeds the maximum number of samples in an MDSplus array, signal readout will fail. Even if the total number of samples is smaller, the time and memory required to read the large signal may again be unacceptable.

To overcome this limitation there are two possible solutions:

  • Read each segment using method getSegment(segmentNumber).
  • Use method setTimeContext() to read (a portion of) the (possibly resampled) signal.

With the first solution, the signal corresponding to the given segment is returned. The program must, however, handle the signal in portions, which may make it more complicated.
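
A minimal sketch of the first solution in Python is given here. It assumes that the node object offers getNumSegments() in addition to getSegment(); FakeNode is a hypothetical stand-in used only to make the sketch runnable without a pulse file, and with a real node each returned segment would be a signal whose samples are obtained with data().

```python
def iter_segments(node):
    """Yield (index, segment) pairs for a segmented node, one at a time."""
    for idx in range(node.getNumSegments()):
        yield idx, node.getSegment(idx)


class FakeNode:
    """Hypothetical stand-in for a TreeNode, only to make the sketch runnable."""

    def __init__(self, segments):
        self._segments = segments

    def getNumSegments(self):
        return len(self._segments)

    def getSegment(self, idx):
        return self._segments[idx]


node = FakeNode([[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]])

# Process each chunk separately so that only one segment is held at a time.
peak = max(max(seg) for _, seg in iter_segments(node))
print(peak)
```

The price of this approach is visible even in the toy case: any operation on the whole signal (here, the maximum) has to be written as a reduction over segments.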
With the second solution, the program is the same as for traditional signals: methods getData() and data() return the desired samples, leaving the inner data access layers of MDSplus to handle the joining of segments and the resampling. The region of interest (ROI) and the resampling interval are defined by the Tree method:

setTimeContext(startTime, endTime, delta)

The arguments are optional. When startTime (endTime) is missing (i.e. passed as null in Java and C++, or as None in Python), the ROI has no start time (end time). When delta is missing, no resampling is done.

The setting made by setTimeContext() is global: all subsequent readouts of segmented nodes (even when they are referenced in an expression being evaluated) will use the defined ROI.
To reset the ROI, call setTimeContext() with all three parameters set to None (Python) or null (C++, Java).
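
Because the ROI is global, it is easy to forget to reset it. One possible way to guard against this in Python is a small context manager; the helper time_context and the stand-in class FakeTree below are hypothetical and not part of MDSplus. With the real library, the Tree class would be passed instead of FakeTree.

```python
from contextlib import contextmanager


@contextmanager
def time_context(tree_cls, start=None, end=None, delta=None):
    """Set the ROI on entry and always clear it on exit."""
    tree_cls.setTimeContext(start, end, delta)
    try:
        yield
    finally:
        # Passing None for all three parameters resets the ROI.
        tree_cls.setTimeContext(None, None, None)


class FakeTree:
    """Hypothetical stand-in for the Tree class that records the calls."""

    calls = []

    @staticmethod
    def setTimeContext(start, end, delta):
        FakeTree.calls.append((start, end, delta))


with time_context(FakeTree, 0.5, 0.50001):
    pass  # read segmented nodes here, e.g. node.data()

print(FakeTree.calls)
```

The reset happens in a finally clause, so the ROI is cleared even if a readout raises an exception.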

It is recommended to always use setTimeContext() when handling large signals: MDSplus then manages the segments while minimizing the use of memory resources. For example, segments lying outside the ROI are simply skipped when building the resulting signal, with a dramatic reduction in access time.
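
The effect of skipping segments can be illustrated with a toy selection routine. This is only a sketch of the idea, not the algorithm used inside MDSplus; the function segments_in_roi and the layout of 1000 one-second segments are assumptions made for illustration.

```python
def segments_in_roi(limits, start=None, end=None):
    """Return the indexes of the segments that overlap [start, end].

    limits is a list of (seg_start, seg_end) pairs; None means unbounded.
    """
    selected = []
    for idx, (seg_start, seg_end) in enumerate(limits):
        if start is not None and seg_end < start:
            continue  # segment ends before the ROI begins
        if end is not None and seg_start > end:
            continue  # segment begins after the ROI ends
        selected.append(idx)
    return selected


# Assumed layout: 1000 segments of one second each.
limits = [(float(k), float(k + 1)) for k in range(1000)]

print(segments_in_roi(limits, 0.5, 0.50001))
print(len(segments_in_roi(limits)))
```

With a narrow ROI only one segment out of 1000 needs to be touched, whereas with no ROI every segment has to be read.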

In the following C++ example a very large signal, composed of one billion samples and representing a quantity acquired at 1 MHz for 1000 seconds (from time 0 to time 1000), is built and stored in node HUGE_SIGNAL of pulse file big_tree in segments of 1 million samples each.

#include <mdsobjects.h>
#include <cmath>
#include <iostream>
int main(int argc, char *argv[])
{
  try {
    //Open the model
    MDSplus::Tree *tree = new MDSplus::Tree("big_tree", -1);
    //Create shot 1
    tree->createPulse(1);
    delete tree;
    //Open shot 1
    tree = new MDSplus::Tree("big_tree", 1);
    
    //Get the node object
    MDSplus::TreeNode *signalNode = tree->getNode("HUGE_SIGNAL");
     
    //Build 1000 segments of 1 MSamples each
    int count = 0;
    float *buf = new float[1000000]; 
    for(int segIdx = 0; segIdx < 1000; segIdx++)
    {
      std::cout << "Building segment " << segIdx << std::endl;
      for(int i = 0; i < 1000000; i++)
      {
        buf[i] = sin(count/1000.);
        count++;
      }
      //Build the timebase using the Range data type
      //The Range data type specifies start time, end time and time interval
      MDSplus::Data *startTime = new MDSplus::Float64(segIdx);
      MDSplus::Data *endTime = new MDSplus::Float64(segIdx+1);
      MDSplus::Data *delta = new MDSplus::Float64(1E-6);
      MDSplus::Data *segDimension = new MDSplus::Range(startTime, endTime, delta);
      
      //Build the segment data from the float buffer
      MDSplus::Array *segData = new MDSplus::Float32Array(buf, 1000000);

      signalNode->makeSegment(startTime, endTime, segDimension, segData);
       
      //Free stuff. NOTE: startTime, endTime and delta do not need to be deallocated
      //since they have been passed to a Data constructor
      MDSplus::deleteData(segDimension);
      MDSplus::deleteData(segData);
    }
    //Free the sample buffer once all segments have been written
    delete[] buf;
  }catch(MDSplus::MdsException &exc)
  {
    std::cout << exc.what() << std::endl;
  }
  
  return 0;
}

In the following example, the whole signal is read in a Python program, resampled at 10 kHz (delta = 1E-4 s):

>>> from MDSplus import *
>>> t = Tree('big_tree',1)
>>> Tree.setTimeContext(None, None, 1E-4)
>>> n= t.getNode('HUGE_SIGNAL')
>>> sig = n.data()
>>> sig
array([ 0.09983341,  0.19866933,  0.29552022, ..., -0.61119074,
       -0.52912086, -0.44176418], dtype=float32)
>>> time=n.getDimensionAt(0).data()
>>> time
array([  1.00000000e-04,   2.00000000e-04,   3.00000000e-04, ...,
         9.99999700e+02,   9.99999800e+02,   9.99999900e+02])

In the following code snippet, a time window between times 0.5 and 0.50001 is read, with no resampling:

>>> t.setTimeContext(0.5,0.50001,None)
>>> sig1=n.data()
>>> time1=n.getDimensionAt(0).data()
>>> sig1
array([-0.4677718 , -0.46865541, -0.46953857, -0.47042125, -0.47130346,
       -0.47218519, -0.47306645, -0.47394723, -0.47482756, -0.47570738,
       -0.47658676], dtype=float32)
>>> time1
array([ 0.5     ,  0.500001,  0.500002,  0.500003,  0.500004,  0.500005,
        0.500006,  0.500007,  0.500008,  0.500009,  0.50001 ])
>>>
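
The number of samples returned above can be checked by hand. The sketch below assumes that both ends of the window are included and that the start time falls exactly on a sample, as the printed times suggest; samples_in_window is a hypothetical helper, not an MDSplus function.

```python
def samples_in_window(start, end, period):
    """Count the samples at start, start + period, ... up to end inclusive."""
    return round((end - start) / period) + 1


# Window 0.5 s to 0.50001 s at the original 1 MHz rate (period 1E-6 s)
print(samples_in_window(0.5, 0.50001, 1e-6))
```

The result matches the 11 values shown in sig1 and time1.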

Finally, the ROI is reset with the following command:

>>> Tree.setTimeContext(None, None, None)

Further improving access time of resampled signals
