Documentation:Tutorial:LargeSignals - MdsWiki

Handling large signals in MDSplus

In the previous section we have seen how to create pulse files and fill them with data. Data may represent a variety of formats, from scalars to multidimensional arrays and complex types. In particular, we have seen how the "signal" data type is very useful to represent the time evolution of a given quantity. There are however some limitations in the signal usage:

  • The number of samples in a signal cannot exceed a few billion (~4 GSamples) because the length of arrays handled in MDSplus is stored in a 32-bit variable.
  • In practice the maximum number of manageable samples in a signal is even smaller because of memory requirements and long access times: a program accessing a very large signal is very likely to crash or, in any case, data access would take an unacceptable time.
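
To get a feeling for these limits, the following minimal sketch estimates the memory needed to read a whole signal at once. It assumes float32 samples and an explicit float64 timebase (an assumption made for illustration only), and the function names are hypothetical helpers, not part of MDSplus.

```python
# Rough memory estimate for reading a whole signal in one go.
# Assumption: 4-byte float32 samples and an 8-byte float64 time per sample.

MAX_ARRAY_LENGTH = 2**32  # bound implied by a 32-bit length field (~4 GSamples)


def signal_footprint_bytes(num_samples, sample_bytes=4, time_bytes=8):
    """Return the bytes needed to hold the samples plus an explicit timebase."""
    return num_samples * (sample_bytes + time_bytes)


def fits_in_one_array(num_samples):
    """True if the sample count can be described by a 32-bit array length."""
    return num_samples < MAX_ARRAY_LENGTH


n = 1_000_000_000  # e.g. a signal acquired at 1 MHz for 1000 s
print(fits_in_one_array(n))             # True
print(signal_footprint_bytes(n) / 1e9)  # 12.0 (GB)
```

Even though one billion samples still fit in a single array, holding them in memory together with their timebase would require about 12 GB under these assumptions.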

MDSplus provides the concept of segmented data for handling large signals. When a signal is stored in segments, there is no limit on its size and data readout is managed efficiently. Basically, a signal is stored in a segmented node in chunks (aka segments). At any time it is possible to add a new chunk, that is, to enlarge the signal by adding new samples. This feature is useful for long-lasting experiments because the signal samples acquired so far are accessible even while the signal is still growing.
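
The idea can be illustrated with a toy in-memory model. This is only a sketch of the concept: class SegmentedSignal and its methods are hypothetical names and do not correspond to the MDSplus API or to its on-disk format.

```python
class SegmentedSignal:
    """Toy model of a signal stored as a growing list of segments."""

    def __init__(self):
        # Each entry is (start_time, end_time, times, samples)
        self.segments = []

    def append_segment(self, times, samples):
        """Enlarge the signal by one chunk of samples."""
        if not times or len(times) != len(samples):
            raise ValueError("times and samples must be non-empty and equally long")
        self.segments.append((times[0], times[-1], list(times), list(samples)))

    def read_all(self):
        """Join all the segments stored so far into one (times, samples) pair."""
        times, samples = [], []
        for _, _, seg_times, seg_samples in self.segments:
            times.extend(seg_times)
            samples.extend(seg_samples)
        return times, samples


sig = SegmentedSignal()
sig.append_segment([0.0, 0.1], [1.0, 2.0])
print(len(sig.read_all()[0]))  # 2: readable while the signal is still growing
sig.append_segment([0.2, 0.3], [3.0, 4.0])
print(len(sig.read_all()[0]))  # 4
```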

When reading a segmented node, the inner layers of MDSplus join the segments together in order to return a signal data type composed of all the signal samples and the associated timebase. However, if the number of samples actually stored in the segmented node exceeds the maximum number of samples in an MDSplus array, signal readout will fail. Even if the total number of samples is below that limit, the time and the memory resources required for reading the large signal would be unacceptable.

To overcome this limitation there are two possible solutions:

  • Read each segment using method getSegment(segmentNumber).
  • Use method setTimeContext() to read (a portion of) the (possibly resampled) signal.

Using the first solution, the signal corresponding to the given segment is returned. It is however necessary to handle portions of the signal, possibly making the program more complicated.
Using the second solution, the program is the same as for traditional signals: methods getData() and data() return the desired samples, leaving the inner data access layers of MDSplus to handle the joining of different segments and the resampling. The region of interest (ROI) and the resampling interval are defined by the Tree method:

setTimeContext(startTime, endTime, delta)

The arguments are optional. When startTime (endTime) is missing (i.e. defined as null in Java and C++, and as None in Python), no start time (end time) is defined in the ROI. When delta is missing, no resampling is done.
The setting made by setTimeContext() is global, that is, all subsequent readouts of segmented nodes (even when they are referred to in an expression being evaluated) will use the defined ROI.
In order to reset the ROI, setTimeContext() is called with all three parameters defined as None (Python) or null (C++, Java).

It is recommended to always use setTimeContext() when handling large signals: MDSplus performs the required management of segments minimizing the use of memory resources. For example, unused segments, i.e. outside the ROI, are simply skipped when building the resulting signal, with a dramatic reduction in access time.
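
The benefit of skipping unused segments can be sketched as follows. This is a simplified illustration and not the actual MDSplus implementation: function segments_in_roi is a hypothetical helper that only looks at the start and end time of each segment to decide which ones overlap the ROI.

```python
def segments_in_roi(segment_bounds, start=None, end=None):
    """Return the indices of the segments overlapping [start, end].

    segment_bounds is a list of (segment_start, segment_end) pairs;
    None for start (end) means that the ROI has no lower (upper) limit.
    """
    selected = []
    for idx, (seg_start, seg_end) in enumerate(segment_bounds):
        if start is not None and seg_end < start:
            continue  # segment ends before the ROI begins: skipped
        if end is not None and seg_start > end:
            continue  # segment begins after the ROI ends: skipped
        selected.append(idx)
    return selected


# 1000 segments of one second each
bounds = [(float(i), float(i + 1)) for i in range(1000)]
print(segments_in_roi(bounds, 0.5, 0.50001))  # [0]
print(len(segments_in_roi(bounds)))           # 1000
```

With a narrow ROI only one segment out of 1000 has to be read, while no ROI at all means that every segment is involved.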

In the following C++ example a very large signal, composed of one billion samples and describing a signal acquired at 1 MHz for 1000 seconds (from time 0 to time 1000) is built and stored in field HUGE_SIGNAL of pulse file big_tree in segments of 1 million samples each.

#include <mdsobjects.h>
#include <math.h>
#include <iostream>
int main(int argc, char *argv[])
{
  try {
    //Open the model
    MDSplus::Tree *tree = new MDSplus::Tree("big_tree", -1);
    //Create shot 1
    tree->createPulse(1);
    delete tree;
    //Open shot 1
    tree = new MDSplus::Tree("big_tree", 1);
    
    //Get the node object
    MDSplus::TreeNode *signalNode = tree->getNode("HUGE_SIGNAL");
     
     //Build 1000 segments of 1MSamples each
    int count = 0;
    float *buf = new float[1000000]; 
    for(int segIdx = 0; segIdx < 1000; segIdx++)
    {
      std::cout << "Building segment " << segIdx << std::endl; 
      for(int i = 0; i < 1000000; i++)
      {
        buf[i] = sin(count/1000.);
        count++;
      }
      //Build the timebase using the Range datatype
      //The Range data type specifies start time, end time and time interval
      MDSplus::Data *startTime = new MDSplus::Float64(segIdx);
      MDSplus::Data *endTime = new MDSplus::Float64(segIdx+1);
      MDSplus::Data *delta = new MDSplus::Float64(1E-6);
      MDSplus::Data *segDimension = new MDSplus::Range(startTime, endTime, delta);
      
      //Build the segment data from the float buffer
      MDSplus::Array *segData = new MDSplus::Float32Array(buf, 1000000);

      signalNode->makeSegment(startTime, endTime, segDimension, segData);
       
      //Free stuff. NOTE startTime, endTime and delta do not need to be deallocated 
      //since they have been passed to a Data constructor 
      MDSplus::deleteData(segDimension);
      MDSplus::deleteData(segData);
    }
    //Release the sample buffer once all segments have been written
    delete[] buf;
  }catch(MDSplus::MdsException &exc)
  {
    std::cout << exc.what() << std::endl;
  }
  
  return 0;
}

In the following example, the whole signal is read in a Python program, resampled at 10 kHz:

>>> from MDSplus import *
>>> t = Tree('big_tree',1)
>>> Tree.setTimeContext(None, None, 1E-4)
>>> n= t.getNode('HUGE_SIGNAL')
>>> sig = n.data()
>>> sig
array([ 0.09983341,  0.19866933,  0.29552022, ..., -0.61119074,
       -0.52912086, -0.44176418], dtype=float32)
>>> time=n.getDimensionAt(0).data()
>>> time
array([  1.00000000e-04,   2.00000000e-04,   3.00000000e-04, ...,
         9.99999700e+02,   9.99999800e+02,   9.99999900e+02])

In the following code snippet, a time window between times 0.5 and 0.50001 is read, with no resampling:

>>> t.setTimeContext(0.5,0.50001,None)
>>> sig1=n.data()
>>> time1=n.getDimensionAt(0).data()
>>> sig1
array([-0.4677718 , -0.46865541, -0.46953857, -0.47042125, -0.47130346,
      -0.47218519, -0.47306645, -0.47394723, -0.47482756, -0.47570738,
      -0.47658676], dtype=float32)
>>> time1
array([ 0.5     ,  0.500001,  0.500002,  0.500003,  0.500004,  0.500005,
        0.500006,  0.500007,  0.500008,  0.500009,  0.50001 ])
>>>

Finally, the ROI is reset with the following command:

>>> Tree.setTimeContext(None, None, None)
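
When choosing the ROI and the resampling interval it can be handy to estimate in advance how many samples a readout will return. The following sketch does this arithmetic for a uniformly sampled signal; both functions are hypothetical helpers (not part of MDSplus), and the exact number of samples returned at the borders of the ROI may differ slightly.

```python
def expected_samples(start, end, native_period, delta=None):
    """Approximate number of samples in [start, end].

    native_period is the original sampling period; delta is the
    resampling interval, or None when no resampling is requested.
    """
    period = native_period if delta is None else delta
    return int(round((end - start) / period)) + 1


def delta_for_target(start, end, max_samples):
    """Resampling interval yielding roughly max_samples samples in [start, end]."""
    return (end - start) / max_samples


# Window 0.5 .. 0.50001 of a 1 MHz signal, no resampling
print(expected_samples(0.5, 0.50001, 1e-6))  # 11
```

The value 11 matches the length of the sig1 array read above. For the whole 1000 s signal, delta_for_target(0.0, 1000.0, 10000000) gives an interval of about 1E-4, i.e. the 10 kHz resampling used in the first readout.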

Further improving access time of resampled signals

We have seen so far how to use setTimeContext() to handle large signal readout. In particular, resampling is mandatory unless only a very small portion of the signal is read. MDSplus can make resampling more efficient, with a consequent reduction of data access times, at the cost of a very small change in the method used when building large signals. In this case, it is necessary to reserve a new tree node that is going to contain a resampled version of the signal, built at the time the signal segments are written. The following C++ code is almost the same as the previous example, except for the use of a new node (HUGE_RESAMP in the example) and of method makeSegmentResampled() in place of makeSegment():

#include <mdsobjects.h>
#include <math.h>
#include <iostream>

int main(int argc, char *argv[])
{
  try {
    //Open the model
    MDSplus::Tree *tree = new MDSplus::Tree("big_tree", -1);
    //Create shot 1
    tree->createPulse(1);
    delete tree;
    //Open shot 1
    tree = new MDSplus::Tree("big_tree", 1);
    
    //Get the node object
    MDSplus::TreeNode *signalNode = tree->getNode("HUGE_SIGNAL");
    MDSplus::TreeNode *resampledNode = tree->getNode("HUGE_RESAMP");
    
    //Build 1000 segments of 1MSamples each
    int count = 0;
    float *buf = new float[1000000]; 
    for(int segIdx = 0; segIdx < 1000; segIdx++)
    {
      std::cout << "Building segment " << segIdx << std::endl; 
      for(int i = 0; i < 1000000; i++)
      {
        buf[i] = sin(count/1000.);
        count++;
      }
      //Build the timebase using the Range datatype
      //The Range data type specifies start time, end time, and time interval
      MDSplus::Data *startTime = new MDSplus::Float64(segIdx);
      MDSplus::Data *endTime = new MDSplus::Float64(segIdx+1);
      MDSplus::Data *delta = new MDSplus::Float64(1E-6);
      MDSplus::Data *segDimension = new MDSplus::Range(startTime, endTime, delta);
      
      //Build the segment data from the float buffer
      MDSplus::Array *segData = new MDSplus::Float32Array(buf, 1000000);
      
      signalNode->makeSegmentResampled(startTime, endTime, segDimension, segData, resampledNode);
      
      //Free stuff. NOTE startTime, endTime and delta do not need to be deallocated 
      //since they have been passed to a Data constructor 
      MDSplus::deleteData(segDimension);
      MDSplus::deleteData(segData);
    }
    //Release the sample buffer once all segments have been written
    delete[] buf;
  }catch(MDSplus::MdsException &exc)
  {
    std::cout << exc.what() << std::endl;
  }
  
  return 0;
}

Method makeSegmentResampled() requires an additional argument, i.e. the node that is going to receive the resampled version of the signal. A further optional parameter defines the resampling factor, which is set to 100 by default. Nothing changes in signal readout: MDSplus keeps in the node metadata all the information needed to decide whether on-the-fly resampling should start from the original signal or from its resampled version. The final result is a significant reduction of access time when resampling large signals. Note that the node containing the resampled version of the signal is for internal MDSplus use only.
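
The reason why a pre-built resampled copy helps can be sketched in a few lines. The code below is only a conceptual illustration: plain decimation and the selection rule in choose_source are assumptions made for the example, and neither function reflects the internal MDSplus algorithm.

```python
def decimate(samples, factor=100):
    """Keep one sample out of every `factor` (assumed resampling scheme)."""
    return samples[::factor]


def choose_source(requested_delta, native_period, factor=100):
    """Pick the stored version that needs the least work for a readout.

    The resampled copy is enough whenever the requested interval spans
    at least `factor` original samples.
    """
    if requested_delta is None:
        return "original"  # no resampling requested: full resolution needed
    stride = round(requested_delta / native_period)
    return "resampled" if stride >= factor else "original"


print(len(decimate(list(range(1000)))))  # 10
print(choose_source(1e-4, 1e-6))         # resampled
print(choose_source(1e-5, 1e-6))         # original
```

In this model a 10 kHz readout of a 1 MHz signal only touches the copy that is 100 times smaller, while a finer resampling interval still needs the original data.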

Visualization of large signals using jScope

Dynamic resampling of large waveforms is performed automatically by jScope when using the MDSDataProvider data source (the recommended one). When jScope is asked to display an entire signal, it works out the required resampling factor (depending on the number of original signal samples) and asks the data provider (the mdsip server) for a resampled version of the signal. Zooming into a displayed waveform would otherwise lose resolution, so jScope dynamically requests from the data provider the signal for the ROI corresponding to the zoomed window. The user will likely experience a small delay just after zooming before getting the required signal resolution (depending on the speed of the mdsip server and of the network connection). In order to avoid aliasing when resampling, jScope actually requests from the data server the minimum and maximum value of every resampling interval, rather than a single resampled value. This is all done transparently. However, when an additional tree node is used to keep a resampled version of a large signal, method makeSegmentMinMax() should be preferred to makeSegmentResampled() in order to avoid aliasing: makeSegmentMinMax() stores in the support node all the information required to avoid aliasing in jScope.
Note that the use of either makeSegmentMinMax() or makeSegmentResampled() is not mandatory. However, for very large signals the time required for data access (and jScope visualization) can be greatly reduced in this way.
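
The following sketch shows why keeping the minimum and maximum of every resampling interval avoids losing short features. It is a generic illustration of min/max resampling, with hypothetical function names, and does not reproduce the jScope or MDSplus code.

```python
def decimate(samples, bin_size):
    """Plain resampling: keep only the first sample of every interval."""
    return samples[::bin_size]


def min_max_resample(samples, bin_size):
    """Return the minimum and the maximum of every resampling interval."""
    mins, maxs = [], []
    for i in range(0, len(samples), bin_size):
        chunk = samples[i:i + bin_size]
        mins.append(min(chunk))
        maxs.append(max(chunk))
    return mins, maxs


# A flat signal with a single narrow spike
signal = [0] * 10
signal[3] = 5

print(decimate(signal, 5))          # [0, 0]: the spike is lost
print(min_max_resample(signal, 5))  # ([0, 0], [5, 0]): the spike is kept
```

Drawing the band between the minimum and the maximum of each interval therefore preserves the envelope of the waveform, at the cost of two values per interval instead of one.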

Data Streaming

Data streaming is useful when a stream of data has to be communicated among MDSplus components without involving data storage in pulse files. For example, when data are acquired by an ADC device and written in a tree, a subsampled stream may be sent to a visualization tool in order to display live waveforms. Streaming is especially useful when handling trends and can be used in trend visualization tools.
In MDSplus data streaming is implemented by means of MDSplus events, where a single event can carry one or more data samples. It is possible to generate a stream of data (in C++ and Python) and to receive a data stream (in C++) by associating a callback method that is called whenever a (chunk of) data is received.
In addition, a node.js server is available that dispatches data flows towards web browsers.
As data streaming is implemented by means of MDSplus events, which are in turn based on UDP, the following facts must be considered:

  • The throughput in streaming is limited by the throughput in UDP message communication. Recall that a single MDSplus event can carry one or more data samples, depending on whether a single sample or an array of samples is sent.
  • The maximum number of data samples that can be sent in a single message depends on the maximum UDP datagram length. As a rule of thumb, no more than a few hundred data samples can be sent in a single call (and therefore in a single MDSplus event), considering that each sample is composed of a (value, time) pair and that the values are sent in ASCII format.
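
A sender holding more samples than fit in one message can split them into chunks before streaming. The sketch below shows one possible way of doing so; function chunked is a hypothetical helper and the default of 200 samples per message is only an assumed value, chosen to stay within the rule of thumb above.

```python
def chunked(times, samples, max_per_message=200):
    """Yield (times, samples) slices holding at most max_per_message samples."""
    if len(times) != len(samples):
        raise ValueError("times and samples must have the same length")
    for i in range(0, len(samples), max_per_message):
        yield times[i:i + max_per_message], samples[i:i + max_per_message]


times = [i * 1e-3 for i in range(450)]
values = [float(i) for i in range(450)]
# Each slice would then be passed to one array send call
print([len(v) for _, v in chunked(times, values)])  # [200, 200, 50]
```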



In C++, class EventStream provides streaming support. The following static methods are used to send a single sample (embedded in a single MDSplus event):

 static void send(int shot, const char *name, float time, float sample);
 static void send(int shot, const char *name, uint64_t time, float sample);

Argument shot is a generic number that is passed along with the data samples. Argument name is a user-provided string that identifies the communication channel: a receiver registering for a given name will receive all the data sent by any sender using that channel name.
In the second case, an absolute time value is associated with the data sample. This option is useful, for example, when dealing with trend signals, which are normally acquired with absolute times.
The following methods will send an array of samples:

 static void send(int shot, const char *name, int numSamples, float *times, float *samples, bool oscilloscopeMode = false);
 static void send(int shot, const char *name, int numSamples, uint64_t *times, float *samples);

The passed samples are encoded in a single MDSplus event. The optional boolean parameter oscilloscopeMode is meaningful when sending the data stream to the visualization tool (see below). When it is set, data are displayed in an oscilloscope-like way instead of as a strip chart: the time (x) values of the data are not considered, and the values of each message are replaced in the display by the values of the next one. In the following example, two different signals, with relative and absolute times respectively, are streamed on channels Channel1 and Channel2.

#include <mdsobjects.h>
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <iostream>

int main(int argc, char *argv[])
{
    const char *channel1 = "Channel1", *channel2 = "Channel2";
    struct timeval currAbsTime; 
    uint64_t currAbsTimeMs;
    struct timespec sleepTime;
    float currRelTime, currVal1, currVal2;
    currRelTime = 0;
    sleepTime.tv_sec = 0;
    sleepTime.tv_nsec = 100000000; 
    int shot = 1;
    int idx = 0;
    while(true)
    {
        //100 ms period
        nanosleep(&sleepTime, NULL);
        gettimeofday(&currAbsTime, NULL);
        //Absolute UTC time in milliseconds (64-bit arithmetic to avoid overflow)
        currAbsTimeMs = (uint64_t)currAbsTime.tv_sec * 1000 + currAbsTime.tv_usec / 1000;
        //Relative time
        currRelTime = idx++/10.;
        currVal1 = sin(currRelTime*0.1);
        currVal2 = sin(currRelTime*0.1) * cos(currRelTime*0.033);
        MDSplus::EventStream::send(shot, channel1, currRelTime, currVal1);
        MDSplus::EventStream::send(shot, channel2, currAbsTimeMs, currVal2);
        std::cout << currRelTime << std::endl;
    }
}

The same example in Python is listed below:

from MDSplus import *
import time
import math

currTime = 0.
shot = 1
idx = 0
while(True):
    time.sleep(0.1)
    currTime = idx * 0.1
    idx = idx + 1
    currVal1 = math.sin(currTime*0.1)
    currVal2 = math.sin(currTime*0.1) * math.cos(currTime*0.033)
    print(currTime)
    Event.stream(shot, 'Channel1', Float32(currTime), Float32(currVal1))
    Event.stream(shot, 'Channel2', Int64(int(time.time()*1000)), Float32(currVal2))
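
In both examples the absolute time sent on Channel2 is the number of milliseconds elapsed since the Unix epoch. The following sketch shows how such a value can be produced and converted back into a readable UTC date; the two function names are hypothetical helpers and are not part of MDSplus.

```python
import time
from datetime import datetime, timezone


def utc_ms_now():
    """Current absolute time in milliseconds since the Unix epoch."""
    return int(time.time() * 1000)


def utc_ms_to_datetime(ms):
    """Convert a millisecond timestamp into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000.0, tz=timezone.utc)


print(utc_ms_to_datetime(0))             # 1970-01-01 00:00:00+00:00
print(utc_ms_to_datetime(utc_ms_now()))  # current UTC date and time
```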

It is possible to receive a given data stream by associating a callback method to be called upon reception of new data sample(s) for the given channel. This is provided by class MDSplus::EventStream, which inherits from class MDSplus::Event. The following method associates a listener object (which must derive from class MDSplus::DataStreamListener) with the data stream corresponding to the passed name:

void registerListener(DataStreamListener *listener, const char *name);

Class MDSplus::DataStreamListener defines method

void dataReceived(Data *samples, Data *times, int shot)

that will receive the associated sample(s) as an MDSplus::Data object, which will be either an MDSplus::Float32 or an MDSplus::Float32Array object. When relative times are used, the times argument will be of class MDSplus::Float32 or MDSplus::Float32Array. Otherwise, for absolute times, it will be of class MDSplus::Int64 or MDSplus::Int64Array.

In the following example, two listeners receive the data streams of channels Channel1 and Channel2, printing the received data samples:

#include <mdsobjects.h>
#include <unistd.h>
#include <iostream>
class MyListener: public MDSplus::DataStreamListener
{
    const char *channelName;
public:
   MyListener(const char *channelName)
   {
       this->channelName = channelName;
   }
   void dataReceived(MDSplus::Data *samples, MDSplus::Data *times, int shot)
   {
         std::cout << "Channel: " << channelName << "   Time: " << times << "  Value: " << samples << std::endl;
   }
};

int main(int argc, char *argv[])
{
    MyListener listen1("Channel1"), listen2("Channel2");
    MDSplus::EventStream evStream;
   
    evStream.registerListener(&listen1, "Channel1");
    evStream.registerListener(&listen2, "Channel2");
    evStream.start(); //Start event reception
    sleep(10000);
    return 0;
}

A Web-based visualization tool for data streaming is provided in MDSplus. It is based on a node.js application that listens for MDSplus events carrying data and dispatches them to the registered Web pages via the Server-Sent Events protocol. As MDSplus events are based on UDP, the node.js application must run on a machine that is properly configured for receiving MDSplus events (see MDSplus events).

The node.js application must be started in the MDSplus distribution nodejs directory, i.e. $MDSPLUS_DIR/nodejs, via command

node event_server.js <HTTP listen port> [debug]

The HTTP listen port is used by the web server launched by the application to listen for incoming connections for data display. When debug is passed as second argument, messages about incoming data and connections are printed. The node.js application relies on the express and sse packages. If they are not installed, they must first be installed via the following commands:

npm install express
npm install sse

The simplest URL to be used in a Web page (Chrome recommended) to display the data stream is:

 <IP address>:<port>?panel=<Panel title>&s=<channel>

where IP address is the address of the machine running the node.js application, port is the port number passed to the node.js application, Panel title is the title displayed in the visualization page, and channel is the channel name of the target data stream.
More generally, the following parameters can be associated with a panel:

  • panel: name of the panel
  • length: number of samples to be displayed in the graph. Once this number has been reached, the signal flows as a strip chart.
  • xlabel: label of X axis
  • ylabel: label of Y axis
  • s: channel name of signal
  • c: signal color (default: black, can be: yellow,red,blue,cyan,green,...)

More than one signal can be displayed in the same panel, provided they share the same timebase, by appending further &s=<channel>&c=<color> pairs to the same URL. More than one panel can be specified by appending a new panel definition, separated from the previous one by either &mnext or &mnextnl. &mnext instructs the browser to align panels in the same row if there is enough room, otherwise in the next row. &mnextnl instructs the browser to place the panel in the subsequent row. For example, the URL

http://<IP address>:<port>/?panel=FirstSignal&s=Channel1&c=blue&mnextnl&panel=SecondSignal&s=Channel2&c=red

will display the two data streams generated by the previous example, as shown in the following figure (recall that the first signal uses relative times and the second one absolute UTC times):
Image:StreamVisualization.jpg
The following global parameters can be specified in the URL before the panel definition(s):

  • htitle: title shown in the header common for all panels
  • defaultLength: number of samples if not indicated in panel configuration
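
When many panels are involved, the URL can be assembled programmatically. The following sketch builds it from the parameters listed above; the helper names, the host localhost and the port 8080 are assumptions made for the example, and the order of the parameters inside a panel definition is assumed not to matter beyond panel coming first.

```python
from urllib.parse import quote


def panel_query(title, signals, length=None, xlabel=None, ylabel=None):
    """Build one panel definition; signals is a list of (channel, color) pairs."""
    parts = ["panel=" + quote(title)]
    if length is not None:
        parts.append("length=%d" % length)
    if xlabel is not None:
        parts.append("xlabel=" + quote(xlabel))
    if ylabel is not None:
        parts.append("ylabel=" + quote(ylabel))
    for channel, color in signals:
        parts.append("s=" + quote(channel))
        if color:
            parts.append("c=" + quote(color))
    return "&".join(parts)


def stream_url(host, port, panels, new_row=True, htitle=None, default_length=None):
    """Join panel definitions with &mnextnl (or &mnext) after the global parameters."""
    head = []
    if htitle is not None:
        head.append("htitle=" + quote(htitle))
    if default_length is not None:
        head.append("defaultLength=%d" % default_length)
    separator = "&mnextnl&" if new_row else "&mnext&"
    query = "&".join(head + [separator.join(panels)])
    return "http://%s:%s/?%s" % (host, port, query)


panels = [
    panel_query("FirstSignal", [("Channel1", "blue")]),
    panel_query("SecondSignal", [("Channel2", "red")]),
]
print(stream_url("localhost", 8080, panels))
# http://localhost:8080/?panel=FirstSignal&s=Channel1&c=blue&mnextnl&panel=SecondSignal&s=Channel2&c=red
```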

A few caveats:

  • Unless the stream has been generated with the oscilloscope option, the page expects increasing time values. If this is not the case (e.g. when repeating tests that generate the same data stream), the page has to be reloaded. However, if the value of shot changes (the typical use case during experiments), the page is cleared and the new stream is displayed.
  • The node.js application keeps in memory the recent history of the signals to be displayed, so that reloading a page does not lose data, as they are sent again by the node.js server. The node.js application does not listen to a given data stream until at least one Web page registers for it.