14 January 2013

The fastest way to export targeted data from BSO Essbase with NONEMPTYBLOCK

NB – The best way to experience this blog is with one of the following musical soundtracks open and playing in a loop on another tab or window.  Yes, Big Black’s RacerX gets the idea across just right.  And yes, it may be hard to believe for those of you who have met me that I was ever a fan of punk, but yup, that was my youthful musical rebellion.  For those of you who are not manly (womanly?  whatever) enough to take that can listen to the original Speed Racer theme as it ought to be heard.  All too much for you?  How about the version that most matches my current musical tastes?  Here’s a nice meld of a Quncy Jones bossa nova arrangement of “Desafinado” and of course the appropriate speed frame of mind.

Okay, with that bit of not-quite-totally irrelevant trivia out of the way, but with you most definitely in the mood for speed, speed, speed (and, if you are listening to Big Black, the desire to throw yourself into a mosh pit, yeah, this blog caters to all tastes) read on for some pretty darn exciting news about extracting data out of BSO Essbase.

The obligatory primer

When it comes to getting data out of BSO Essbase, there are a couple of ways to extract data.

Export it

From the days of long ago Esscmd’s EXPORT (believe it or not, I saw a new system last year that 100% relied on Esscmd.  Look, I thought it was a nice succinct language, but it is dead, dead, dead.  As you might imagine, the rest of the system was pants.), to MaxL’s export data command, to EAS’ export data functionality.  If you need to get ALL of the data out of the database for backup purposes, go crazy with parallel exports and have a good time.  And note, if you are working with a 92 .PAG file database as I am currently, that is NOT going to be a fast process.

Focused exports

But what happens when you need to only export a portion of a database?  The above approaches aren’t going to do the trick because they are all or nothing.  

Happily Essbase provides many ways to extract data:  report scripts, the DATAEXPORT calc script command, and, for those of us who use BSO on a regular basis, the slightly exotic looking MDX queries.  Let’s examine each in turn and yes, I have a (and I think once you see it you will agree this is sans hyperbole) technique whose performance will leave you absolutely gobsmacked – it did for me and I am not all that easily overawed.  

But just as with dinner, first the meat and veg, and then the oh-so-tasty dessert.  I am not doing this out of sheer bloody-mindedness but because I think you need to see all of the options.  And of course malva pudding tastes best after the Bobotie (mmmmm, South African food is lekker).  Enough of my culinary analogy – the other thing this review does is show how this new technique spanks every other approach.  It is a Most Awesome Hack.

Essbase report scripts

So what if we stepped back to year zero and tried this the old fashioned way with Essbase report scripts.  They have been around, literally, from the beginning of Essbase and are known to most.  

What does a report script look like that exports level zero Product and Postcode in the Forecast Scenario, Working Version, in Jan FY12 for the Account AT?  Oh, something like this:

We likely want to run this in batch (I dunno, do you like staring at a screen as you wait and wait for a process to finish?  Me neither.) via MaxL:
export database db.dbname using server report_file "Test1" to data_file "c:\\tempdir\\Test1.txt" ;

How long does it take?  1269.72 seconds for 21,788 records (one for the header).  The output looks like this (I could have suppressed the headers with the SUPHEADING keyword but chose not to):

BSO calc script DATAEXPORT

DATAEXPORT is the darling of many because it can be invoked within the context of a calc script and because it can produce nicely formatted (well, cleanly formatted) exports and even write to SQL.  

Which produces something like this (which does look a bit a ugly in WordPad):

Why oh why did I open it in WordPad?  Because DATAEXPORT only writes a Line Feed (LF) at the end of the record, not CRLF as Windows requires.  Poor old Notepad can’t handle that so you get this:

How long did all of that take?  Why a mere 1070.79 seconds for a 21,787 record export.  That’s just about 16% faster than a report script, so if speed is the purpose, then DATAEXPORT is the way to go.

So what about MDX?

Well, this wouldn’t be where I would normally turn for data extraction, mostly because MDX is so ASO-ish and I’m not writing formulas in an ASO Essbase database.  But in fact MDX has a query language with ROWs, COLUMNs, PAGEs, and as many AXISes as one can shake a stick at (actually the max is 64).  

I really got inspired to try out MDX as part of a Really Special Project (i.e., a project that I am doing for “fun” and getting $0/hour for and OMG the hours are killing me – I’d be a freaking millionaire if I were doing this for a client.  Write constructive suggestions to me care of this blog on how to conduct oneself professionally with an eye to not going broke.) and because of a recent thread on Network54.

Well, after an amazing amount of pain for really very little output (this is, sadly, my modus operandi when it comes to new-to-me technology) and boost from my buddy Dan Pressman, Mr. ASO and by extension sorta-Mr. MDX query, I came up with the following:

That’s the query with pretty colors.  To make this actually somewhat readable when it gets output, I stuck the query in the following MaxL script:
spool on to "c:\\tempdir\\MDX_Extract_Test.log" ;

login username password on servername ;

alter application ep clear logfile ;

/*    The below settings are right out of Developing Essbase Applications    */
alter session set dml_output alias off ;
alter session set dml_output numerical_display fixed_decimal ;
alter session set dml_output precision 15 ;
set column_width 80 ;
set timestamp on ;

SELECT
    {CrossJoin({[Period].[Jan]}, {[Account].[Allocation Target]})}
ON COLUMNS,
    NON EMPTY (CrossJoin([Product].Levels(0).Members, [Postcode].Levels(0).Members))
ON ROWS
FROM [EP].[ExalPlan]
WHERE ([Version].[Working], [Scenario].[Forecast], [Year].[FY12]) ;

I have to give a plug to Gary Crisci – the settings I use to turn off aliases, set decimals and precision, and the column stamp are all out of Gary’s MDX chapter in Developing Essbase Applications.  As you’ll see, formatting in MDX isn’t all that great – it all goes into MDX_Extract_Test.log along with a bunch of other stuff.  Ick, yuck, eeewwwww.

The good news is that Network54 thread I referenced above has some great suggestions for getting round all of this.  Read it through and you’ll get some ideas.

How long does all of this take?  Ah, that’s where this gets interesting -- 638.926 seconds.  Now that is interesting – almost twice as fast as report scripts and takes just 60% of the time of the DATAEXPORT calc script time.

So what do we have?

Three different ways to write out data, with an increasing performance profile:
  • Essbase report scripts
  • DATAEXPORT report script
  • MDX queries

Looking at MDX versus Essbase report scripts, I really have to say, at least in the database I am using and for the extract I am using, MDX is by far the fastest way to do this.  Awesome, right?  We have a winner.  Or do we?

Undocumented, unrecognized, and unreal

All those times, all of those techniques I wrote about above?  Yes, they’re all going to pull the same data, and yes, I have shown you a way that is twice as fast as the most common approach, at least for this database.  But you know what?  They stink.  I have, thanks to a number of people and, I might add, Oracle, a much better way to do this.  It’s so fast the first, oh, eight or nine times I ran it I thought for sure it was simply an error.  It is no error and it is freaking awesome.  

Did I figure this out myself?  Nope.

Btw, I must tell you that I did not figure this out on my own but was told of the command by my fellow ACE Director Tim Tow.  And he discovered it whilst working with Bryan Bain, one of the original AFSG (Arbor Field Services Group) Essbase consultants, when he was trying to extract the last ounce of performance out of Essbase for his flagship tool, Dodeca.  I should also mention that I tested this out as part of that Very Special Project on one of John Booth’s test servers. I am, as always, standing on the shoulders of giants.

For the love of Mike, what is it?

It is an undocumented MDX BSO-only keyword that is hinted at in this old Network54 thread (which I was part of but, no, I did not hide it from you till this time) and mentioned in the 11.2.2 Essbase readme.  Look for defect 13037253.  And you have seen it in Planning forms.  Planning?  Yes, Planning.  Have you figured it out yet?

It is:  NONEMPTYBLOCK

How does it work?

Just like this:

How long does it take?

The query (the same as the first MDX example save the keyword change) took 1.789 seconds with NONEMPTYBLOCK.  Really.  As I wrote, no, I didn’t believe it either.  Go on, try it yourself.

Are you laughing yet?  This didn’t make my day, or my week, or my month.  I think this is going to make me laugh all the way through 2013.  And considering I am stuck in an airport for hours and hours (thank you, US Airways, for eating all of my Sunday, again) after an ODTUG board meeting (the meeting was fun, the flying (or lack thereof) not so much), I’d say that is pretty strong medicine.

To put this into percent of time relative to an Essbase report script extracting the exact same data, this technique takes 0.14% as much time.  That’s right, not 14% but (I am going to say it out loud) zero-point-one-four percent.  In other words, it’s a freaking rocket.  And of course it’s Oracle that gave us this command and then mysteriously didn’t bother documenting it.  Why?  I have no idea.  It is still freaking neat.

Where does Planning fit in?

Have you ever edited a Planning (I am not going to bring up Planning just to snapshot this – trust me, it’s there and has been for a long time) form and ticked the box that tells Planning to suppress missing blocks?  Did you know that Planning builds its forms with MDX?  If not, you do now.  In any case, when you tick that box, you are using NONEMPTYBLOCK.  Remember, there is a slower (and documented) option to suppress missing in MDX – that is the NON EMPTY keyword I showed in the first MDX query example and that corresponds to Planning’s suppress missing form setting.

Does it always work like this?

With Essbase, the answer is easy:  no.  Or at least, test it, within the context of several constraints.

As I wrote above, this only makes sense within the context of BSO Essbase.  There are no blocks in ASO so this command doesn’t make sense and consequently doesn’t work (yes I tried it, no it doesn’t work).

This command also only makes sense if you are trying to extract lots of sparse at level zero of the database.  Per that Planning documentation link I gave you, if you do not have many missing blocks: “The Suppress missing blocks setting can degrade performance if few or no rows are suppressed.  Test forms before and after using this setting to determine whether performance is improved.”

And of course if you are not suppressing data then this command makes no sense at all.  Having said that, custom exports from Essbase almost always are at level zero and almost always suppress missing data, at least in my experience.  And if that is the case, NONEMPTYBLOCK is your BFF.  Btw, I would go with definition #3 on that link.  :)

Be seeing you and enjoy the hack.

19 comments:

  1. Thanks for the post. I'm curious what your report cache was for the report script and what would happen if you added <SPARSE to it.

    I'm still going to boycott MDX to do exports until they can figure out how to create a proper extract file!

    ReplyDelete
  2. Just out of interest why do you include this line?

    alter application ep clear logfile ;

    I know what it does, just wondering why it is there.

    ReplyDelete
  3. Tim,

    FWIW, I am trying it now with <SPARSE and it is *slower*.

    I may have to wait till the box is 100% free, but there is no magic bullet in the performance like MDX's NONEMPTYBLOCK.

    Regards,

    Cameron Lackpour

    ReplyDelete
  4. P. Derek,

    I used that command to keep the log file clean. You aren't seeing all of the script but it did a report script, MDX with NON EMPTY, and then MDX with NONEMPTYBLOCK and it all got confusing. I just left in in there for the blog post because I like a simple to read log file.

    Regards,

    Cameron Lackpour

    ReplyDelete
  5. Well, this method can apparently crash a server pretty hard if your member selections are a large enough cross join. After retrieving a few blocks of data very rapidly, I went on to see how long it would take for this method to export all the level 0 data in one scenario in my workforce database. The answer is apparently infinity. The server will not respond to terminal services so it may need a physical button push to recover. Cameron, are you aware of any ways of calculating resource utilization for these queries so that we can determine what is too big for a server to handle?

    ReplyDelete
    Replies
    1. Surprisingly the server came back a few hours later after the process terminated with an error, probably due to running out of memory to support the large Crossjoin. For what it is worth, here are the error messages:

      ERROR - 1200467 - Error executing formula for [REGION DEFINITION]: status code [1130203] in function [].
      ERROR - 1241101 - Unexpected Essbase error 1200467.

      Delete
  6. Todd,

    Great to hear from you. I am amazed you read this blog. :)

    Wow, pretty cool, MDX hosed the OS. Are you *sure* that's the case? I can imagine bringing an esssvr.exe down, or essbase.exe even (although that seems sort of unlikely) but the entire server to the point of require a hard boot? Anything is possible but that sounds like a bad server to me.

    I would be very curious to see what, if anything, was written to the Essbase, app, and MaxL logs.

    Okay, to your question, I do not know of any way to calculate what is too big for NONEMPTYBLOCK. I can tell you that I went after a 200 gb (really) app, and pulled out about 30,000 blocks with no problem. How many blocks do you reckon you were going against? I can fake it in AWS (not going to worry about that going down) and see if I can hit the same behavior.

    Regards,

    Cameron Lackpour

    ReplyDelete
  7. Camaren,

    I have been using NONEMPTYBLOCK keyword for a long time. This undocumented keyword is very buggy. It would "maddeningly" exclude data blocks from many of my queries.

    ReplyDelete
  8. Todd Binenstock18 March, 2013 12:23

    Hey Cameron,

    I have found a lot of useful stuff you've posted both here and elsewhere.

    Hoping to use this method as a selective export, I was probably going after a few million of the 90 million data blocks in our workforce application. I am not sure, but I believe the Crossjoins are preprocessed in memory and there must be a practical limit to how many cells can be held at one time. At least that's my guess.

    The server did come back on its own but while the query was running and for a while after killing it, I could not even remote into it.

    ReplyDelete
  9. Cameron,

    Is there a way to output Member names and not aliases using MDX?

    Thanks

    ReplyDelete
  10. LuigiO,

    I do just that in the code with the following MaxL statement:

    alter session set dml_output alias off ;

    Take a look at the MaxL -- it's all there.

    Regards,

    Cameron Lackpour

    ReplyDelete
  11. Hi Cameron,

    It would be interesting to compare the time a binary export needs with the methods you described in your post. Some times it can be very fast too.

    Manolis

    ReplyDelete
  12. Is there any way to export the YTD values using Data export in Calculations script.

    ReplyDelete
  13. Ranga,

    Re the Dynamic Time Series members, no, I know of no way to get at that via a calc script.

    Sorry.

    Regards,

    Cameron Lackpour

    ReplyDelete
  14. NONEMPTYBLOCK doesn't seem to work when transparent partitions are in place. Apologies if you said this and I missed it.

    ReplyDelete
  15. Hi Cameron,

    I'm using version 11.1.2.2 and I've tried using NON EMPTY, NONEMPTYBLOCK and NONEMPTYSUBSET in my MDX query and Essbase puts out this error for any of them:

    ERROR - 1260052 - Syntax error in input MDX query
    YBLOCK' .
    ERROR - 1241101 - Unexpected Essbase error 1260052

    Also, it doesn't seem to like the comma "," after COLUMNS as it will error out on that as well if I leave the comma there as per your example.

    Any thoughts as to why my Essbase environment doesn't seem to recognize any of the NON EMPTY, NONEMPTYBLOCK, NONEMPTYSUBSET functions?

    Thanks!

    Eric

    ReplyDelete
  16. Eric,

    I wrote the above using 11.1.2.2 so in theory it should work.

    If you use the query exactly as I wrote it against Sample.Basic, does it work?

    Regards,

    Cameron Lackpour

    ReplyDelete
  17. Hah.

    So there's an earlier comment about the 'biggest' intersection that can be brought back.

    In 11.1.2.3 they've set a limit.

    4294967296 (2 power of 32) intersections is the biggest retrieve possible.

    So there you go. Don't go bigger than that.

    Works wonders bringing data straight into Smartview using an essbase connection. Even formats it as a smartview retrieve!

    Cheers
    Pete

    ReplyDelete
  18. Though a little late to the post, I thought I would add there's an 8 bit Racer X here: http://www.stafaband.info/download/mp3/lagu_technical_difficulties_racer_x_8_bit_remix/ For those seeking that retro 80's video game sound.

    ReplyDelete