Want to know The Truth About CPM?

18 August 2014

Stupid Programming Tricks No. 19 -- TRUNCATEing DATAEXPORT

An interesting statistic

The code I am about to show you, and the solution that I came up with, took me over 15 hours of research, cursing, testing, and finally triumph.  Multiple cups of coffee were also consumed as were a few cups of proper tea.  Why do you care about what someone (such as yr. obt. svt.)  does for free or how he imbibes caffeine?  Perhaps for the amusement factor, as the solution that I came up with consists of eleven lines of code.  Eleven lines of code in 15 hours, for those of you mathematically challenged, is 0.73 lines of code per hour, or 5.5 lines of code per calendar day.  Ouch.  But that is the price for:  having a “good” idea, venturing into technology areas one knows very little about, and not giving up.

I tend to agree (if only to salve my ego) with the first point by David Veksler of the Mises Economic Blog (What, you’re not all Hayek fans?).  I quote:
A programmer spends about 10-20% of his time writing code, and most programmers write about 10-12 lines of code per day that goes into the final product, regardless of their skill level. Good programmers spend much of the other 90% thinking, researching, and experimenting to find the best design.

Does that put me in the category of a good programmer?  It’s hard to say – I do spend an awful lot of time thinking, “What am I trying to do and conceptually, how would I do it?” and then later, much later, I spend even more time trying to figure out how I turn that concept into reality.  So is that being a good programmer or just someone flailing about in areas that he doesn’t know much about?  You decide.

Why oh why oh why am I prattling on about this?

There was a thread the other day over on Network54 that discussed how to use the headers that come from a BSO DATAEXPORT statement as they are, without a doubt, pants.  I suggested that SQL output and subsequent manipulation (I should note that Adam_M went and did it in a text file but no matter, this was the inspiration and based on what he wanted to do, I think SQL might have been a better fit for transforming the data – Read The Whole Thing and make up your own mind) was the answer to Adam_M’s issue, but when I did so I had in the back of my mind the nagging reminder that, “Yeah, he’ll write it to relational, but then he’ll do it two or three or more times as he tweaks the extract and will end up with many more records than he ought.  This is a bummer (and erroneous – I would personally rather have it error out) and cannot be resolved by Essbase.”  And that’s right; Essbase will not truncate the table before it writes.  You have to do this manually and that of course means you have some kind of SQL access which in the real world, particularly in production, you will most likely not have.

The goal aka the functional spec

It would be super nice if BSO’s DATAEXPORT calc script function did a TRUNCATE to clear the table, or at least had the option of doing a TRUNCATE, before writing to a SQL target.

Two paths not taken

Just too much of a hack, even for me

One approach would be to use the @CalcMgrExecute function I wrote about (and I still owe part three of that series to you, Gentle Reader), run a MaxL script, in that MaxL script use the shell grammar, run a SQL*Plus (if Oracle) or sqlcmd (if SQL Server) script that does the TRUNCATE and then return back to the calling MaxL script, end it, and do the export as the next step in the original calc script.  At least I am pretty sure that would work but it is just too clunky for words.  I would go down this path if I wasn’t allowed to make the modifications I am about to describe but I would hate myself in the morning.

The way I would like to do it, but cannot

What occurred to me was that a SQL trigger that did a TRUNCATE before the INSERT from DATAEXPORT would do the trick.  Unfortunately, this cannot work because of the way Essbase writes to relational tables.

Would you believe…

That the default for Essbase is to do separate INSERTs for each and every line in the export?  I had heard this from My Man In California (MMIC) aka Glenn Schwartzberg but like oh so many words of wisdom he drops my way (MMIC is a very giving guy and I a forgetful one), I completely forgot.  Why does this inefficient approach to writing to relational matter?

It all goes KABOOM

It matters because the common SQL triggers of BEFORE INSERT TRIGGER (Oracle) and INSTEAD OF INSERT (SQL Server) would fire each and every time Essbase writes to the table.

Here’s what SQL Profiler showed me as the DATAEXPORT code ran:

There are 2,645 records to export out of Sample.Basic and that requires 2,645 INSERT statements.  Eeek.

It might be possible, although inefficient, to use these triggers to do the TRUNCATE and then copy the single record to yet another table, but that is an ugly approach one that I am not sure I have thought through completely as that target would need to be truncated on first write.  So maybe an INSERT trigger is not on.

Another approach

But then I again recalled of one of MMIC’s pearls of wisdom (see, Glenn, I really do listen to you) – there is an Essbase.cfg file setting that can allow batch inserts if the database supports that functionality.

DATAEXPORTENABLEBATCHINSERT

Off I went to the Tech Ref and did my requisite RTM.  Oh goody, thought I, it’s going to work.  Just read the description:  “When DATAEXPORTENABLEBATCHINSERT is set to TRUE, Essbase determines whether the relational database and the ODBC driver permit batch insert.”  But I should have known that when Essbase giveth with one hand, it taketh away with the other, because the next sentence states, “If they do, Essbase uses the batch-insert method, and, thus, performance is optimized.  Essbase determines the batch size; however, you can control the number of rows (from 2 to 1000) that are inserted at one time by using the DEXPSQLROWSIZE configuration setting.”

Oh, bugger, all this statement does is increase the number of records until that INSERT trigger fails.  I know that Sample.Basic has 2,645 rows and that means three INSERTs at the maximum row DEPSQLROWSIZE.  Bugger, again.  What to do?

Is there a way out?  Why yes there is.

INSERT triggers are out, but is there another trappable event that could allow the TRUNCATE to occur?  I looked again at SQL Profiler and saw two interesting facts.

  1. The LoginName column reflects whatever username I defined in the ODBC System DSN I used.
  2. There is a login event, at least as trapped via SQL Profiler.

The simple solution, although figuring out how to do it took, oh, forever, and it isn’t enough

If I could write a trigger that intercepts the act of EssbaseLogin (or whatever SQL username I chose) connecting to SQL Server I could have the login trigger TRUNCATE the table right then and there.  This frees me from the multiple INSERT issue quite handily.  How oh how oh how to do it was the question.

Figuring it out

SQL Server has a trigger database operation keyword called FOR LOGON (the Oracle equivalent is AFTER LOGON).  All I needed to do was to create that trigger, test for the username, do the TRUNCATE, and, because I am a paranoid nut, log: the fact that I did this, when I did it, and that the TRUNCATE succeeded.

A note for all of you SQL n00bs and experts – what you are about to see is the very first SQL trigger yr. obt. svt. ever wrote.  At least that’s the excuse I am sticking to as explanation why it took so long.  SQL n00bs – take heart, if I can do it, so can you; experts – everyone has to start somewhere, so stop the snickering at my plodding progress.

The target database

Here is what QueryTest.SampleBasicExport looks like – a simple table that sticks Product, Market, Measures, and Scenario into the rows and the 12 level zero months of Year as columns.

Approach #1

I stole the genesis for this code somewhere on the web (I meant to keep the link, but forgot to do so).   Here’s what I ended up with:

Line by line

As most of the readers of this blog are Essbase practitioners first, and SQL geeks second (or third, or fourth, etc.), I will explain each line of the code.

  1. The first line defines the trigger EssbaseDataExport – as I changed this about eleventy billion times the code shows ALTER instead of CREATE – and that this is a server-level trigger.
  2. The second line states that this is a LOGON trigger.
  3. Line three’s AS statement defines the trigger.
  4. The BEGIN statement on line four is the beginning block of the trigger logic.
  5. We finally get to the TRUNCATE logic on line five.
  6. Line six’s END marks the end of the trigger.

All I need to do is execute the code and ta-da, I now have the server-level trigger EssbaseDataExport.  You can see EssbaseDataExport in SQL Server Studio that the trigger is now there.

EssbaseDataExport  in action

Here’s what my calc script looks like:
Note that the SQL username EssbaseLogin is used.  At this point this isn’t important but it will be in just a bit.

Let’s do a count of the rows in QueryTest.SampleBasicExport.

If I run the export  I expect to see 2,645 records.

What’s the count?

When I run it a second time I expect the same number of rows as I have done a TRUNCATE just after logon but before the INSERTs actually happen:

And check the count:

DATAEXPORT ran, the LOGON trigger did the TRUNCATE, and the table still has 2,645 records.  Success boil in bag!!!

Is this really working? If I disable the trigger, there will be no truncation on run and the record count will now be 2,645 + 2,645 or 5,290 records.  I hope.
SQL Server gives me confirmation that the trigger EssbaseDataExport is disabled.

Run the DATAEXPORT one more time (Do I have to show you yet another MaxL screen shot?  Hopefully not as I am not going to do so.)

What’s the count?  Why yes it is 5,290 records.  So proof that Essbase won’t do a TRUNCATE before writing when the trigger is turned off.  Huzzah!

With a quick re-enable of the trigger, what happens when I run the DATAEXPORT calc script?
It runs yet again.
And now I have a row count of…
2,645.  I have now proved that every time there’s a login to SQL Server, I can get a TRUNCATE.  

Approach #2

But there’s a problem with this approach and it’s really quite a big one.  Every connection to SQL Server is going to result in a TRUNCATE.  Don’t believe me?  I will disconnect from SQL Server Studio, and then reconnect.

Here’s my connect to SQL Server dialog box. I click on the Connect button and…

What’s my row count on QueryTest.SampleBasicExport?

Yup, it’s zero.  As Donald Fauntleroy Duck would say, “Aw, Nuts!”  I need to be able to test for the username so that only the username that does that particular write to relational forces the clear.  And I’m going to need to be very careful about what username does that clear – EssbaseLogin could do two different kinds of connects – once to write (the TRUNCATE makes sense) and then again to read (which would be a bummer because the trigger would blow away the table contents).

So what I really need to do is create another username just used for writes, and maybe even a username just used for writing to that particular table.  I don’t want a SQL Load Rule to force that TRUNCATE to occur because that too will fire the CONNECT trigger.

Setting up a different DSN

As I am admin on my box and my SQL Server instance, this is easy peasy lemon squeezy.

Create the SQL Server username

As I am a dba on my own server, I will create the username EssbaseDataExport_Writer.

And then give EssabseDataExport_Writer db_owner access so it can do a TRUNCATE.

Creating the system DSN

Create the ODBC system DSN with that username EssbaseDataExport_Writer.

Set the default database to QueryTest:

And confirm that the username can connect:

Modifying the trigger to test for a specific username

What the trigger needs to do is test for the username EssbaseDataExport_Writer and then, and only then, perform the TRUNCATE.  All other usernames will not perform the clear of the SampleBasicExport table.

Happily there is a database operation keyword called SYSTEM_USER that will return the connection username.  Stick that SYSTEM_USER into an IF statement and then do the TRUNCATE if true and we should be good.

Here’s the trigger with the IF test for SYSTEM_USER.

Proof of the pudding part 1

If I the DataExp1 calc script using the non-tested username EssbaseLogin, SQL Server should not perform that TRUNCATE.  If there were already 2,645 records in the table SampleBasicExport, I should now have 5,290.

And so it is:

Proof of the pudding part 2

I created another calc script, this time pointing to the new DSN with the username DataExport_Writer.  In my typically unimaginative/lazy way, I named it DataExp2.

What happens now when I run the calc script?

And the row count?

Yup, I am now testing for just the username EssbaseDataExport_Writer!  Whew, what a long journey, but now when Essbase writes to SampleBasicExport using the EssbaseDataExport_Writer DSN/username, it will never double, triple, quadruple, etc. count the data.  

More cool trigger functions

It’s great that the TRUNCATE works, but wouldn’t it also be nice to have a record of each and every time the export fired and that the target table was cleared out before the write?  Why yes it would.  All I need to do is create a table with that kind of information.

Over in QueryTest, I created a table called LoginAudit with three fields:  DateTime, SystemUser, and RowsBeforeInsert.

I then altered the EssbaseDataExport trigger to do an INSERT after the TRUNCATE.  SYSDATETIME() is a date/time stamp, SYSTEM_USER() we’re already familiar with, and then I did a subquery to get the row count in SampleBasicExport which, if the TRUNCATE is successful, should always be zero.

After I apply the trigger, I then run the calc script DataExp2 and…

This result is just what I hoped for.  I could stick that INSERT statement before the IF to log everyone (this is actually a pretty common usage) and track everyone, I could extend the IF to an ELSEIF and test for other tables, in short I could do all of the cool things that SQL allows me to do.

And that is why SQL should be used for ETL work

I have gone pretty far afield and it took me 19 pages in MS Word to get to the end but a lot of that was spelling out each and every step in the process for we SQL beginners.

ETL in a Load Rule is, in my opinion, the work of the Devil Hisself because it isn’t transparent, is easy to get wrong, and has functions that cannot be undone, e.g. splitting a column.  Whatever the theological genesis of SQL is, it doesn’t suffer from any of the faults of Essbase Load Rules.  Yes, this is another Cameron rant about the Evils of Load Rules.  But I’m done.  :)

Thanks to the power of SQL, I’ve solved the issue with getting TRUNCATE access to an Essbase DATAEXPORT table which is quite often very difficult to do in a production environment through a very simple trigger.  See, SQL is awesome.

A caution, or at least a series of questions
One point about awesomeness -- I  reached out to my buddy Rich Magee, who knows SQL inside and out from a prior professional existence as a DBA, and asked him if there were any downsides to this approach.  Of course there were (life is often a combination of good and bad).  Here are his comments and they are food for thought:

"My understanding is that Logon triggers are typically used to monitor and control max number of logins, time outs and such. However, I could see no reason to not use them in your circumstance.
 
My questions (dba hat on) to you would be:
  • Why can you not simply schedule the stored proc to run?
  • What if the user logs on and off 5 times in a 5 minute period?
  • Would that not spawn unnecessary/redundant jobs all doing the same thing?
  • Could a risk be filling up the CPU or Disk with jobs/temp files that aren’t needed?"
So definitely some food for thought -- as always, you, Gentle Reader, must decide if what I hand out for free is genius or madness.  I tend to think it is the latter, not the former, but that is for you to decide.
The conclusion to the conclusion

Re the point about being a good (or bad) programmer based on approach  – I cannot say which category I fall into but I do know that I spent a lot of time figuring out what I wanted to do and then a lot more time figuring out how to do it with very little code at the end.  As before, you have to decide what class I (or you) fall into.  At least this is something you won’t have to figure out.

Be seeing you.

10 comments:

GlennS said...

Cameron, A always I love your posts, but I have a concern with your approach. Suppose I have 10 different jobs. I would need a separate login id for each. now what about 50 jobs? Gets messy. The two things I came up with and blogged about are 1. issuing the truncate in a load rule (yes it can be done but would require lots of extra load rules) 2. Use the runSQL CDF to run the truncate or a stored procedure that does the truncate. I use the runsql all the time.
As a side note, the truncate in dataexport has been requested as an enhancement, but I have no idea if or when it might get included

Cameron Lackpour said...

Glenn,

As always it is gratifying to know that you read my blog.

You are quite right in noting that each table that gets exported to must have a separate username and consequently ODBC connection. I noted that in the text and yes, if there were many of these it would be a drag. Usually there are not hundreds of these, more like a handful, but I can definitely see a dba giving me the stink eye if I asked for five usernames with TRUNCATE access. I can see getting an ugly look if I asked for one when it comes down to it.

Having said that, runsql.jar file – that is completely unsupported, right? And over five years old? That would make me worry in a production context although I take your point about it being used for real.

Lastly, I like your approach of running a stored procedure in a load rule – one could definitely set it up as a dummy load rule (I even set the single column to ignore for data load so there is no error on load). However, that only makes sense within the context of a load process or at least an Essbase-centric batch process. If I were a Planner, and I was writing data out (or even an administrator forcing a partial export) that needed to be TRUNCATEd beforehand, it wouldn’t work from Planning or Excel.

Fwiw, this works in SQL Server as a super simple stored procedure:

ALTER PROCEDURE [dbo].[TruncateQueryTestSampleBasicExport]
AS
BEGIN
TRUNCATE TABLE QueryTest.dbo.SampleBasicExport
END

The sql (T-SQL) in the load rule looks like this:
SYSDATETIME()
Exec TruncateQueryTestSampleBasicExport

Not that your last suggestion doesn’t work, but it is coming at it from a batch Essbase build/load perspective. Within that context, they’re great. For everything else, I can’t see it, particularly if the DATAEXPORT is the first step in an ETL process outside of Essbase.

What’s really, really, really needed is that TRUNCATE option on write – that would solve everything.

Regards,

Cameron Lackpour

Unknown said...

Hi Cameron,

Thanks for teaching me about triggers. I usually would solve this problem using ODI to delete the members of the SQL. And for the logging of the number of deletions, etc I'd do the same.

What troubles me is having multiple IDs as that can be problematic at clients. Let's say you want to use this solution on multiple cubes, well it looks like some more users are needed. Otherwise very cool stuff.

Also I agree that Essbase needs more SQL help for ETL. Transformations in load rules is bunk.

Finally I think the Essbase calcscript function for relational databases is missing two key components:

1. Create the dang table (or re-create/truncate it)

2. Control the order of the columns (dimensions) better. It ain't cool developing an ETL and that have someone performance tune the app re-arrange the outline order or even dense/sparse settings.

Appreciate your blog. See you at #KScope15! Hooray ODTUG

-Chris Rothermel

Cameron Lackpour said...

Chris,

Thank you for your kind words.

There are lots of ways to make BSO DATAEXPORT to SQL work. All of them are kind of painful. Glenn made the same comment about multiple usernames/ODBC connections and you are both right. I only ask how many of these would really and truly exist at a typical client. My guess would be one or two.

I was coming at it from the perspective of an interactive calc script but Glenn's batch approach and your ODI approach make sense as well given the relative contexts.

I agree that the command could stand some ehnacements -- I fear that the approaches we describe are at the far end of what people do with the command. Personally, I would *really* like to be able to tag the *Essbase* username to the export as that would allow all sorts of interesting things to happen. You could kind of, sort of, do that in an AFTER LOGON trigger but it's all a bit complicated.

And as for the triggers -- what you saw in the post was my very first one (or ones). I really wanted to know the basics of how they work and they are actually pretty awesome.

As always, thank you for reading.

Regards,

Cameron Lackpour

Pete said...

G'day All,

I'm probably in Chris's camp - once one starts requiring 'proper' ETL (whatever that means), rather than just simple data imports/exports I'd likely want to move to ODI and use the functionality there (well, I'd find somebody who knew ODI better than me and get them to do it).

I have a 'feeling' that is where oracle would likely want to push it, given the way the licensing works now in getting a limited use ODI license with a planning license.

One slightly ontopic question...but is it possible (at all) to encrypt the outbound SQL password? I don't think so, be interested if anyone managed it.

I'm not sure what that means with security with the particular example in the blog. Is it better in a control sense having multiple users accounts with only access to one table if there is a potential risk that the user/password is available in cleartext? The issue with 'system accounts' is that sometimes they end up having more access than required.

Anyways, always interesting to see new things - and notwithstanding my comments above, I may end up using exactly this function to get around a problem I'm having!

Cheers
Pete

Cameron Lackpour said...

Peter,

Thank you for your kind words. I know this sounds maudlin or sentimental in a saccharine way, but knowing that you and others get some value out of what I write means an awful lot to me.

Re this as a solution: all of the comments to this post have essentially said, “Interesting, but I’d never do it that way”. And I accept the argument that from an ETL or batch perspective, this is kind of dumb.

I wrote this post for two reasons:
1) I wanted to know how to write a SQL trigger. Yes, It’s All About Me, but this is the only way I learn as I don’t exactly have an employer providing training. Although when I was (briefly) a consulting company employee they had me writing training so there wasn’t any real gain, at least that time. :)
2) I couldn’t think how else to do this interactively, i.e., a Planner or an Admin kicks off a calc script to SQL. There just isn’t a slick way to do this from a calc script. Yes, there are other possibilities (jobs in Workspace, FDM launches, etc.) but they aren’t any less kludgy.

Re encrypting the SQL password, no, I know of no way of doing that in a calc script. It’s a curious omission, given how Oracle Planning command line utilities require an ecryped password and of course MaxL’s public/private key encryption.

Lastly (almost), it’s tough to say what a real-world dba would say when told, “I need five usernames with TRUNCATE access to five separate tables”. If the security is atomic enough, does it matter? Or do they just go crazy when they hear the request. Sometimes I have success with, “Yes, I know it’s crazy, but it’s just the way the product works.” Other times the dbas tell me to take a long walk off a short pier.

Lastly (for real), if you’re on LinkedIn, would you mind linking in to me? I’d like to email you on a few subjects. I searched for you in vain – maybe the Aus version limits US LI? Or if you can figure out some other way of trading email address so that the whole world doesn’t spam us, I’m open to that as well.

Regards,

Cameron Lackpour

Chris R said...

I'm back on this awesome trick. I'm trying to make it work in Oracle but I'm getting an error with this command:

"create or replace TRIGGER ESSBASEDATAEXPORT BEFORE INSERT ON WMRI_DATAEXPORT
BEGIN
truncate table WMRI_DATAEXPORT;
END;"

Error: PLS-0010: Encountered the symbol "TABLE" when expecting one of the following: :=.(@%; The symbol ":= was inserted before "TABLE" to continue.

I'm excited to use this. We're not using ODI and I think this solution is very elegant.

Chris R said...

I figured it out...

CREATE OR REPLACE TRIGGER ESSBASEDATALOAD

BEFORE INSERT ON WMRI_DATAEXPORT
BEGIN
EXECUTE IMMEDIATE 'TRUNCATE TABLE WMRI_DATAEXPORT';
END

Jen Guzik said...

I was all jazzed about finally getting a dataexport to a DSN source to work properly until I hit the pesky issue of it simply adding the results to the table with multiple runs. So off to Google I ran and then I was jazzed again about finding a blog post from Cameron about it.

It's disheartening that a way to do this automatically wasn't included from the outset...exporting to a table is all but useless without it, especially given the nature of Essbase data and no real way to guarantee only incremental data without a lot of tricks and business process assumptions.

Jen Guzik said...

It's not the most elegant solution, but we have a custom web interface where users can log in and launch SQL database packages or Essbase MSH scripts so users can refresh the data in their cubes on an ad hoc basis during the day if needed. The SQL packages generally pull data from the source and put it in staging tables that the Essbase dim builds / load rules pull from.

So I set up a package that deletes the records from my table, and an MSH script that executes the export calc script. We don't want users having the database schema credentials, so I set up Essbase substitution variables at the all apps / all databases level for those. With 2 clicks they can execute the process and they get an e-mail when it finishes.