Saturday 26 January 2013

Is DELETE necessary?

A  delete is a very destructive operation for a database. An operation which requires a lot of  thinking and preparation and most of the time is irrevocable. You can get in a lot of trouble if you delete the wrong set of rows and lose all that very important data!

Maybe the DELETE command should be banned from SQL. It was probably invented when the disks were tiny and it might not be suitable for our times, especially nowadays when we talk about things like BIG DATA.

Why do we have to delete anyway? We should invalidate and age out data and never destroy it. I am one of those who would say ‘keep it’ and never lose it. Even a typo, an error while entering somebody’s name in a text field, tells us a story. The number of attempts to get it right, the number of characters typed which were wrong, the time the wrong entry took place and much more, is data. Temporal databases come to mind. Why delete it?

But, one can argue that not deleting can be dangerous, too! One can claim that by refusing to delete we might end up with a lot of old, duplicated, bad quality, uncleaned, irrelevant and untrustworthy data.
 

Maybe the worse thing that can happen to data is to refuse to delete it enough. Perhaps because of this lack of enough deletes and fear of deletes we end up with all the 'bad data' found in some  database systems today.

Whatever you choose to do, delete or not delete, one thing's for sure and that is that you should know what you are deleting or not deleting. Choosing confidently what to delete comes from understanding and knowing your data model, and the implications a delete will have on the data model and never because the data is not required for your project! 




5 comments:

Karsten Aalderks said...
This comment has been removed by a blog administrator.
Karsten Aalderks said...

A lot of stored personal data is illegal.Sexual orientation for eample. What about dead customers or closed accouts in social networks ? I think, we need delete and bulk delete commands, because a lot of data models are 'old school'. No event/actor modell, no encrypted history data partitions,...
Best regards Karsten

ChandlerDBA said...

I have worked on systems which don't need any of the data from 3 days ago. It is totally irrelevant and pretty much useless. All value has been mined and extracted by them. However, the system needs to be fast. Very fast - tens of millions of transactions every day, with a total end-to-end transaction time of less than 25ms. That's client-to-webserver-to-db-and-back. Deleting data keeps the database very small and the system very fast. Still think delete is useless? I don't.

I can see why, in some system, you might want to keep everything. But you would need to design such that "deleted" data is logically (and maybe physically) separate from non-deleted data otherwise it'll just get in the way, slow things down.

Kubilay said...

ChandlerDBA

I agree with you in most of the cases.

Information Lifecycle Management (ILM) comes to mind straight away. Where you age out old data on cheap disks elsewhere but still accessible.

I understand tuning can be hard on big sets of data, but on the other hand with In Memory databases nowadays most of tuning on I/O becomes irrelevant. You have a point, you need to look at the database from 2 angles still. Is it OLTP transaction based full ACID system? Is it DSS a data warehouse? Then you decide your architecture accordingly.

In my post I just wanted to look at it from theoretical/logical point of you. Should I delete or not? Without at all being constrained by physical limits. The only limit is hardware! :-)

Thank you for your comment.

Best Regards

Kubilay

Kubilay said...

Karsten

What do you mean data-modles are 'old school'? Can you explain a bit?

Algebra is old school but it still works.

Many thanks for your comment.

Best Regards

Kubilay