Share Your Data Horror Stories

Arthur · December 11, 2019, 3:56pm

As data enthusiasts, we take joy in finding insights through analysis of data and numbers. But we all know that not all data is the same! Sometimes life hits us with bad databases that make everything so much more difficult and we’re left scratching our heads as we figure out a solution to clean up the mess.

To open up some light and fun discussions with all of our friends here in The Cube, we want to hear some of the stories you have about working with bad data. Have you run into situations where poor data has led to headaches? Without sharing sensitive information, tell a story about some of the annoyances you’ve run into in the past!

james.davis · December 11, 2019, 5:26pm

previous company I worked for we moved data from Postgres to MS SQL.
This Db had a comments section and they decided to store the comments not as the text that you typed but as the Pickled transport data instead (its a python thing).
We had to go in and have an API written in the old web (Python) code to send it the pickled row and have it send back the plain text.

That added a lot of time in the transfer of the data from postgres to SQL Server., that should not have been there.

ravi.bansal · December 11, 2019, 7:33pm

Throughout my experience, I have come to deal with databases majorly plagued with data redundancy and poor indexing issues primarily because of revolving door types team management. The new guy came in, created newer tables, messed up indexing…the best practice is to have concise data structure guide and be followed as Bible buy the whole team.

jeff · December 11, 2019, 8:16pm

I once worked with a company that used excel…

mohammad.qasim · December 11, 2019, 10:15pm

I did complete project for a client totaly depending on excel tens of thousands of records. It was a disaster joining data from many excel sheets to build one data cube with alot of data manipulations

david.glickman · December 12, 2019, 9:51am

We once had a client who changed the codes each month. So one month ‘2’ was ‘Japan’, the next month ‘2’ was ‘USA’ etc.
Total nightmare, but they helpfully produce a datamap which was correct for each month, but was in a different order each time

david.glickman · December 12, 2019, 3:02pm

How about a database where users could manually type in the country field?

We had many countries including the following

US
USA
United States
America
Washington (that great country)
Untied States
…

luis.silva · December 13, 2019, 9:45am

We on our company, had some major issues regarding data quality, a database model that has not evolved as the business needs and end up in a poor model, where there are fields (columns names) that are used to store all sort of information, referential problems as the model doesn’t implement referential integrity.

shachar.hallas · December 13, 2019, 11:20am

I used DWH db with many outliers and nulls are very common

abdi.elmi.1 · December 16, 2019, 4:32pm

Much like David’s story, addresses were manually typed as one entry, so no consistency and validation on the front-end was poorly handled, so the you can imagine how frustrating that would be to work with!

jian · December 17, 2019, 8:07pm

One of the BI products we support is fed by various Excel spreadsheets. Even though all Excel templates are standardized, the data warehouse team still identifies all sorts of data quality issues (missing data, wrong values and/or data types, etc.) during almost every data file submission cycle. Currently it takes us a lot of validation and debugging before we are able to start data loading.

matt.cooper · December 20, 2019, 12:28pm

I’m still smashing my head against a normalization issue. The product itself has a Type->SubType->Item hierarchy, but in the database they don’t have any direct relationships, instead connecting to a 3rd table via a 4th table for reasons that I’ve yet to figure out. Since Dundas can’t make heads or tails of the relationship (which is 100% fair because I can’t, either), I’m in the process of building 3 custom views directly in SQL that will build the relationships for me. Not all DB designers are created equal, apparently.

james.davis · December 20, 2019, 7:21pm

I have that issue too with Costpoint DB.