An article describing the process of centralizing the data. Though I think the most significant thing is that they are doing it at all: with something like this nobody slips through the net, and Big Brother is a dwarf compared to this new system.
Fuzz Box: Greece's Public Sector merges its data, how it was done
Kallikratis is the codename of the largest project conceived in recent years (2010-2011) as part of the digitalisation / computerisation effort of Greece's public sector, and was launched by the Ministry of Interior and Decentralization.
The goal? To migrate different datasets spread among different applications and databases within the public sector's infrastructure.
That amounts to millions of rows of citizens' and public organizations' data, distributed among different applications, structures, databases etc., which had to be migrated into a homogeneous, valid, normalized and commonly accepted data structure. In other words, an ETL (Extract, Transform, Load) process was required.
In my opinion, the talks and negotiations to define the details of the final homogeneous structure for each dataset were the hardest part, requiring cooperative thinking and strategizing among different tech companies and the ministry.
Datasets were categorized by domain: for example citizens' demographic data, the public sector's and prefectures' economic/budgeting data, document management systems' data, and the list goes on.
The final process took place at the beginning of the year and lasted about a month. The merge happened in the first five days, with corrections following until the end of January.
The project had two vital requirements:
1. It had to be fast (in execution, implementation and debugging, with the ability to change how things work on site without altering or recompiling code, so that tech support could operate it)
2. It had to be open source
The plan was to extract and transform the data into a common XML file, validate it against the corresponding XSD and finally load it into the new environment. We ended up with numerous transform processes, one per source, and a single generic load process that accepted the transformed XML files as input.
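To make the shape of that pipeline concrete, here is a minimal sketch in Python. It is not the Datapair implementation (which was built on an ETL suite, not hand-written code), and every name in it is hypothetical: the `citizens` structure, the required fields and the legacy column names are invented for illustration. The validation step is a simple required-field check standing in for real XSD validation, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical common structure: every <citizen> must carry these child elements.
REQUIRED_FIELDS = ("id", "full_name", "municipality")

# Hypothetical mapping from one legacy source's column names to the common names.
# Keeping it as data means it can be changed without touching the code.
LEGACY_A_MAPPING = {"cit_no": "id", "name": "full_name", "muni": "municipality"}


def transform(rows, mapping):
    """Source-specific step: turn legacy rows (dicts) into a common XML tree."""
    root = ET.Element("citizens")
    for row in rows:
        citizen = ET.SubElement(root, "citizen")
        for legacy_name, common_name in mapping.items():
            value = row.get(legacy_name)
            if value is not None:
                ET.SubElement(citizen, common_name).text = str(value).strip()
    return root


def validate(root):
    """Stand-in for XSD validation: return a list of errors, empty if valid."""
    errors = []
    for index, citizen in enumerate(root.findall("citizen")):
        for field in REQUIRED_FIELDS:
            node = citizen.find(field)
            if node is None or not (node.text or "").strip():
                errors.append(f"citizen[{index}]: missing {field}")
    return errors


def load(root, target):
    """Generic step: append every record of a valid document to the target."""
    for citizen in root.findall("citizen"):
        target.append({child.tag: child.text for child in citizen})
    return len(target)


def run(rows, mapping, target):
    """Transform, validate, and load only if validation found no errors."""
    root = transform(rows, mapping)
    errors = validate(root)
    if not errors:
        load(root, target)
    return errors
```

Only `transform` and its mapping know anything about a particular legacy source; `validate` and `load` see nothing but the common structure, which is what lets a single load process serve every transform.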
The codename of the process was Datapair. Give me a thumbs up for the original name (sarcasm ensues).
Various APIs / platforms were investigated, among them:
Pentaho Data Integration (PDI)
CloverETL
Talend Open Studio
For reasons that I would rather not go into here and now, we chose the Pentaho solution.
(Briefly: CloverETL came at a cost for advanced features, while Talend's performance was found subpar, at least at that time.)
In the end everything worked better than expected, and I am especially happy that our tech support department was able to learn and finally operate the Pentaho Data Integration suite. That was a great relief, as it enabled more people to take part in the process.
Someone could consider this (and I mean the project as a whole, from conceptualization to implementation) a default practice or a standardized solution, but for Greece's software industry it was, well... groundbreaking.
Expect more articles to come, further analysing Pentaho's components as a small tribute to the platform that did the job for us.
An extra thank you for reaching the end of this post.