Internship goal
Augment Dataiku data preparation by improving features on data records
Detailed description
Today, Dataiku boasts a robust data preparation framework that functions admirably to process a vast amount of data, helping users to have clean databases with the right data (and only the right data) inside them. However, we believe that with your help, we can take it a step further!
In a world where databases can be filled by real humans, data is not always clean. Errors can happen, typos can be made, and sometimes, you want to merge two database tables containing the same information, but not quite in the same format. "Dataiku", "dataiku", and "data iku" refer to the same company, but will be considered different entries in your database.
The goal of this internship is to extend our "distinct" processor to support fuzzy matching, that is, matching data that looks almost the same. You will help our customers clean up their databases, detect duplicated information, and reduce it to a single line.
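To give a feel for what fuzzy matching means in practice, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how the "distinct" processor works or will work: the helper names (normalize, similarity, cluster_values), the 0.85 threshold, and the greedy grouping strategy are all hypothetical, and it relies solely on difflib from the Python standard library.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and strip all whitespace so trivial variants compare equal."""
    return "".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the normalized forms of a and b."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_values(values, threshold=0.85):
    """Greedy clustering (an assumption made for this sketch).

    Each value joins the first cluster whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new one.
    """
    clusters = []
    for value in values:
        for cluster in clusters:
            if similarity(value, cluster[0]) >= threshold:
                cluster.append(value)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([value])
    return clusters


records = ["Dataiku", "dataiku", "data iku", "Dataikku", "Snowflake"]
clusters = cluster_values(records)

# Reduce each cluster of near-duplicates to a single line.
deduplicated = [cluster[0] for cluster in clusters]
```

With these sample records, the four spellings of the company name end up in one cluster and "Snowflake" in another. A real component would likely need a smarter choice of representative, a tunable threshold, and an approach that scales beyond pairwise comparison.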
Why Engineering at Dataiku?
Dataiku's on-premise, cloud, or SaaS-deployed platform connects many data science technologies, and our technology stack reflects our commitment to quality and innovation. We integrate the best of data and AI tech, selecting tools that truly enhance our product. From the latest LLMs to our dedication to open source communities, you'll work with a dynamic range of technologies and contribute to the collective knowledge of global tech innovators. You can find out even more about working in Engineering at Dataiku by taking a look here.
How you'll make an impact
Get familiar with Dataiku and its data preparation recipes, as well as database schemas.
Participate in designing a new component able to detect duplicate data.
Develop the user interface that helps users understand the clusters of data.
Help our users reduce their data overload!
Stack
Python and Java for the backend side
JavaScript/Angular for the frontend part