Managing Daily Duties for Data Scientists: ChatGPT's Capabilities in Five Key Areas
This article looks at data science automation through a case study that analyses failed ride orders from Gett, the ride-hailing company. The author, Nate Rosidi, a data scientist and adjunct professor, discusses how AI can take over routine tasks and free up time for more complex analyses.
The project examines key matching metrics to understand why some customers did not get a ride. To do this, we use ChatGPT together with Gemini CLI, Google's command-line AI agent, which can automate many routine data science tasks.
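To make the "matching metrics" question concrete, the sketch below counts failed orders by failure reason and by whether a driver had already been assigned. It is a minimal illustration using pandas, and it assumes hypothetical column names (`order_status_key`, `is_driver_assigned_key`) and status codes (4 = cancelled by the client, 9 = cancelled by the system); check these against your own data dictionary before relying on them.

```python
import pandas as pd

# ASSUMPTION: hypothetical status codes; adjust to your dataset's data dictionary.
STATUS_LABELS = {4: "cancelled_by_client", 9: "cancelled_by_system"}
DRIVER_LABELS = {1: "driver_assigned", 0: "no_driver"}


def failure_breakdown(orders: pd.DataFrame) -> pd.DataFrame:
    """Count failed orders by failure reason and driver-assignment status."""
    labelled = orders.assign(
        reason=orders["order_status_key"].map(STATUS_LABELS),
        driver=orders["is_driver_assigned_key"].map(DRIVER_LABELS),
    )
    # Drop rows whose codes are not in the lookup tables above.
    labelled = labelled.dropna(subset=["reason", "driver"])
    # Rows: failure reason; columns: driver status; cells: order counts.
    return pd.crosstab(labelled["reason"], labelled["driver"])
```

Comparing the `driver_assigned` and `no_driver` columns shows whether most failures happen before or after a driver is matched, which is the core of the Gett question.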
First, we install and set up Gemini CLI using its install command. The tool gives terminal access to Google's Gemini models, which play a role similar to ChatGPT's. Authentication happens automatically on first use: running any Gemini CLI command prompts you to log in with your Google account, and the CLI then manages the tokens needed to access models such as Gemini 2.5 Pro.
With Gemini CLI in place, we prepare the data. Make sure the dataset is accessible, either as a CSV file or through a database connection. You can then use prompts or commands in Gemini CLI to load the data into the working environment.
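Whatever prompt you use, the generated code usually ends up loading the data into a pandas DataFrame. The sketch below shows both routes mentioned above, a CSV file or a database connection. It assumes SQLite as the database, and the function name `load_orders` and default table name `data_orders` are hypothetical.

```python
import sqlite3
from contextlib import closing
from typing import Optional

import pandas as pd


def load_orders(csv_path: Optional[str] = None,
                db_path: Optional[str] = None,
                query: str = "SELECT * FROM data_orders") -> pd.DataFrame:
    """Load order data from a CSV file or, failing that, from a SQLite database."""
    if csv_path is not None:
        return pd.read_csv(csv_path)
    if db_path is not None:
        # closing() guarantees the connection is closed after the query runs.
        with closing(sqlite3.connect(db_path)) as conn:
            return pd.read_sql_query(query, conn)
    raise ValueError("Provide either csv_path or db_path.")
```

For another database engine, pass a different connection object to `pd.read_sql_query`.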
Next, we give Gemini CLI a single prompt that describes the whole pipeline: exploratory data analysis (EDA), data cleaning, automatic visualizations, preparing the dataset for machine learning, and applying a suitable machine learning model once the user has selected the target variables. An example prompt could be: "Build a Streamlit app that automates EDA, data cleaning, creates automatic visualizations, prepares the dataset for machine learning, and applies a machine learning model based on target variables selected by the user."
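The last two steps of that prompt, preparing the data for machine learning and fitting a model on a user-selected target, might look roughly like the sketch below in the generated code. This is an illustration rather than what Gemini CLI actually produces: the helper names `prepare_for_ml` and `fit_baseline` are hypothetical, scikit-learn's logistic regression is an assumed model choice, and the input frame is assumed to be already cleaned (no missing values).

```python
from typing import Tuple

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def prepare_for_ml(df: pd.DataFrame, target: str) -> Tuple[pd.DataFrame, pd.Series]:
    """Split a cleaned frame into numeric features X and the user-selected target y."""
    y = df[target]
    X = df.drop(columns=[target])
    # Raw timestamps are not usable as-is by a linear model, so drop them here.
    X = X.select_dtypes(exclude=["datetime", "datetimetz"])
    # One-hot encode text/categorical columns; numeric columns pass through unchanged.
    X = pd.get_dummies(X, dtype=float)
    return X, y


def fit_baseline(df: pd.DataFrame, target: str, random_state: int = 0):
    """Fit a scaled logistic-regression baseline; return (model, held-out accuracy)."""
    X, y = prepare_for_ml(df, target)
    # Stratify so both classes appear in the train and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

In a Streamlit app, `target` would come from a widget listing the DataFrame's columns; keeping the modelling logic in plain functions like these makes it easier to review and test outside the app.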
Once Gemini CLI has executed the prompt, review the generated output (for example, the Python code for a Streamlit app) and modify it as needed, adjusting the cleaning rules, exploration parameters, visual styles, or choice of model.
Finally, deploy the automated workflow for regular use and keep iterating: refine the prompts or add new requirements, such as model retraining or new data sources.
In the Gett case study, ChatGPT shows that it can handle missing values in the data: it converts the date column, drops invalid orders, and imputes missing values in the m_order_eta column. ChatGPT is also used to apply a machine learning model to the dataset, and the prompt structure for doing so is provided, although the specific model used is not stated.
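The cleaning steps described above can be sketched in pandas as follows. Only `m_order_eta` comes from the case study; the column names `order_datetime` and `order_gk`, the definition of an "invalid" order (missing ID or unparseable timestamp), and median imputation are all assumptions, since the article does not spell them out.

```python
import pandas as pd


def clean_orders(orders: pd.DataFrame,
                 date_col: str = "order_datetime",  # ASSUMPTION: hypothetical name
                 id_col: str = "order_gk",          # ASSUMPTION: hypothetical name
                 eta_col: str = "m_order_eta") -> pd.DataFrame:
    """Convert the date column, drop invalid orders, and impute missing ETAs."""
    df = orders.copy()
    # Parse timestamps; values that cannot be parsed become NaT instead of raising.
    df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
    # ASSUMPTION: an order is invalid if it has no ID or no parseable timestamp.
    df = df.dropna(subset=[id_col, date_col])
    # ASSUMPTION: fill missing ETAs with the median of the remaining valid orders.
    df = df.assign(**{eta_col: df[eta_col].fillna(df[eta_col].median())})
    return df.reset_index(drop=True)
```

The median is a common default because it is robust to the long right tail that ETA-style durations often have; swap in another statistic if your analysis calls for it.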
The Streamlit app built with Gemini CLI automates the same tasks that were performed with ChatGPT, covering the data science pipeline from start to finish. Because the language model generates the code and Gemini CLI orchestrates the steps, there is no need to hand-write code for every task, which makes data science work faster and more accessible.