An Introduction to generative AI and development tools

A brief introduction to modern development tools. Integrated development environments. Language server protocol. Generative code assistant tools.

Published

February 13th, 2025

1 Integrated Development Environment

  • An Integrated Development Environment (IDE) is a software application providing tools to facilitate programming activities.

1.1 Overview

  • There are many IDEs with varying feature sets.

  • IDEs can specialize in a particular programming language or support multiple languages.
  • Although the provided features vary across IDEs and languages, at a minimum, modern IDEs provide tools for editing, executing, and debugging code.

1.2 More than a text editor

A long time ago in a galaxy far, far away…

  • Code editing was done in a text editor, generating source code files.
  • Source code was compiled by a software application called the compiler, generating machine language (object) files.
  • Object files were linked with one another and with pre-existing libraries by a software application called the linker, generating an executable program.
  • If the program did not execute as expected, it was passed to a software application called the debugger to identify potential issues.

  • The (development) process of editing, translating, executing, and debugging was repeated until the program was working as expected.
  • During these iterations, one had to switch between different environments and tools many times.
  • The idea of IDEs is to integrate all needed tools in a single environment to increase productivity.

1.3 Modern IDEs

  • Modern IDEs extend their feature set beyond the basic cycle of editing, compiling, and debugging:
  • Code navigation.
  • Code completion.
  • Code refactoring and renaming.
  • Code formatting.
  • And many more…

  • So many that programming with an IDE is an entirely different experience from programming in the same language without one, or with a different one.
  • The productivity gains from working with an IDE led to a large number of competing IDEs.
  • IDE providers strove to include as many modern features as possible so as not to fall behind the competition.

  • Further, as more and more programming languages and frameworks were developed, the number of times the same feature had to be implemented grew with the product of the number of languages and the number of IDEs.

  • For \(L\) languages and \(I\) IDEs, every feature had to be implemented \(L \times I\) times.

  • Besides work duplication, not all feature implementations were identical across languages and IDEs.
  • So what if one learns to program with an IDE and then works in a company that uses a different one?

2 Language Server Protocol

A new hope

2.1 What is the LSP?

  • The Language Server Protocol (LSP) is a specification of communication rules between IDEs and language servers.
  • Originally developed by Microsoft.
  • In 2016, Microsoft partnered with Red Hat and Codenvy to develop an open standard for the LSP.
  • Today, the LSP is supported by most major IDEs and editors, and language servers exist for most widely used programming languages.
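
To make the idea of communication rules concrete, the sketch below assembles a single request as a client might send it to a language server. LSP messages are JSON-RPC payloads preceded by a Content-Length header, and textDocument/completion is the method used to ask for completions. The file URI, request id, and cursor position are made-up values for illustration, and the JSON is written by hand only to keep the example in base R.

```r
# Build the JSON body of a completion request by hand (base R only).
# The file URI, request id, and cursor position are hypothetical.
body <- paste0(
  '{"jsonrpc":"2.0","id":1,"method":"textDocument/completion",',
  '"params":{"textDocument":{"uri":"file:///tmp/example.R"},',
  '"position":{"line":2,"character":5}}}'
)

# Messages are framed with a Content-Length header (counted in bytes),
# followed by an empty line and then the JSON body.
message_text <- paste0(
  "Content-Length: ", nchar(body, type = "bytes"), "\r\n\r\n", body
)

cat(message_text, "\n")
```

In practice the IDE's LSP client builds and sends such messages; users rarely write them by hand.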

2.2 Why was it so successful?

  • By specifying the communication rules, the LSP reduces the \(L \times I\) implementation problem to an \(L + I\) one.

  • Each programming language provides a language server that understands the LSP.
  • Each IDE provides a client that understands the LSP.
  • A feature no longer has to be re-implemented for every combination of language and IDE.
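
A quick calculation shows the size of the saving. The counts below are hypothetical and only illustrate the difference between one implementation per language-IDE pair and one server per language plus one client per IDE.

```r
# Hypothetical ecosystem size (illustrative numbers only).
n_languages <- 20
n_ides <- 10

# Without a shared protocol: one implementation per language-IDE pair.
without_lsp <- n_languages * n_ides

# With the LSP: one server per language plus one client per IDE.
with_lsp <- n_languages + n_ides

print(c(without_lsp = without_lsp, with_lsp = with_lsp))
```

With these numbers, 200 implementations shrink to 30, and the gap widens as either count grows.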

2.3 Why should I care?

  • Added benefit: More consistent feature implementations across IDEs.
  • Learning to program with an IDE that supports the LSP makes it easier to switch to another IDE that also supports the LSP.

2.4 LSP in R?

  • The languageserver package provides an LSP implementation for the R programming language.
  • It supports:
    • Code completion.
    • Code navigation.
    • Code formatting (via the styler package).
    • Code refactoring.
    • Code linting (via the lintr package).
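
The formatting and linting features can also be tried directly from an R session, which gives a feel for what the language server reports to the IDE. This is a minimal sketch that assumes the styler and lintr packages are installed; the input string is a made-up example, and the guards simply skip a step when a package is missing.

```r
# One-time setup (commented out so the sketch has no side effects):
# install.packages(c("languageserver", "styler", "lintr"))

# A deliberately badly formatted line of R code (hypothetical input).
messy <- "x<-c(1,2,3)"

# Formatting, as provided by the styler package.
if (requireNamespace("styler", quietly = TRUE)) {
  print(styler::style_text(messy))
}

# Linting, as provided by the lintr package.
if (requireNamespace("lintr", quietly = TRUE)) {
  print(lintr::lint(text = messy))
}
```

Editors normally start the server themselves by calling languageserver::run(), so no manual start-up is usually needed once the package is installed.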

3 AI coding assistants

The force awakens

3.1 Large Language Models

  • Large language models are machine learning models for natural language processing.
  • They receive as input a sequence of tokens (usually words or word fragments) and output a sequence of tokens.
  • Their output is one of the most likely sequences of tokens given the input.
  • Since programs are sequences of keywords and symbols, one of the most successful applications of large language models is code generation.

  • Companies have an enormous commercial interest in AI that can generate safe and efficient code at a lower cost than humans.
  • However, whether this is feasible is not straightforward to answer.
  • Unlike humans, generative AI models do not generate code from requirements but from statistical patterns of what is most likely to follow.
  • Asking a generative AI model to generate code for the same task multiple times typically results in different outputs.
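
The variation between runs follows from how the output is produced: each next token is drawn from a probability distribution rather than looked up. The toy sketch below mimics this with a made-up distribution over four candidate tokens; the prompt, the token names, and the probabilities are all assumptions for illustration, and a real model works with a far larger vocabulary.

```r
# Hypothetical next-token probabilities after the prompt "total <- sum(".
next_token_probs <- c(x = 0.55, values = 0.30, data = 0.10, df = 0.05)

# One "query": draw a single continuation, weighted by its probability.
query_model <- function(probs) {
  sample(names(probs), size = 1, prob = probs)
}

# Five independent queries with exactly the same input.
completions <- replicate(5, query_model(next_token_probs))
print(completions)
```

Running the last two lines again will usually print a different sequence, although likely tokens appear more often than unlikely ones.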

Reproducibility

  • Every time we query the model, it generates a solution from scratch.
  • Humans generating code for the same task a second time are more likely to work on enhancing the existing code instead of starting from scratch.

  • This is, perhaps, not a big issue for small programming tasks.
  • But what if your task involves thousands of lines of code distributed across multiple files?
  • Is it feasible to examine and deal with the complexity of the generated code every time from scratch?
  • This raises doubts about the long-term maintainability of AI-generated code.

3.2 Hallucinations

  • Another issue with generative AI models is hallucinations.
  • Hallucinations in code generation manifest in a few ways.

  • When working with self-developed or niche libraries, the model may not have seen enough examples to generate correct code.
  • It is then likely to generate code that is syntactically correct but calls functions or imports modules that do not exist.
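
One cheap safeguard is to check suggested function names against what a package really exports before running the generated code. The sketch below does this for the stats package; the list of suggested names is hypothetical, and trimmed_median stands in for a plausible-sounding function that does not exist.

```r
# Function names a (hypothetical) assistant suggested from the stats package.
suggested <- c("median", "sd", "trimmed_median")

# Compare them with the names the package really exports.
exported <- getNamespaceExports("stats")
is_real <- suggested %in% exported

print(data.frame(fn = suggested, exists = is_real))
```

The same check works for any installed package by replacing "stats" with its name.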

  • When you have a logical error in your code, the model may generate code that is syntactically correct but reinforces or replicates the logical error.
  • This is because the model does not generate code based on requirements but based on what is most likely to follow what you have already written.

var1 <- 1
var2 <- 2
var3 <- 3

first_task(var1)
second_task(var1) # logical error: var2 was supposed to be used here
third_task(var1) # potential completion suggestion: it replicates the error (var3 was presumably intended)

3.3 Context awareness

  • Another issue commonly encountered with coding assistants is the lack of context awareness.
  • A programming task can usually be solved correctly by many different implementations.
  • Several of them may even be equally efficient, safe, and maintainable.
  • However, not all implementations are equally appropriate for all contexts.

3.4 Context awareness

  • Some implementations may fit better when paired with other parts of the code and the overarching goals of the project.
  • However, coding assistants do not always have information about the project’s goals or the rest of the codebase.
  • Ultimately, evaluating the appropriateness of a generated solution remains a human task.

3.5 Working with AI coding assistants

  • Consequently, AI coding assistants fundamentally change the way we program.
  • Researching a solution:
    • Without: More tedious and time-consuming. Reading documentation, searching for existing solutions, implementations, and libraries.
    • With: Automatically generated.

  • Editing code:
    • Without: More manual and slow. Omissions, logical errors, and typos can creep in.
    • With: More automated. Omissions, logical errors, and typos can still creep in.

  • Reviewing and debugging code:
    • Without: Easier to review self-written code because the logic is known.
    • With: Harder to review. The reviewer must first work out the logic and understand how the functions and modules it uses work (by reading their documentation).

3.6 No free lunch

  • Overall, working with AI coding assistants removes responsibilities from the research stage but creates new ones in the reviewing stage.
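
One way to take on the reviewing responsibility is to write down the requirements as small checks before accepting a generated function. The sketch below is only an illustration: generated_range_width is a hypothetical stand-in for assistant output, and the checks are the kind a reviewer might add.

```r
# Hypothetical function as it might come back from a coding assistant.
generated_range_width <- function(x) {
  max(x) - min(x)
}

# Reviewer-written checks that encode the actual requirements.
stopifnot(
  generated_range_width(c(2, 5, 9)) == 7,
  generated_range_width(c(-1, 1)) == 2,
  generated_range_width(4) == 0
)

# A case the generated code may not handle as intended: missing values.
# The result is NA, and the reviewer has to decide whether that is acceptable.
print(generated_range_width(c(1, NA, 3)))
```

The checks pass silently, while the last call exposes a behaviour that only someone who knows the project's requirements can judge.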