A Tale of Three String Parsing Algorithms

Rachel Reilly
2 min readJan 11, 2021

--

The Sliced landing page

For my first full-stack capstone project at Thinkful’s Software Engineering program, I created Sliced, a recipe app that dynamically scales recipes by increments of ¼. It does this by parsing user text input and separating the amount and unit from the ingredient’s name. This is the story of how I created a robust string parser with O(n) time complexity.

The original string parser checked for numbers by turning a string into a number with the JavaScript Number() method and checking if that was still of type number. It checked for units by looping through an array of valid types, returning the rest of the line as the ingredient name.

This method had a few problems. For one, it didn’t consider decimals and fractions as valid numbers. It also used the JavaScript filter() method to check for valid units, which meant I was looping through the whole array of valid units for every word that was not processed as a number. Even though there were only a dozen valid types, this process has O(n) time complexity, so doing this computation for almost every word was costly.

The third problem was that the string parser differentiated between different ingredients by line breaks, which forced the user to put effort into formatting their recipe. After watching several people try to interact with the app, I realized that, while I had thought splitting ingredients on line-breaks was intuitive, my users didn’t.

The second version of my string parser fixed the first problem, permitting multi-digit numbers, fractions, and decimals. I refactored my string parser to use regular expressions that picked up different number formats by picking up digits and wildcard characters. This worked well for picking up numbers, but it didn’t eliminate any of the unnecessary complexity.

After learning more about algorithms, I decided to refactor my string parser further. I figured that I could parse the entire string with just one pass through it, checking against several conditions. First, I removed the array methods that split ingredients on line breaks and returned an array of strings. Formatting the data this way required looping through the string twice before the data even made it to a string parser. Then I started testing my regular expression and realized that several of the cases I had were unnecessary. Finally, I switched how I checked for valid units. Instead of looping through an array, I imported a JSON object with keys for all of the valid types and normalized the units to make them compatible with the newly-added ENUM in the database.

--

--

Rachel Reilly

Software Engineer specializing in React but passionate about the future of digital interaction beyond visual user interfaces