CPS251 Android Development by Scott Shaper

Regular Expressions

Introduction

Regular expressions, or "regex" for short, are like a special language for finding and matching patterns in text. Think of them as a super-powered search tool that can help you validate user input in your apps. Let's learn how to use regex to make your input validation even better!

Understanding Regex Basics

When to Use Regex

  • Validating email addresses
  • Checking phone number formats
  • Enforcing password requirements
  • Validating dates and times
  • Checking ZIP codes and other formatted input

Quick Reference: Common Regex Components

Component | What It Does | When to Use It
Position Markers | ^ (start) and $ (end) | When you need to match the entire string
Character Classes | \d (digits), \w (word characters) | When matching specific types of characters
Quantifiers | *, +, ?, {n} | When you need to control how many times something repeats
Character Sets | [a-z], [0-9], [^abc] | When matching a custom set or range of characters

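Regex Components Reference

The patterns in these tables use plain regex notation. Inside a regular Kotlin string, each backslash must be doubled, so \d is written "\\d".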

Position Markers
Symbol | What It Does | Example
^ | Start of string | ^Hello matches "Hello" at the start
$ | End of string | world$ matches "world" at the end
\b | Word boundary | \bcat\b matches "cat" as a whole word
\B | Non-word boundary | \Bcat\B matches "cat" inside a word
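Here is a small sketch of position markers using Kotlin's built-in Regex class. It also shows a Kotlin detail worth knowing: String.matches() and Regex.matches() already require the whole string to match, so ^ and $ matter most when you search with containsMatchIn() or find(). The names startsWithHello, wholeWordCat, and demoPositionMarkers are just for illustration.

```kotlin
// "^Hello" only succeeds when "Hello" is at the very start of the input.
val startsWithHello = Regex("^Hello")

// "\\b" in a Kotlin string is the regex \b (a word boundary).
val wholeWordCat = Regex("\\bcat\\b")

// Each entry pairs a check with the result we expect from it.
fun demoPositionMarkers(): List<Boolean> = listOf(
    startsWithHello.containsMatchIn("Hello there"), // true: "Hello" is at the start
    startsWithHello.containsMatchIn("Say Hello"),   // false: "Hello" is not at the start
    wholeWordCat.containsMatchIn("the cat sat"),    // true: "cat" is a whole word
    wholeWordCat.containsMatchIn("concatenate"),    // false: "cat" sits inside a word
    Regex("\\d{3}").matches("12345"),               // false: matches() needs the whole string
    Regex("\\d{3}").containsMatchIn("12345")        // true: three digits appear somewhere
)
```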
Character Classes
Symbol | What It Matches | Example
. | Any single character (except newline) | c.t matches "cat", "cut", "c@t"
\d | Any digit (0-9) | \d{3} matches "123", "456"
\D | Any non-digit | \D+ matches "abc", "!@#"
\w | Word character (letter, digit, or underscore) | \w+ matches "hello123"
\W | Non-word character | \W+ matches "!@#$%"
\s | Whitespace (space, tab, newline) | \s+ matches " " or "\t\n"
\S | Non-whitespace | \S+ matches "hello"
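A quick way to see character classes work is to pull matching runs out of a sentence with Regex.findAll(). This is a minimal sketch; the helper names digitRuns, words, and normalizeSpaces are hypothetical, not part of any library.

```kotlin
// \d+ : every run of one or more digits in the text.
fun digitRuns(text: String): List<String> =
    Regex("\\d+").findAll(text).map { it.value }.toList()

// \w+ : every run of letters, digits, or underscores.
fun words(text: String): List<String> =
    Regex("\\w+").findAll(text).map { it.value }.toList()

// \s+ : collapse any run of whitespace (spaces, tabs, newlines) into one space.
fun normalizeSpaces(text: String): String =
    text.replace(Regex("\\s+"), " ")
```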
Quantifiers
Symbol | What It Does | Example
* | Zero or more | a* matches "", "a", "aa"
+ | One or more | a+ matches "a", "aa"
? | Zero or one | a? matches "", "a"
{n} | Exactly n times | a{3} matches "aaa"
{n,} | n or more times | a{2,} matches "aa", "aaa"
{n,m} | Between n and m times | a{2,4} matches "aa", "aaa", "aaaa"
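To compare the quantifiers side by side, the sketch below runs each one against the same list of strings and keeps the ones that match completely. The map quantifierPatterns and the function acceptedBy are illustrative names.

```kotlin
// Each quantifier from the table, applied to the letter "a".
val quantifierPatterns = mapOf(
    "a*" to Regex("a*"),         // zero or more
    "a+" to Regex("a+"),         // one or more
    "a?" to Regex("a?"),         // zero or one
    "a{3}" to Regex("a{3}"),     // exactly three
    "a{2,}" to Regex("a{2,}"),   // two or more
    "a{2,4}" to Regex("a{2,4}")  // between two and four
)

// Returns the candidates that the named pattern matches in full.
fun acceptedBy(name: String, candidates: List<String>): List<String> {
    val regex = quantifierPatterns.getValue(name)
    return candidates.filter { regex.matches(it) }
}
```

With the candidates "", "a", "aa", "aaa", "aaaa", and "aaaaa", only a* accepts the empty string together with everything else, while a{2,4} stops at "aaaa".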
Character Sets
Pattern | What It Matches | Example
[abc] | Any single character from the set | [abc] matches "a", "b", "c"
[^abc] | Any single character not in the set | [^abc] matches "d", "1", "@"
[a-z] | Any lowercase letter | [a-z]+ matches "hello"
[A-Z] | Any uppercase letter | [A-Z]+ matches "HELLO"
[0-9] | Any digit | [0-9]+ matches "123"
[a-zA-Z] | Any letter | [a-zA-Z]+ matches "Hello"
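The sketch below tries a few of these sets on real strings. keepMatching is a hypothetical helper that keeps only the characters a single-character pattern accepts, which makes the difference between [abc] and [^abc] easy to see.

```kotlin
val abc = Regex("[abc]")             // one of a, b, or c
val notAbc = Regex("[^abc]")         // any single character except a, b, or c
val lettersOnly = Regex("[a-zA-Z]+") // one or more letters of either case
val digitsOnly = Regex("[0-9]+")     // one or more digits

// Keeps only the characters of text that the single-character pattern matches.
fun keepMatching(text: String, single: Regex): String =
    text.filter { single.matches(it.toString()) }
```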
Special Characters
Symbol | What It Matches | Example
\. | Literal dot | \. matches "."
\+ | Literal plus | \+ matches "+"
\* | Literal asterisk | \* matches "*"
\? | Literal question mark | \? matches "?"
\( | Literal opening parenthesis | \( matches "("
\) | Literal closing parenthesis | \) matches ")"
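Escaping is where the difference between plain regex notation and Kotlin strings bites. This sketch (variable names are illustrative) shows an unescaped dot matching too much, the same pattern written with doubled backslashes and as a raw triple-quoted string, and Kotlin's Regex.escape(), which turns arbitrary text into a pattern that matches it literally.

```kotlin
// Unescaped: "." means "any character", so "3x5" also matches.
val looseDecimal = Regex("\\d.\\d")

// Escaped: "\\." in a Kotlin string is the regex \. (a literal dot).
val strictDecimal = Regex("\\d\\.\\d")

// A raw string keeps single backslashes, so it reads like the regex itself.
val strictDecimalRaw = Regex("""\d\.\d""")

// Unescaped "+" is a quantifier: this matches "11", "111", ... but not "1+1".
val unescapedPlus = Regex("1+1")

// Regex.escape() quotes the whole string, so "+" is treated literally.
val literalPlus = Regex(Regex.escape("1+1"))
```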

Email Pattern

The email pattern ^[A-Za-z0-9+_.-]+@(.+)$ breaks down as follows:

  • ^ - Start of the text
  • [A-Za-z0-9+_.-] - Any letter (both cases), number, or special characters +, _, ., or -
  • + - One or more of the previous characters
  • @ - The @ symbol
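  • (.+) - A capturing group holding one or more of any character (the domain part). This is deliberately loose: anything after the @ is accepted, so "user@localhost" and even "a@@b" pass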
  • $ - End of the text
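To check the breakdown above, this sketch runs the same email pattern against a few inputs and uses matchEntire() to read back what the (.+) group captured. looksLikeEmail and domainOf are illustrative helper names; the last two cases in the test show how permissive the pattern is.

```kotlin
// The email pattern from this section, written as a Kotlin string.
val emailRegex = Regex("^[A-Za-z0-9+_.-]+@(.+)$")

// True when the whole input fits the pattern.
fun looksLikeEmail(input: String): Boolean = emailRegex.matches(input)

// Text captured by (.+), i.e. everything after the first @, or null if no match.
fun domainOf(input: String): String? =
    emailRegex.matchEntire(input)?.groupValues?.get(1)
```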

Phone Number Pattern

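The phone pattern ^\d{3}-\d{3}-\d{4}$ breaks down as follows (inside a regular Kotlin string it is written "^\\d{3}-\\d{3}-\\d{4}$", because each backslash must be doubled):

  • ^ - Start of the text
  • \d{3} - Exactly three digits
  • "-" - A literal hyphen (outside a character set, - has no special meaning)
  • \d{3} - Exactly three more digits
  • "-" - Another literal hyphen
  • \d{4} - Exactly four digits
  • $ - End of the text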
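Here is the basic phone pattern in Kotlin, written two equivalent ways: as a regular string with doubled backslashes and as a raw triple-quoted string. The variable names are illustrative; the test confirms both produce the same regex.

```kotlin
// Regular string: every regex backslash is doubled.
val phoneRegex = Regex("^\\d{3}-\\d{3}-\\d{4}$")

// Raw string: backslashes are written once, exactly as in the regex.
val phoneRegexRaw = Regex("""^\d{3}-\d{3}-\d{4}$""")
```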

Phone Number Patterns with Different Separators

You can create more flexible phone number patterns that accept different separators. As in the code examples, these are written as Kotlin string literals, so \\d stands for the regex \d:

  • Hyphen or Forward Slash: ^\\d{3}[-/]\\d{3}[-/]\\d{4}$
    • Accepts: 123-456-7890 or 123/456/7890
    • Uses the character set [-/] to match either separator
  • Optional Separators: ^\\d{3}[-/]?\\d{3}[-/]?\\d{4}$
    • Accepts: 123-456-7890, 123/456/7890, or 1234567890
    • Uses ? to make each separator optional
  • Multiple Separator Options: ^\\d{3}[-/\\s]\\d{3}[-/\\s]\\d{4}$
    • Accepts: 123-456-7890, 123/456/7890, or 123 456 7890
    • Uses [-/\\s] to match a hyphen, forward slash, or whitespace character

Note: \\s (the regex \s) matches any whitespace character (space, tab, newline), not just a space. Also, each separator is checked independently, so these patterns also accept mixed forms such as 123-456/7890 (and, with the optional version, 123-4567890).
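The sketch below puts the three separator patterns in a map and reports which ones accept a given input. It makes the mixed-separator behavior visible: "123-456/7890" passes all three. The map phonePatterns and the function acceptingPatterns are hypothetical names for illustration.

```kotlin
val slashOrHyphen = Regex("^\\d{3}[-/]\\d{3}[-/]\\d{4}$")
val optionalSeparator = Regex("^\\d{3}[-/]?\\d{3}[-/]?\\d{4}$")
val hyphenSlashOrSpace = Regex("^\\d{3}[-/\\s]\\d{3}[-/\\s]\\d{4}$")

// mapOf keeps insertion order, so results are listed in this order.
val phonePatterns = mapOf(
    "hyphen or slash" to slashOrHyphen,
    "optional separator" to optionalSeparator,
    "hyphen, slash, or whitespace" to hyphenSlashOrSpace
)

// Names of every pattern that matches the whole input.
fun acceptingPatterns(input: String): List<String> =
    phonePatterns.filterValues { it.matches(input) }.keys.toList()
```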

Password Pattern

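The password pattern ^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{8,}$ breaks down as follows. Each (?=...) group is a positive lookahead: it checks that its condition holds somewhere ahead in the text without consuming any characters, so all three checks start from the beginning and the final .{8,} still sees the whole string.

  • ^ - Start of the text
  • (?=.*[0-9]) - Must contain at least one digit (positive lookahead)
  • (?=.*[a-z]) - Must contain at least one lowercase letter (positive lookahead)
  • (?=.*[A-Z]) - Must contain at least one uppercase letter (positive lookahead)
  • .{8,} - Must be at least 8 characters long (any characters except newlines)
  • $ - End of the text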
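Below is the password pattern in Kotlin, plus a hypothetical passwordProblems helper that checks each rule separately. The single regex only says pass or fail; splitting the rules like this is one way (an assumption, not something the pattern requires) to show the user which requirement is missing. For ordinary single-line input the two approaches agree.

```kotlin
// The password pattern from this section.
val passwordRegex = Regex("^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{8,}$")

// Checks each rule from the lookaheads on its own and lists the ones that fail.
fun passwordProblems(password: String): List<String> {
    val problems = mutableListOf<String>()
    if (!password.contains(Regex("[0-9]"))) problems.add("needs a digit")
    if (!password.contains(Regex("[a-z]"))) problems.add("needs a lowercase letter")
    if (!password.contains(Regex("[A-Z]"))) problems.add("needs an uppercase letter")
    if (password.length < 8) problems.add("needs at least 8 characters")
    return problems
}
```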

ZIP Code Pattern

The ZIP code pattern ^\d{5}(-\d{4})?$ breaks down as follows:

  • ^ - Start of the text
  • \d{5} - Exactly five digits
  • (-\d{4})? - An optional group: a hyphen followed by exactly four digits (the ZIP+4 extension, as in 12345-6789)
  • $ - End of the text
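This sketch tests the ZIP code pattern and reads the optional group back with matchEntire(). In Kotlin, an optional group that did not take part in the match shows up as an empty string in groupValues. The names zipPlus4Regex and zipExtension are illustrative.

```kotlin
// Five digits, optionally followed by a hyphen and four more digits.
val zipPlus4Regex = Regex("""^\d{5}(-\d{4})?$""")

// The "-1234" part captured by the optional group, "" if absent, or null if no match.
fun zipExtension(zip: String): String? =
    zipPlus4Regex.matchEntire(zip)?.groupValues?.get(1)
```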

Time Pattern

The time pattern ^\d{2}:\d{2}$ checks only the HH:MM format (it would also accept 99:99) and breaks down as follows:

  • ^ - Start of the text
  • \d{2} - Exactly two digits (the hours)
  • : - The colon
  • \d{2} - Exactly two digits (the minutes)
  • $ - End of the text
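Because ^\d{2}:\d{2}$ only checks the shape of the text, a stricter pattern is needed if hours and minutes must be in range. The time24 pattern below is an assumed 24-hour variant (hours 00-23, minutes 00-59), shown next to the original for comparison; both names are illustrative.

```kotlin
// The format-only pattern from this section.
val timeFormat = Regex("""^\d{2}:\d{2}$""")

// Assumed stricter 24-hour variant: 00-19 or 20-23 for hours, 00-59 for minutes.
val time24 = Regex("""^([01]\d|2[0-3]):[0-5]\d$""")
```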

Practical Examples

Email Validation in Compose

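import androidx.compose.material3.OutlinedTextField
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.setValue

@Composable
fun EmailInput() {
    var email by remember { mutableStateOf("") }
    var isEmailValid by remember { mutableStateOf(true) }

    // Create the regex pattern once; remember keeps it across recompositions
    val emailRegex = remember { "^[A-Za-z0-9+_.-]+@(.+)$".toRegex() }

    OutlinedTextField(
        value = email,
        onValueChange = {
            email = it
            // Check if the email matches our pattern (an empty field counts as valid)
            isEmailValid = it.isEmpty() || it.matches(emailRegex)
        },
        label = { Text("Email") },
        isError = !isEmailValid && email.isNotEmpty(),
        supportingText = {
            if (!isEmailValid && email.isNotEmpty()) {
                Text("Please enter a valid email address")
            }
        }
    )
}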

What This Example Is Doing

EmailInput keeps email and isEmailValid in state. It builds a regex from the pattern ^[A-Za-z0-9+_.-]+@(.+)$ (a local part made of letters, digits, and + _ . -, then @, then a domain). In onValueChange it updates email and sets isEmailValid = it.isEmpty() || it.matches(emailRegex), so an empty field is not shown as invalid. When the text is non-empty and doesn’t match the pattern, isError is true and the supporting text shows "Please enter a valid email address." The result is live email-format validation. Keep in mind that the check is loose: (.+) accepts anything after the @, so it catches obvious typos but cannot confirm that an address actually exists.

Phone Number Validation in Compose

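import androidx.compose.material3.OutlinedTextField
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.setValue

@Composable
fun PhoneInput() {
    var phone by remember { mutableStateOf("") }
    var isPhoneValid by remember { mutableStateOf(true) }

    // Create regex pattern for phone numbers with hyphens or forward slashes (built once)
    val phoneRegex = remember { "^\\d{3}[-/]\\d{3}[-/]\\d{4}$".toRegex() }

    OutlinedTextField(
        value = phone,
        onValueChange = {
            phone = it
            // Check if the phone matches our pattern (an empty field counts as valid)
            isPhoneValid = it.isEmpty() || it.matches(phoneRegex)
        },
        label = { Text("Phone Number") },
        isError = !isPhoneValid && phone.isNotEmpty(),
        supportingText = {
            if (!isPhoneValid && phone.isNotEmpty()) {
                Text("Please enter a valid phone number (123-456-7890 or 123/456/7890)")
            }
        },
        placeholder = { Text("123-456-7890") }
    )
}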

What This Example Is Doing

PhoneInput keeps phone and isPhoneValid in state. The regex ^\\d{3}[-/]\\d{3}[-/]\\d{4}$ expects three digits, a hyphen or slash, three digits, a hyphen or slash, and four digits (e.g. 123-456-7890 or 123/456/7890). Because each [-/] is checked independently, a mixed form such as 123-456/7890 is accepted too. On each change, isPhoneValid is set to true when the field is empty or matches the pattern. Invalid non-empty input shows an error and the message "Please enter a valid phone number (123-456-7890 or 123/456/7890)." The placeholder hints at the expected format.

ZIP Code Validation in Compose

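import androidx.compose.material3.OutlinedTextField
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.setValue

@Composable
fun ZipCodeInput() {
    var zipCode by remember { mutableStateOf("") }
    var isZipValid by remember { mutableStateOf(true) }

    // Create regex pattern for 5-digit ZIP codes (built once)
    val zipRegex = remember { "^\\d{5}$".toRegex() }

    OutlinedTextField(
        value = zipCode,
        onValueChange = {
            zipCode = it
            // Check if the ZIP code matches our pattern (an empty field counts as valid)
            isZipValid = it.isEmpty() || it.matches(zipRegex)
        },
        label = { Text("ZIP Code") },
        isError = !isZipValid && zipCode.isNotEmpty(),
        supportingText = {
            if (!isZipValid && zipCode.isNotEmpty()) {
                Text("Please enter a valid 5-digit ZIP code")
            }
        },
        placeholder = { Text("12345") }
    )
}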

What This Example Is Doing

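ZipCodeInput keeps zipCode and isZipValid in state. The regex ^\\d{5}$ matches exactly five digits. When the user types, isZipValid is true only if the field is empty or matches that pattern. Non-empty invalid input (wrong length or non-digits) shows the error state and "Please enter a valid 5-digit ZIP code." The placeholder "12345" shows the expected format. This version rejects ZIP+4 codes such as 12345-6789; to accept them, switch to the ^\d{5}(-\d{4})?$ pattern from the ZIP Code Pattern section (written "^\\d{5}(-\\d{4})?$" in a Kotlin string) and update the error message to match.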

Tips for Success

  • Start with simple patterns and add complexity gradually
  • Test your patterns with various input cases
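  • Use online regex testers to verify your patterns (pick a Java-compatible flavor if the tester offers a choice, since Kotlin's Regex on Android is backed by java.util.regex)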
  • Break complex patterns into smaller, understandable parts
  • Document your patterns with clear comments

Common Mistakes to Avoid

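  • Forgetting to escape special characters, both regex metacharacters such as . and + (write \. and \+) and backslashes inside Kotlin strings (write "\\d", or use a raw """...""" string)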
  • Not considering edge cases in your patterns
  • Making patterns too complex to maintain
  • Not testing with invalid input
  • Ignoring performance with complex patterns: nested quantifiers such as (a+)+ can backtrack catastrophically on certain non-matching inputs

Best Practices

  • Keep patterns as simple as possible
  • Use meaningful variable names for your patterns
  • Add comments explaining complex patterns
  • Consider using predefined patterns for common cases, such as Android's android.util.Patterns.EMAIL_ADDRESS
  • Test patterns with both valid and invalid input
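One way to apply several of these practices at once is to keep named, commented patterns in a single place and share a small validation helper. The object InputPatterns and the function isValidOrEmpty below are hypothetical names, a sketch rather than a required structure. Because the Regex values live in an object, each one is compiled once instead of on every call.

```kotlin
// Hypothetical central home for the app's validation patterns.
object InputPatterns {
    // Five-digit ZIP code with an optional ZIP+4 extension, e.g. 12345 or 12345-6789
    val ZIP_CODE = Regex("""^\d{5}(-\d{4})?$""")

    // US-style phone number with hyphens, e.g. 123-456-7890
    val PHONE = Regex("""^\d{3}-\d{3}-\d{4}$""")
}

// Empty input counts as "not invalid", matching the Compose examples' behavior.
fun isValidOrEmpty(input: String, pattern: Regex): Boolean =
    input.isEmpty() || pattern.matches(input)
```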