String Manipulation
Strings are one of the most commonly used data types in programming. In Python, strings are sequences of characters enclosed in quotes (single, double, or triple quotes). String manipulation involves various operations to modify, analyze, or extract information from strings.
Creating Strings
There are several ways to create strings in Python:
# Single quotes
name = 'John'
# Double quotes
message = "Hello, World!"
# Triple quotes (for multi-line strings)
description = """This is a multi-line
string that spans
multiple lines."""
# Empty string
empty_string = ""
Both single and double quotes work the same way, but using double quotes allows you to include single quotes within the string without escaping them, and vice versa:
# Single quotes inside double quotes
message = "Don't worry about apostrophes"
# Double quotes inside single quotes
code = 'The variable is called "counter"'
String Concatenation
You can combine strings using the +
operator:
first_name = "John"
last_name = "Doe"
full_name = first_name + " " + last_name
print(full_name) # Output: John Doe
You can also use the +=
operator to append to a string:
greeting = "Hello"
greeting += ", World!"
print(greeting) # Output: Hello, World!
For more complex string building, especially when combining different data types, it’s often better to use string formatting methods (covered later) rather than concatenation.
String Repetition
The *
operator repeats a string a specified number of times:
separator = "-" * 20
print(separator) # Output: --------------------
word = "Python "
repeated = word * 3
print(repeated) # Output: Python Python Python
String Indexing
Strings in Python are sequences, and each character has an index. Indexing starts at 0 for the first character:
message = "Hello, World!"
# Access individual characters
first_char = message[0] # 'H'
sixth_char = message[5] # ','
# Negative indexing (counting from the end)
last_char = message[-1] # '!'
second_last = message[-2] # 'd'
print(f"First character: {first_char}")
print(f"Last character: {last_char}")
Important: Strings in Python are immutable, which means you cannot change individual characters directly:
message = "Hello"
# This will cause an error:
# message[0] = "h" # TypeError: 'str' object does not support item assignment
# Instead, create a new string
message = "h" + message[1:]
print(message) # Output: hello
String Slicing
Slicing allows you to extract a portion of a string:
message = "Hello, World!"
# Syntax: string[start:end:step]
# The slice includes start but excludes end
# Extract "Hello"
hello = message[0:5] # or simply message[:5]
# Extract "World"
world = message[7:12]
# Extract every second character
every_second = message[::2] # "Hlo ol!"
# Extract the last 5 characters
last_five = message[-5:] # "orld!"
# Reverse a string
reversed_msg = message[::-1] # "!dlroW ,olleH"
print(f"Hello part: {hello}")
print(f"World part: {world}")
print(f"Every second character: {every_second}")
print(f"Last five characters: {last_five}")
print(f"Reversed message: {reversed_msg}")
String Methods
Python provides many built-in methods for string manipulation:
Case Conversion Methods
message = "Hello, World!"
# Convert to uppercase
upper_case = message.upper()
print(upper_case) # HELLO, WORLD!
# Convert to lowercase
lower_case = message.lower()
print(lower_case) # hello, world!
# Convert first character of each word to uppercase
title_case = "welcome to python".title()
print(title_case) # Welcome To Python
# Capitalize only the first character of the string
capitalized = "welcome to python".capitalize()
print(capitalized) # Welcome to python
# Swap case (uppercase becomes lowercase and vice versa)
swapped = "Hello, World!".swapcase()
print(swapped) # hELLO, wORLD!
Searching Methods
message = "Python is a powerful programming language"
# Check if a string starts with a specific prefix
starts_with_python = message.startswith("Python")
print(starts_with_python) # True
# Check if a string ends with a specific suffix
ends_with_language = message.endswith("language")
print(ends_with_language) # True
# Find the position of a substring (returns -1 if not found)
position = message.find("powerful")
print(position) # 11
# Count occurrences of a substring
count = message.count("p")
print(count) # 3 (two in "Python" and one in "powerful")
# Check if a string contains only alphabetic characters
is_alpha = "Python".isalpha()
print(is_alpha) # True
# Check if a string contains only digits
is_digit = "12345".isdigit()
print(is_digit) # True
# Check if a string is alphanumeric
is_alnum = "Python3".isalnum()
print(is_alnum) # True
# Check if all characters are whitespace
is_space = " ".isspace()
print(is_space) # True
Transformation Methods
# Replace parts of a string
original = "Python is cool, Python is powerful"
replaced = original.replace("Python", "Java")
print(replaced) # Java is cool, Java is powerful
# Replace with a limit
replaced_once = original.replace("Python", "Java", 1)
print(replaced_once) # Java is cool, Python is powerful
# Strip whitespace from the beginning and end
text = " Hello, World! "
stripped = text.strip()
print(stripped) # "Hello, World!"
# Strip only from the left
left_stripped = text.lstrip()
print(left_stripped) # "Hello, World! "
# Strip only from the right
right_stripped = text.rstrip()
print(right_stripped) # " Hello, World!"
# Strip specific characters (not just whitespace)
text_with_symbols = "###Hello, World!###"
stripped_symbols = text_with_symbols.strip("#")
print(stripped_symbols) # "Hello, World!"
# Pad a string to a fixed length
padded = "Hello".ljust(10, "*")
print(padded) # "Hello*****"
right_padded = "Hello".rjust(10, "*")
print(right_padded) # "*****Hello"
center_padded = "Hello".center(10, "*")
print(center_padded) # "**Hello***"
Splitting and Joining
# Split a string into a list based on a delimiter
message = "apple,banana,cherry,date"
fruits = message.split(",")
print(fruits) # ['apple', 'banana', 'cherry', 'date']
# Split with a maximum number of splits
first_two = message.split(",", 2)
print(first_two) # ['apple', 'banana', 'cherry,date']
# Split by whitespace (default behavior of split())
sentence = "Python is a programming language"
words = sentence.split()
print(words) # ['Python', 'is', 'a', 'programming', 'language']
# Split by lines
multiline = """Line 1
Line 2
Line 3"""
lines = multiline.splitlines()
print(lines) # ['Line 1', 'Line 2', 'Line 3']
# Join a list of strings into a single string
joined = ", ".join(fruits)
print(joined) # 'apple, banana, cherry, date'
# Join with empty string
no_spaces = "".join(words)
print(no_spaces) # 'Pythonisaprogramminglanguage'
String Formatting
Python provides several ways to format strings:
f-strings (Python 3.6+)
name = "Alice"
age = 30
height = 5.8
# Basic formatting
greeting = f"Hello, my name is {name} and I am {age} years old."
print(greeting)
# Format specifications
height_formatted = f"My height is {height:.1f} feet."
print(height_formatted) # My height is 5.8 feet.
# Expressions in f-strings
print(f"In five years, I'll be {age + 5} years old.") # In five years, I'll be 35 years old.
# Alignment and padding
for num in range(1, 11):
print(f"{num:2d} squared is {num**2:3d}")
# Output:
# 1 squared is 1
# 2 squared is 4
# 3 squared is 9
# ...
# 10 squared is 100
The format()
Method
# Basic formatting
greeting = "Hello, my name is {} and I am {} years old.".format(name, age)
print(greeting)
# Positional arguments
template = "The order is: {0}, {1}, {2}."
print(template.format("first", "second", "third")) # The order is: first, second, third.
# Reuse positional arguments
template = "The repeated order is: {0}, {1}, {0}."
print(template.format("first", "second")) # The repeated order is: first, second, first.
# Named arguments
info = "Name: {name}, Age: {age}, Height: {height}m".format(name="Bob", age=25, height=1.85)
print(info) # Name: Bob, Age: 25, Height: 1.85m
# Format specifications
pi = 3.14159265359
print("Pi is approximately {:.2f}".format(pi)) # Pi is approximately 3.14
# Alignment
for i in range(1, 11):
print("Number: {:<2}, Square: {:<3}, Cube: {:<4}".format(i, i**2, i**3))
# Number: 1 , Square: 1 , Cube: 1
# Number: 2 , Square: 4 , Cube: 8
# ...
The %
Operator (older style)
# Basic formatting
greeting = "Hello, my name is %s and I am %d years old." % (name, age)
print(greeting)
# Format specifiers
pi = 3.14159
print("Pi is approximately %.2f" % pi) # Pi is approximately 3.14
# Multiple values
print("Name: %s, Age: %d, Height: %.1f" % (name, age, height))
Note:
While the %
operator is still supported, f-strings and the format()
method are generally preferred in modern Python code due to improved readability and flexibility.
String Interpolation with Variables
Python 3.6+ introduced a simpler form of string formatting using f-strings, which directly interpolate variables:
name = "Charlie"
age = 40
print(f"{name} is {age} years old.") # Charlie is 40 years old.
Working with Unicode and Special Characters
Python 3 strings are Unicode by default, which means they can contain characters from various languages and special symbols:
# Unicode characters
unicode_string = "こんにちは" # Japanese for "Hello"
print(unicode_string)
# Unicode escape sequences
heart_symbol = "\u2764" # Unicode code point for heart
print(heart_symbol) # ❤
# Special escape sequences
text_with_newlines = "First line\nSecond line"
print(text_with_newlines)
# Output:
# First line
# Second line
text_with_tabs = "Name\tAge\tCity"
print(text_with_tabs)
# Output:
# Name Age City
# Raw strings (ignore escape characters)
raw_string = r"C:\Users\John\Documents"
print(raw_string) # C:\Users\John\Documents
Practical String Manipulation Examples
Example 1: Password Validator
def validate_password(password):
"""
Validate that a password meets the following criteria:
- At least 8 characters long
- Contains at least one uppercase letter
- Contains at least one lowercase letter
- Contains at least one digit
- Contains at least one special character (!@#$%^&*()_+)
Returns a list of validation errors or an empty list if valid.
"""
errors = []
# Check length
if len(password) < 8:
errors.append("Password must be at least 8 characters long")
# Check for uppercase letter
if not any(char.isupper() for char in password):
errors.append("Password must contain at least one uppercase letter")
# Check for lowercase letter
if not any(char.islower() for char in password):
errors.append("Password must contain at least one lowercase letter")
# Check for digit
if not any(char.isdigit() for char in password):
errors.append("Password must contain at least one digit")
# Check for special character
special_chars = "!@#$%^&*()_+"
if not any(char in special_chars for char in password):
errors.append("Password must contain at least one special character (!@#$%^&*()_+)")
return errors
# Test the validator
test_passwords = [
"abc123", # Too short
"ALLUPPERCASE123!", # No lowercase
"alllowercase123!", # No uppercase
"ABCDEabcde!", # No digits
"ABCDEabcde123", # No special chars
"ABCabc123!", # Valid
]
for password in test_passwords:
errors = validate_password(password)
if errors:
print(f"Password '{password}' is invalid:")
for error in errors:
print(f"- {error}")
else:
print(f"Password '{password}' is valid")
print()
Example 2: Text Analyzer
def analyze_text(text):
"""
Analyze a text and return statistics about it.
"""
# Prepare the text: convert to lowercase and remove punctuation
import string
text = text.lower()
for punctuation in string.punctuation:
text = text.replace(punctuation, "")
# Split into words
words = text.split()
# Count the words
word_count = len(words)
# Count unique words
unique_words = set(words)
unique_word_count = len(unique_words)
# Find most common words
from collections import Counter
word_counter = Counter(words)
most_common = word_counter.most_common(5)
# Calculate average word length
total_length = sum(len(word) for word in words)
avg_word_length = total_length / word_count if word_count > 0 else 0
# Analyze sentence structure
sentences = text.replace("!", ".").replace("?", ".").split(".")
sentences = [s.strip() for s in sentences if s.strip()]
sentence_count = len(sentences)
avg_sentence_length = word_count / sentence_count if sentence_count > 0 else 0
# Return the analysis
return {
"word_count": word_count,
"unique_word_count": unique_word_count,
"vocabulary_diversity": unique_word_count / word_count if word_count > 0 else 0,
"avg_word_length": avg_word_length,
"sentence_count": sentence_count,
"avg_sentence_length": avg_sentence_length,
"most_common_words": most_common
}
# Test the analyzer
sample_text = """
Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured, object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.
"""
analysis = analyze_text(sample_text)
print("Text Analysis Results:")
print(f"Word count: {analysis['word_count']}")
print(f"Unique word count: {analysis['unique_word_count']}")
print(f"Vocabulary diversity: {analysis['vocabulary_diversity']:.2f}")
print(f"Average word length: {analysis['avg_word_length']:.2f} characters")
print(f"Sentence count: {analysis['sentence_count']}")
print(f"Average sentence length: {analysis['avg_sentence_length']:.2f} words")
print("Most common words:")
for word, count in analysis['most_common_words']:
print(f"- '{word}': {count} occurrences")
Example 3: Simple Template Engine
def render_template(template, variables):
"""
A simple template engine that replaces {{variable}} in the template
with the corresponding value from the variables dictionary.
"""
result = template
for key, value in variables.items():
placeholder = "{{" + key + "}}"
result = result.replace(placeholder, str(value))
return result
# Test the template engine
template = """
Dear {{name}},
Thank you for your purchase of {{product}} on {{date}}.
Your order number is {{order_id}}.
Please contact us at {{support_email}} if you have any questions.
Sincerely,
{{company_name}}
"""
variables = {
"name": "John Smith",
"product": "Python Programming Book",
"date": "May 15, 2023",
"order_id": "ORD-12345",
"support_email": "[email protected]",
"company_name": "Tech Books Inc."
}
rendered = render_template(template, variables)
print(rendered)
String Manipulation Best Practices
1. Use String Methods Instead of Manual Iteration
# Less efficient
uppercase_chars = ""
for char in text:
if char.isalpha():
uppercase_chars += char.upper()
# More efficient
uppercase_chars = "".join(char.upper() for char in text if char.isalpha())
2. Use join()
Instead of +
for Building Strings
# Less efficient (creates many intermediate strings)
result = ""
for item in items:
result += item + ", "
result = result[:-2] # Remove trailing comma and space
# More efficient
result = ", ".join(items)
3. Use f-strings for Readable Formatting
# Less readable
info = "Name: " + name + ", Age: " + str(age) + ", City: " + city
# More readable
info = f"Name: {name}, Age: {age}, City: {city}"
4. Use String Methods for Validation
# Less reliable
is_valid = True
for char in user_id:
if not (char.isalpha() or char.isdigit() or char == '_'):
is_valid = False
break
# More reliable
is_valid = all(char.isalpha() or char.isdigit() or char == '_' for char in user_id)
# Or even better
is_valid = user_id.isalnum() or "_" in user_id
5. Consider Regular Expressions for Complex Pattern Matching
import re
# Check if a string is a valid email address
def is_valid_email(email):
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
return bool(re.match(pattern, email))
# Test the function
emails = ["[email protected]", "invalid-email", "[email protected]"]
for email in emails:
print(f"{email}: {'Valid' if is_valid_email(email) else 'Invalid'}")
Exercises
Exercise 1: Write a function called reverse_words
that takes a string as input and returns a new string with the words reversed but the order of the words maintained. For example, “Hello World” should become “olleH dlroW”.
Exercise 2: Create a function that checks if a string is a palindrome (reads the same backward as forward), ignoring case, spaces, and punctuation. For example, “A man, a plan, a canal: Panama” is a palindrome.
Exercise 3: Write a function that extracts all email addresses from a given text. Use string methods (or regular expressions for an extra challenge) to identify and extract email patterns.
Exercise 4: Implement a function called word_censorship
that takes two parameters: a text string and a list of words to censor. Replace each occurrence of a censored word with asterisks of the same length. The censorship should be case-insensitive.
Hint for Exercise 1:
def reverse_words(text):
words = text.split()
reversed_words = [word[::-1] for word in words]
return ' '.join(reversed_words)
In the next section, we’ll explore lists in Python, which are versatile and widely used data structures for storing collections of items.