How to Sort Lines of Text

Alphabetical, numeric, and length-based sorting explained — how text sorting works, when to use it, and common pitfalls.

Why sort text?

Sorting a block of text — treating each line as a unit and rearranging those units into some defined order — is a surprisingly common task that shows up across many workflows. Developers sort import statements to make them easier to scan and diff. Data analysts sort lists to spot the highest or lowest values. Anyone maintaining a glossary, config file, or content list benefits from items being in a predictable order.

Beyond readability, sorting is often a prerequisite for other operations. Deduplication reduces to a single linear pass when identical lines are adjacent. Binary search requires sorted input. Many file-comparison tools produce cleaner diffs when both files are in a consistent order.

The UtilYard Text Sorter lets you paste any block of text and sort it alphabetically, numerically, or by line length — in ascending or descending order, with optional deduplication.

Alphabetical sorting

Alphabetical (lexicographic) sorting arranges strings by comparing them character by character from left to right, using each character's Unicode code point. In this ordering, every uppercase ASCII letter comes before every lowercase one (Z sorts before a), and digits come before letters (9 sorts before A). This often surprises people expecting dictionary-style order, where case is ignored.

Case-sensitive sort keeps the Unicode order as-is. A list containing both "Apple" and "apple" will not place them next to each other — "Apple" lands in the uppercase block and "apple" in the lowercase block. This is the correct behavior for systems where case carries semantic meaning.

Case-insensitive sort compares strings as if all letters were the same case. "Apple", "apple", and "APPLE" all land in the same position relative to "Banana", producing the intuitively expected result for word lists, name lists, and config keys.
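The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the idea, not the Text Sorter's actual implementation; the function names and the sample list are made up for the example.

```python
def sort_case_sensitive(lines):
    # Plain sorted() compares Unicode code points,
    # so uppercase letters sort ahead of lowercase ones.
    return sorted(lines)


def sort_case_insensitive(lines):
    # Compare case-folded copies of each line but keep the original text.
    # sorted() is stable, so lines that differ only by case keep their
    # original relative order.
    return sorted(lines, key=str.casefold)


words = ["banana", "Apple", "cherry", "apple", "Banana"]

print(sort_case_sensitive(words))
# ['Apple', 'Banana', 'apple', 'banana', 'cherry']

print(sort_case_insensitive(words))
# ['Apple', 'apple', 'banana', 'Banana', 'cherry']
```

Using a sort key rather than lowercasing the lines themselves means the output preserves the original capitalization.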

Numeric vs. lexicographic sorting

This distinction catches many people off guard. Consider a list of plain numeric values, such as counts or file sizes:

  • Lexicographic order: 1, 10, 100, 2, 20, 3, 9
  • Numeric order: 1, 2, 3, 9, 10, 20, 100

In lexicographic sorting, each character is compared individually — so "10" sorts before "2" because the character "1" comes before "2". This is correct for string data but almost always wrong when lines contain actual numbers you want ranked by magnitude.
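The same list can be sorted both ways in Python to see the two orders side by side. This is a minimal sketch that assumes every line parses as a number; a real tool would also need a policy for lines that do not, which is not covered here.

```python
def sort_lexicographic(lines):
    # Strings are compared character by character.
    return sorted(lines)


def sort_numeric(lines):
    # Each line is converted to a number for comparison only;
    # the original text is what ends up in the result.
    # Assumption: every line is a valid number (float() raises otherwise).
    return sorted(lines, key=float)


values = ["10", "9", "2", "100", "1", "20", "3"]

print(sort_lexicographic(values))
# ['1', '10', '100', '2', '20', '3', '9']

print(sort_numeric(values))
# ['1', '2', '3', '9', '10', '20', '100']
```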

A related concept is natural sort, the kind of ordering used by file managers such as Finder and Windows Explorer. Natural sort treats each run of digits inside a string as a single number: "file2.txt" sorts before "file10.txt" because it compares the numeric portions (2 vs. 10) by value rather than character by character. This is ideal for file names, software versions, and any list that mixes text and numbers.
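One common way to approximate natural sort is to split each line into digit and non-digit runs and compare the digit runs as integers. The sketch below does this in Python; natural_key is a hypothetical helper name, and the exact rules used by any particular file manager may differ (for example in how they treat case, leading zeros, or punctuation).

```python
import re


def natural_key(line):
    # re.split with a capturing group returns alternating pieces:
    # text, digits, text, digits, ... (the first piece may be empty).
    parts = re.split(r"(\d+)", line)
    # Odd positions are always digit runs, so compare them as integers;
    # even positions are text, compared case-insensitively (an assumption).
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


def natural_sort(lines):
    return sorted(lines, key=natural_key)


files = ["file10.txt", "file2.txt", "file1.txt"]

print(sorted(files))
# ['file1.txt', 'file10.txt', 'file2.txt']

print(natural_sort(files))
# ['file1.txt', 'file2.txt', 'file10.txt']
```

Because every key alternates text and numbers in the same positions, integers are only ever compared with integers and strings with strings.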

When choosing a sort mode, ask: are my lines strings to be compared alphabetically, or do they contain numbers that should be ranked by value? Picking the wrong mode produces a result that looks sorted but is actually incorrect.

Removing duplicates while sorting

Deduplication and sorting are natural companions. Once lines are sorted, identical lines are adjacent, making it trivial to remove duplicates in a single linear pass — each line is compared only to its immediate predecessor.

This is how the Unix uniq command works: it removes only consecutive duplicate lines, which is why it is almost always combined with sort, as in sort list.txt | uniq. Running uniq on an unsorted file leaves non-adjacent duplicates in place.
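The adjacent-comparison pass is short enough to sketch directly. The Python below is an illustration of the idea rather than a reimplementation of uniq; drop_adjacent_duplicates is a hypothetical name chosen for this example.

```python
def drop_adjacent_duplicates(lines):
    # Single linear pass: keep a line only if it differs from
    # the line kept immediately before it.
    result = []
    for line in lines:
        if not result or line != result[-1]:
            result.append(line)
    return result


def sort_and_dedupe(lines):
    # Sorting first makes identical lines adjacent,
    # so the linear pass removes every duplicate.
    return drop_adjacent_duplicates(sorted(lines))


lines = ["b", "a", "b", "a", "a"]

print(drop_adjacent_duplicates(lines))
# ['b', 'a', 'b', 'a']  (only the consecutive "a" pair collapsed)

print(sort_and_dedupe(lines))
# ['a', 'b']
```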

Common situations where sort-and-deduplicate is useful:

  • Cleaning up a mailing list or user list that has accumulated duplicates over time.
  • Merging two lists of tags or keywords and removing entries that appear in both.
  • Deduplicating import statements at the top of a source file after a refactor.
  • Collapsing repeated log lines to get a unique set of events.

Common use cases

  • Sorting import statements — many style guides and linters require imports to be alphabetically ordered. Paste the import block, sort, paste back.
  • Organizing config keys — YAML, TOML, and .env files are easier to read and diff when keys are alphabetically ordered. Sorted keys also make merge conflicts less likely when multiple people edit the same file.
  • Sorting a column extracted from CSV data — copy a column of values into a text editor, sort it, then use the result to filter or rank other data.
  • Ranking by line length — length-based sorting is useful for organizing CSS utility classes or shell aliases from shortest to longest for readability.
  • Alphabetizing a glossary or index — manually maintaining alphabetical order as items are added is tedious and error-prone. Sort the whole list periodically to keep it in order.
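
Length-based sorting, mentioned in the list above, can be sketched in Python as follows. Breaking ties alphabetically is an assumption made for this example so that the output is deterministic; it is not a documented behavior of any particular tool.

```python
def sort_by_length(lines, descending=False):
    # The key is (length, text): length decides the order and the
    # text itself breaks ties between lines of equal length.
    # len() counts Unicode code points, not display width.
    return sorted(lines, key=lambda line: (len(line), line), reverse=descending)


items = ["ccc", "a", "bb", "aa"]

print(sort_by_length(items))
# ['a', 'aa', 'bb', 'ccc']

print(sort_by_length(items, descending=True))
# ['ccc', 'bb', 'aa', 'a']
```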