Interview Question Series | Season 3

Repeated DNA Sequences – Reading Version


Problem in short:

You’re given a DNA string made up of 'A', 'C', 'G', and 'T'.
Your task is to find all 10-letter-long substrings that appear more than once in the string.

Example:
If the DNA is "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
then "AAAAACCCCC" and "CCCCCAAAAA" both appear twice.


Idea of the solution:

We need to check every 10-letter substring in the DNA and count how many times it appears.
If any substring appears more than once, we include it in our result.


Step-by-step solution (two main approaches):

1. Naive Approach – O(N²)

  • Extract every 10-letter substring.

  • For each substring, compare it with all other substrings to check if it repeats.

  • This is very slow, because each substring comparison costs time, leading to roughly operations.

✅ Works for small strings.
❌ Too slow for large DNA sequences (like length 100,000).


2. Optimized Approach using Hash Map – O(N)

We use a hash map (or map<string, int>) to count how many times each 10-letter substring appears.

How it works:

  1. Loop through the string, taking every substring of length 10.

  2. Store each substring in a hash map, increasing its count.

  3. After finishing, collect all substrings with a count > 1.

This reduces the time complexity to O(N) because:

  • Each substring operation is constant time (since it’s always length 10).

  • Insertion and lookup in a hash map are O(1) on average.
    So overall → O(N).


Quick Example:

Input:
s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Process:

  • Take every 10-letter substring

  • Count occurrences

  • Substrings seen more than once → "AAAAACCCCC", "CCCCCAAAAA"

Output:
["AAAAACCCCC", "CCCCCAAAAA"]


C++ Code (Optimized Way – Using Hash Map)

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> result;
        map<string, int> count;
        int n = s.size();
        if (n < 10) return result;

        for (int i = 0; i <= n - 10; i++) {
            string part = s.substr(i, 10);
            count[part]++;
        }

        for (auto it : count) {
            if (it.second > 1)
                result.push_back(it.first);
        }

        return result;
    }
};

Notes:

  • Time Complexity: O(N)

  • Space Complexity: O(N) (for storing substrings)

  • Very popular interview problem (Google, Amazon, and Meta).

  • Tests your ability to use hashing to detect duplicates efficiently.

Scroll to Top