Where Do Intelligence Platforms Find Cleartext Breach Data?

a909us3r

a909us3r

Member
Joined
April 7, 2025
Messages
16
Reaction score
2
Points
3
Question for the InfoSec Community

I've been exploring platforms like Intelligence X, where you can search for a domain or email and get results from leaked databases (sometimes in cleartext).
I'm curious: where do such platforms gather this data from?

Do they:

1. Monitor breach forums (like BreachForums)?
2. Pull from dark web marketplaces?
3. Scrape from paste sites (e.g., Pastebin)?
4. Use public dumps shared on GitHub, Telegram, or other leak sites?

Or something else entirely?

If there are any links or PDFs for learning more, please drop them in the comments. I would like to explore further.

Would love to hear insights on what data sources are commonly used by tools like Intelligence X, DeHashed, Scylla, LeakCheck, etc.
 
AllosOnama

AllosOnama

Premium Member
Joined
April 23, 2025
Messages
14
Reaction score
2
Points
3
I think it's pretty clear they do a bit of all of the above.

More interesting to me is what data architecture they use to store, tag, and index what I imagine is a vast ocean of data, along with its provenance. Most leaks contain some dirty data (missing columns and fields, duplicates, and so on), plus trash data if the leak was a full DB dump. The ETL process alone is a pain for these multi-GB data sets.
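
To make the ETL pain concrete, here is a minimal sketch of the normalize, deduplicate, and provenance-tag steps. It is not how any particular platform does it. The two-field schema, the `Record` type, and the `normalize_row` and `ingest` helpers are all hypothetical names made up for illustration, and a real pipeline would write to an indexed store rather than an in-memory dict.

```python
import csv
import hashlib
import io
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical canonical row; real dumps rarely share one schema."""
    email: str
    username: str
    source: str       # provenance tag: which dump the row came from
    fingerprint: str  # hash of the normalized fields, used for deduplication


def normalize_row(row: dict, source: str) -> Optional[Record]:
    """Map one raw row onto the canonical schema; return None for trash rows."""
    # DictReader fills missing columns with None, so guard with `or ""`.
    email = (row.get("email") or "").strip().lower()
    username = (row.get("username") or "").strip()
    if "@" not in email:
        return None  # no usable key, treat as trash data
    digest = hashlib.sha256(f"{email}\x00{username}".encode("utf-8")).hexdigest()
    return Record(email, username, source, digest)


def ingest(raw_csv: str, source: str, seen: Dict[str, Record]) -> int:
    """Parse one dump, skip trash and duplicates, return the count of new records."""
    added = 0
    for row in csv.DictReader(io.StringIO(raw_csv)):
        record = normalize_row(row, source)
        if record is None or record.fingerprint in seen:
            continue
        seen[record.fingerprint] = record  # the first-seen source is kept
        added += 1
    return added
```

Calling `ingest` once per dump with a shared `seen` dict handles reordered columns, short rows, and cross-dump duplicates. Keeping only the first-seen source is a simplification; storing a list of sources per fingerprint would preserve the full provenance.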

I don't think they are much different from most of the more commercial data brokers, who gather data from wherever they can: scraped, "permissioned", leaked, or otherwise. Almost all of them operate in the grey area, IMO.
 
a909us3r

a909us3r

Member
Joined
April 7, 2025
Messages
16
Reaction score
2
Points
3
Thanks for the detailed information. Now I have no doubt. @AllosOnama 😀
 
hexadec

hexadec

Advanced Member
Joined
January 1, 2025
Messages
227
Reaction score
25
Points
28
The freebie or leaks sections of clearnet and darknet forums, plus OSINT (using Google dorks).
 
bokachan

bokachan

New Member
Joined
June 28, 2025
Messages
3
Reaction score
0
Points
1
ty
 
varocyber

varocyber

New Member
Joined
July 14, 2025
Messages
1
Reaction score
0
Points
1