Earlier this week my colleague clued me into a potential Postgres pitfall I’d yet to be bitten by, but it could happen.
Say you’re doing a once-off query like this. Yes, it’s a contrived, simplified example.
SELECT book, isbn FROM inventory WHERE isbn NOT IN ( SELECT isbn FROM orders )
This superficially works, but Postgres will not match
NULL values in this subquery when using
NOT IN. Ouch!
So the only way to be safe is to check the field is also
SELECT book, isbn FROM inventory WHERE isbn NOT IN ( SELECT isbn FROM orders WHERE isbn IS NOT NULL )
Or the more robust alternative, short of doing a proper join, is to strategically deploy
SELECT book, isbn FROM inventory WHERE NOT EXISTS ( SELECT isbn FROM orders WHERE inventory.isbn = orders.isbn )
I did some digging, and also found this interesting article on Explain Extended on the performance implicaitons of using these different queries.