
So I have a table with many columns and rows, and I need to filter it so that columns containing null values get dropped.

In Python I can do it this way (small example):


import pandas as pd

df = pd.DataFrame({'col1': [1, None, 3], 'col2': [3, 4, 5], 'col3': [3, 4, None]})

df.dropna(axis=1, how='any')

But I just have no clue how to do it in Postgres.

I could not find an answer to this problem. I did find some answers, but they rely on listing the column names, which I can't do because I have too many.

Thanks!

  • Does this help? stackoverflow.com/questions/31034847/… Commented Dec 23, 2019 at 17:18
  • Thank you @richyen, but it did not help in my case. Commented Dec 23, 2019 at 17:31
  • It was worth a try :D -- could you explain why it didn't work for you? I guess you want a query to actually drop the columns? If that's the case, I don't think there's any DDL that will do it--you would have to script it Commented Dec 23, 2019 at 17:35
  • Umm, it didn't work; when I ran it against my table, select * from my_table gave me the same results @richyen Commented Dec 23, 2019 at 17:47
  • 1
    Ah I see, that link gives all ROWS with nulls--you want all COLUMNS with nulls. I believe Postgres doesn't have any facility like that AFAIK, unless you write a plpgsql function to do it Commented Dec 23, 2019 at 17:54

3 Answers


I can't claim it will break any speed records, but it will do what you ask. No dynamic SQL or user-defined functions necessary.

SELECT t.*
FROM your_table as t
-- If nulls are present, these will not be equal
WHERE to_jsonb(t) = jsonb_strip_nulls(to_jsonb(t))

If performance becomes a real concern, such as having to run this query many times, you could create an expression index for it. However, if that's the case, I would recommend normalizing your database's data model instead; you may just be papering over structural defects.

CREATE INDEX nulls_detected
  ON your_table ((to_jsonb(your_table) = jsonb_strip_nulls(to_jsonb(your_table))));

Further optimizations could probably be found using a bloom filter for your index.


Here's an example of this in action:

CREATE TABLE null_example (
  id serial PRIMARY KEY,
  col1 int,
  col2 text,
  col3 boolean
);

INSERT INTO null_example (col1, col2, col3) VALUES
(1, 'test1', true),
(NULL, 'test2', false),
(3, NULL, true),
(4, 'test4', NULL),
(5, 'test5', false);

Now if you run the following…

SELECT t.*
FROM null_example AS t
WHERE to_jsonb(t) = jsonb_strip_nulls(to_jsonb(t));

…you get the following output. Any rows that contain NULL column values have been omitted.

 id | col1 | col2  | col3
----+------+-------+------
  1 |    1 | test1 | t
  5 |    5 | test5 | f

If you are trying to target columns for removal such as from an ALTER TABLE … DROP COLUMN statement, the following query can help you along the way as well.

SELECT results.key, count(*), array_agg(t.id) AS affected_ids
FROM null_example AS t
  CROSS JOIN LATERAL jsonb_each(to_jsonb(t)) AS results(key, value)
WHERE results.value = 'null'::jsonb
GROUP BY results.key

This returns:

 key  | count | affected_ids
------+-------+--------------
 col2 |     1 | {3}
 col3 |     1 | {4}
 col1 |     1 | {2}
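If the goal really is to remove those columns, a DO block can turn that same jsonb scan into ALTER TABLE … DROP COLUMN statements. This is only a sketch, assuming the null_example table from above; it prints the statements with RAISE NOTICE rather than executing them, so you can review before dropping anything:

```sql
-- Sketch: report a DROP COLUMN statement for every column of
-- null_example that contains at least one NULL. Nothing is executed.
DO $$
DECLARE
  col text;
BEGIN
  FOR col IN
    SELECT DISTINCT results.key
    FROM null_example AS t
      CROSS JOIN LATERAL jsonb_each(to_jsonb(t)) AS results(key, value)
    WHERE results.value = 'null'::jsonb
  LOOP
    RAISE NOTICE 'ALTER TABLE null_example DROP COLUMN %;', quote_ident(col);
    -- To actually drop, you would uncomment:
    -- EXECUTE format('ALTER TABLE null_example DROP COLUMN %I', col);
  END LOOP;
END
$$;
```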

7 Comments

Looks interesting--can you demonstrate output? Tried it on my end, but it's not giving me the results that I would've expected
@richyen Added example
Ah, I see now that I misunderstood the question.
Well, let's check with OP--i added a comment on the question
I implemented a temporal table system (aka system-versioned tables) for my company in PostgreSQL. Because each table is different and because expecting folks to write custom temporal logic and functions for each table would be a maintenance nightmare, I figured out how to convert row data to and from JSON. This allowed for a more generic solution to the problem. A specific optimization I made was to convert OLD and NEW in a trigger to jsonb to detect if changes were made; the most efficient update is the one you don't have to make. It was superficially similar to this.

There isn't a built-in facility to do this in Postgres, so you'll have to build your own function. One way to approach this is to retrieve all the column names with:

select attname
  from pg_attribute
 where attrelid = (select oid from pg_class where relname = 'foo')
   and attnum > 0
   and not attisdropped;  -- skip columns that have already been dropped

Then, loop through the attnames and perform:

select count(*)
  from foo
 where <attname> is null;

A possible function may look like:

postgres=# create table foo (col1 int, col2 int, col3 int);
CREATE TABLE
postgres=# insert into foo values (1, null, null);
INSERT 0 1
postgres=# insert into foo values (1, 1, null);   
INSERT 0 1
postgres=# insert into foo values (1, null, 1);
INSERT 0 1
postgres=# CREATE OR REPLACE function find_null_cols() RETURNS setof record
AS
$$
declare
rr record;
r record;
c int;
begin
for r in (select attname
  from pg_attribute
 where attrelid = (select oid from pg_class where relname = 'foo')
   and attnum > 0)
loop
  execute format ('select count(*) from foo where %s is null', r.attname) into c;
  if c > 0 then
    select r.attname::text, c into rr;
    return next rr;
  end if;

end loop;
return;
end
$$
LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# select * from find_null_cols() as x(colname text, nullcount_rows int);
 colname | nullcount_rows 
---------+----------------
 col2    |              2
 col3    |              2
(2 rows)
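As an aside, the per-column scans inside the loop can be collapsed into a single pass over the table, because count(col) ignores NULLs. A sketch against the same foo table:

```sql
-- One-pass alternative (sketch): count(*) - count(col) is the
-- number of NULL values in that column.
SELECT count(*) - count(col1) AS col1_nulls,
       count(*) - count(col2) AS col2_nulls,
       count(*) - count(col3) AS col3_nulls
FROM foo;
-- With the sample data above: col1_nulls = 0, col2_nulls = 2, col3_nulls = 2
```

The trade-off is that the column list is written out by hand here, which is exactly what the question wants to avoid; combining this with dynamically generated SQL gets you both.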

I won't add the DROP COLUMN commands for your safety ;)



Dynamic SQL in PostgreSQL

You can use EXECUTE. Get the names of the columns with:

SELECT *
  FROM information_schema.columns
 WHERE table_schema = 'your_schema'
   AND table_name = 'your_table';

and for each one of them, build a DELETE statement against your view if the column has any null value:

DECLARE
    col_name text;
BEGIN
    -- Note: the SELECT needs a select list, and the loop variable must not
    -- shadow the column name, or PL/pgSQL raises an ambiguity error.
    FOR col_name IN SELECT column_name FROM information_schema.columns WHERE table_schema = 'your_schema' AND table_name = 'your_table' LOOP

        ...
        EXECUTE 'delete from ... where (SELECT ....' || col_name || '.....)';
        ...

    END LOOP;
END;
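A filled-in version of that skeleton, kept as a sketch with 'your_schema' and 'your_table' as placeholders, that only reports which columns contain NULLs rather than deleting anything:

```sql
-- Sketch: dynamically probe every column of your_schema.your_table
-- for NULLs. format() with %I quotes identifiers safely.
DO $$
DECLARE
  col text;
  n   bigint;
BEGIN
  FOR col IN
    SELECT column_name
      FROM information_schema.columns
     WHERE table_schema = 'your_schema'
       AND table_name = 'your_table'
  LOOP
    EXECUTE format('SELECT count(*) FROM your_schema.your_table WHERE %I IS NULL', col)
       INTO n;
    IF n > 0 THEN
      RAISE NOTICE 'column % has % NULL value(s)', col, n;
    END IF;
  END LOOP;
END
$$;
```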

4 Comments

How does this indicate which column has null? It shows me table_catalog, table_schema and more columns; I'm not familiar with those.
You can read, for example, hub.packtpub.com/how-to-implement-dynamic-sql-in-postgresql-10 You have to know how to program functions in PostgreSQL
Thanks, will read. So in order to get my desired outcome I must create a function?
Dynamic SQL is very powerful but also bypasses a lot of safety. Use with extreme caution, especially if it is driven by user-provided parameters.
