PostgreSQL Replace Column Data with Unique Integers

Question

I have a table of many columns in Postgresql database. Some of these columns are in text type and have several rows of values. The values also recur. I would like to change these text values with unique integer values.

This is my table column:

Country_Name
------------
USA
Japan
Mexico
USA
USA
Japan
England

and the new column I want is:

Country_Name
------------
1
2
3
1
1
2
4

Each country name is assigned (mapped) to a unique integer and all the recurrences of the text is replaced with this number. How can I do this?

Edit 1: I want to replace my column values on the fly if possible. I don't actually need another column to keep the names but it would be nice to see the actual values too. Is it possible to do:

Create a column country_id with the same values of country_name column in the same table
And for country_id replace each name with a unique integer with an update statement or procedure without requiring a new table or dictionary or map.

I don't know if this is possible but this will speed up things because I have a total of 220 columns and millions of rows. Thank you.

Create a mapping table using SELECT DISTINCT. Then update while joining to the mapping table. — Laurenz Albe
– Laurenz Albe, Commented Aug 17, 2018 at 13:38

Haleemur Ali · Accepted Answer · 2018-08-17 13:41:51Z

3

assuming the country_name column is in a table called country_data

create a new table & populate with unique country_names

-- valid in pg10 onwards
-- for earlier versions use SERIAL instead in the PK definition
CREATE TABLE countries (
   country_id INT  GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
   country_name TEXT); 
INSERT INTO countries (country_name) 
SELECT DISTINCT country_name
FROM country_data;

alter the table country_data & add a column country_id

ALTER TABLE country_data ADD COLUMN country_id INT

Join country_data to countries and populate the country_id column

UPDATE country_data
SET country_id = s.country_id
FROM countries
WHERE country_data.country_name = countries.country_name

At this point the country_id is available to query, but a few following actions may be recommended depending on the use case:

set up country_data.country_id as a foreign key referring to countries.country_id
drop the column country_data.country_name as that's redundant through the relationship with countries
maybe create an index on country_data.country_id if you determine that it will speed up the queries you normally run on this table.

answered Aug 17, 2018 at 13:41

Haleemur Ali

28.6k6 gold badges67 silver badges89 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Haleemur Ali Over a year ago

i'd advise against generating this on the fly for performance & storage reasons. also depending on the data access method, two on-the-fly queries might map countries to different ids.

Collectives™ on Stack Overflow

PostgreSQL Replace Column Data with Unique Integers

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related