0

I have a table of many columns in Postgresql database. Some of these columns are in text type and have several rows of values. The values also recur. I would like to change these text values with unique integer values.

This is my table column:

Country_Name
------------
USA
Japan
Mexico
USA
USA
Japan
England

and the new column I want is:

Country_Name
------------
1
2
3
1
1
2
4

Each country name is assigned (mapped) to a unique integer and all the recurrences of the text is replaced with this number. How can I do this?

Edit 1: I want to replace my column values on the fly if possible. I don't actually need another column to keep the names but it would be nice to see the actual values too. Is it possible to do:

  • Create a column country_id with the same values of country_name column in the same table
  • And for country_id replace each name with a unique integer with an update statement or procedure without requiring a new table or dictionary or map.

I don't know if this is possible but this will speed up things because I have a total of 220 columns and millions of rows. Thank you.

1
  • Create a mapping table using SELECT DISTINCT. Then update while joining to the mapping table. Commented Aug 17, 2018 at 13:38

1 Answer 1

3

assuming the country_name column is in a table called country_data

create a new table & populate with unique country_names

-- valid in pg10 onwards
-- for earlier versions use SERIAL instead in the PK definition
CREATE TABLE countries (
   country_id INT  GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
   country_name TEXT); 
INSERT INTO countries (country_name) 
SELECT DISTINCT country_name
FROM country_data;

alter the table country_data & add a column country_id

ALTER TABLE country_data ADD COLUMN country_id INT

Join country_data to countries and populate the country_id column

UPDATE country_data
SET country_id = s.country_id
FROM countries
WHERE country_data.country_name = countries.country_name

At this point the country_id is available to query, but a few following actions may be recommended depending on the use case:

  • set up country_data.country_id as a foreign key referring to countries.country_id
  • drop the column country_data.country_name as that's redundant through the relationship with countries
  • maybe create an index on country_data.country_id if you determine that it will speed up the queries you normally run on this table.
Sign up to request clarification or add additional context in comments.

1 Comment

i'd advise against generating this on the fly for performance & storage reasons. also depending on the data access method, two on-the-fly queries might map countries to different ids.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.