
Find matches from an index file without exact matching and print the last field

I have a large file (~6 million rows) with 2 columns, which looks like this:

1111.aaaaabbb.b.cccc.c  ValueA
2222.dddddeee.e.ffff.f  ValueB
3333.gggghhhh.h.iiii.i  ValueC

I want to use that as my index when searching this single column file:

aaaaabbb.b  
dddddeee.e  
gggghhhh.h  

And return:

ValueA
ValueB
ValueC
[...]
Valuen

As you can see, I only care about the part after the first period. The match against the first file does not need to cover the whole key: as long as an entry from the second file appears exactly within the first file's key, I want the value of column 2 from the first file returned. I don’t care about the prefix or suffix of the first file’s key, as long as the exact content of file 2 matches up.

Is there any way to do this with awk or any other bash tool? Currently I’m trying to format the data properly in Excel (the data to column tool), but it’s taking a very long time: since I have well over 6 million rows, I have to process 6 files manually and then compile the results together.

Edit, regarding the contents of file1: the prefix is always numeric, but varies in length from 4 to 7 digits. The content after the first period is alphanumeric, varies in length from 4 to 15 characters, and can begin with a digit or a letter. The suffix is also made up of letters and digits.
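
For illustration, here is a minimal awk sketch of one way this lookup could be done. It is not a tested solution for the real data and rests on several assumptions: the two columns of file1 are separated by whitespace, a query from file2 must match the part of the key after the numeric prefix starting right at the first period and ending on a period boundary (so `aaaaabbb.b` matches `1111.aaaaabbb.b.cccc.c`), and when several rows could match, the first one wins. The function name `lookup_values` is made up for this example.

```shell
# lookup_values INDEX_FILE QUERY_FILE
# Prints column 2 of INDEX_FILE for every line of QUERY_FILE that matches.
# (Hypothetical helper name; the matching rule is an assumption, see above.)
lookup_values() {
    awk '
        # First file (the index): drop everything up to the first period,
        # then store the value under every leading dot-delimited piece
        # of what remains, e.g. "aaaaabbb", "aaaaabbb.b", "aaaaabbb.b.cccc", ...
        NR == FNR {
            key = $1
            sub(/^[^.]*\./, "", key)
            n = split(key, parts, /\./)
            prefix = ""
            for (i = 1; i <= n; i++) {
                prefix = (i == 1) ? parts[i] : prefix "." parts[i]
                if (!(prefix in value))      # first matching row wins
                    value[prefix] = $2
            }
            next
        }
        # Second file (the queries): $1 ignores trailing spaces on the line.
        ($1 in value) { print value[$1] }
    ' "$1" "$2"
}
```

It would be called as `lookup_values file1 file2 > results.txt`. Note that this keeps several keys per index row in memory, which may be heavy for 6 million rows; if the query file is the smaller one, the roles could be swapped (load the queries, then stream the index), at the cost of getting the output in index order rather than query order.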

