I have a large file (~6 million rows) with two columns that looks like this:
1111.aaaaabbb.b.cccc.c ValueA
2222.dddddeee.e.ffff.f ValueB
3333.gggghhhh.h.iiii.i ValueC
I want to use it as an index when looking up the entries of this single-column file:
aaaaabbb.b
dddddeee.e
gggghhhh.h
And return:
ValueA
ValueB
ValueC
[...]
Valuen
As you can see, I only care about the part after the first period. Whenever an entry in the second file exactly matches that part of a line in the first file (the line as a whole will not match exactly), I want the column 2 value from the first file returned. I don’t care about the prefix or suffix in the first file, as long as the exact content from file 2 appears between them.
Is there any way to do this with awk or any bash tool? Currently I’m trying to format the data properly in Excel (data to column tool), but it’s taking a long, long time: I have well over 6 million rows, so I have to process 6 files manually and then compile the results together.
Edit on file1 contents: The prefix is always numeric, but varies in length from 4 to 7 digits. The content after the first period is alphanumeric, varies in length from 4 to 15 characters, and can begin with a number or a letter. The suffix is made up of numbers and letters as well.
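To make the lookup concrete, here is a rough sketch of the kind of thing I have in mind, not a tested solution for the real data. It assumes both files are whitespace-separated, that the key from file 2 always starts right after the first period in file 1 and ends on a period boundary, and that printing the values in file 1 order is acceptable. The function name lookup_values and the file names file1.txt and file2.txt are placeholders.

```shell
#!/bin/sh
# lookup_values KEYS_FILE INDEX_FILE   (hypothetical helper name)
# KEYS_FILE  = the single-column file (file 2)
# INDEX_FILE = the two-column file (file 1)
lookup_values() {
    awk '
        # First file on the command line (the keys): remember every non-blank key.
        NR == FNR { if (NF) wanted[$1] = 1; next }

        # Second file (the index): $1 is e.g. 1111.aaaaabbb.b.cccc.c, $2 is the value.
        {
            n = split($1, part, ".")
            # Skip part[1] (the numeric prefix), then grow the candidate key one
            # dot-separated piece at a time: aaaaabbb, aaaaabbb.b, aaaaabbb.b.cccc, ...
            key = ""
            for (i = 2; i <= n; i++) {
                key = (i == 2) ? part[i] : key "." part[i]
                if (key in wanted) { print $2; break }
            }
        }
    ' "$1" "$2"
}

# Usage with the sample rows from above, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '1111.aaaaabbb.b.cccc.c ValueA' \
              '2222.dddddeee.e.ffff.f ValueB' \
              '3333.gggghhhh.h.iiii.i ValueC' > "$dir/file1.txt"
printf '%s\n' 'aaaaabbb.b' 'dddddeee.e' 'gggghhhh.h' > "$dir/file2.txt"
lookup_values "$dir/file2.txt" "$dir/file1.txt"
rm -r "$dir"
```

Only the keys from file 2 are held in memory, and each row of file 1 is read once, so the 6 million rows should not need to be split up first. If the files came out of Excel with Windows line endings, the trailing carriage returns would have to be stripped before the keys can match.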